
What algorithms are considered reinforcement learning algorithms?


What areas belong to reinforcement learning?

TD(0), Q-learning and SARSA are all temporal-difference algorithms, which belong to the reinforcement learning area, but is there more to it?

Are dynamic programming methods such as policy iteration and value iteration considered part of reinforcement learning, or are they just the basis for the temporal-difference algorithms, which would then be the only RL algorithms?










Tags: reinforcement-learning, terminology, definitions






Asked Apr 30 at 14:26 by Miguel Saraiva; edited Apr 30 at 14:58 by nbro.

2 Answers

Answer by nbro (answered Apr 30 at 14:54, edited Apr 30 at 15:25)

Dynamic programming algorithms (like policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Sutton and Barto) because they are closely related to reinforcement learning algorithms such as $Q$-learning. They are all based on the assumption that the environment can be modelled as an MDP.



However, dynamic programming algorithms require that the underlying MDP (that is, the associated transition and reward functions) is known. Hence, they are often referred to as "planning" algorithms: they can be used to find a policy (which can be thought of as a "plan") given the "dynamics" of the environment (which are represented by the MDP). They simply exploit the given "physical rules" of the environment in order to find a policy, and this exploitation of a known model is what "planning" refers to.
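
As a concrete illustration of the "known MDP" requirement (not part of the original answer), here is a minimal value iteration sketch; the tiny two-state MDP, the variable names and the tolerance are made-up assumptions. It only works because the transition tensor `P` and the reward table `R` are given up front:

```python
import numpy as np

# Hypothetical toy MDP with 2 states and 2 actions (illustrative numbers only).
# P[s, a, s'] = probability of reaching s' after taking action a in state s.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup.
V = np.zeros(2)
while True:
    Q = R + gamma * (P @ V)        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)          # greedy backup over actions
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

greedy_policy = Q.argmax(axis=1)   # the resulting "plan"
print(V, greedy_policy)
```

No interaction with an environment happens anywhere in this loop; the algorithm just turns a given model into a policy, which is exactly why it is called planning.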



On the other hand, $Q$-learning and similar algorithms do not require that the MDP is known. They attempt to find a policy (or a value function) by interacting with the environment, so the effect of the dynamics of the underlying MDP is picked up from experience (that is, from the interaction with the environment) rather than from a given model.
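
For contrast, here is a minimal tabular $Q$-learning sketch (not part of the original answer; the Gym-style `env` object with `reset`/`step` methods and all hyperparameter values are illustrative assumptions). Note that no transition or reward function appears anywhere; only sampled transitions are used:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning from sampled interaction only (model-free)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()                     # assumed Gym-like interface
        done = False
        while not done:
            # epsilon-greedy behaviour policy (exploration)
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(Q[s].argmax())
            s_next, r, done = env.step(a)   # sampled transition, no model needed
            # off-policy TD update: bootstrap on the greedy value of s_next
            td_target = r + gamma * (0.0 if done else Q[s_next].max())
            Q[s, a] += alpha * (td_target - Q[s, a])
            s = s_next
    return Q
```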



If the MDP is not given, the problem is often referred to as the (full) "reinforcement learning problem". So, algorithms like $Q$-learning or SARSA are generally considered reinforcement learning algorithms, whereas the dynamic programming algorithms (like policy iteration) do not solve the "full RL problem" and hence are not always considered RL algorithms, but rather planning algorithms.



There are several categories of RL algorithms: temporal-difference, Monte Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, policy-based and value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on a temporal-difference update rule (TD), it doesn't use a model of the environment (model-free), it uses a behaviour policy that is different from the policy it learns (off-policy), it is used to find a policy (control), and it approximates a value function rather than the policy directly (value-based).
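
To see how those labels attach to a single formula, the standard tabular $Q$-learning update (with step size $\alpha$ and discount factor $\gamma$) is

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right].$$

The bracketed term is the TD error, computed from a single sampled transition (temporal-difference, model-free); the $\max_{a'}$ bootstraps on the greedy action rather than on whatever the behaviour policy actually does next (off-policy); and what is being learned is a value function, from which the policy is then derived (value-based, control).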






Comments:

– Miguel Saraiva (Apr 30 at 14:59): Can you further explain your last sentence? Are you talking about variations like TD(λ) or policy gradient methods, or something else? Also, do you have any recommendation of where I can read further about RL, knowing that I have already explored the Barto and Sutton book?

– nbro (Apr 30 at 15:19): @MiguelSaraiva I have updated my answer. I would recommend that you read that book again and more carefully (and that you start implementing some of those algorithms to get a full understanding of those concepts). It is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Have a look at this question: ai.stackexchange.com/q/6997/2444, and in particular my answer: ai.stackexchange.com/a/7005/2444.

– Miguel Saraiva (Apr 30 at 15:54): I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.

Answer by Neil Slater (answered Apr 30 at 16:06)

In Reinforcement Learning: An Introduction, the authors suggest that the topic of reinforcement learning covers the analysis of, and solution methods for, problems that can be framed in this way:




          Reinforcement learning, like many topics whose names end with “ing,” such as machine
          learning and mountaineering, is simultaneously a problem, a class of solution methods
          that work well on the problem, and the field that studies this problem and its solution
          methods. It is convenient to use a single name for all three things, but at the same time
          essential to keep the three conceptually separate. In particular, the distinction between
          problems and solution methods is very important in reinforcement learning; failing to
          make this distinction is the source of many confusions.




          And:




          Markov decision processes are intended to include just
          these three aspects—sensation, action, and goal—in their simplest possible forms without
          trivializing any of them. Any method that is well suited to solving such problems we
          consider to be a reinforcement learning method.




So, to answer your questions: the simplest take on this is that yes, there is more (much more) to RL than the classic value-based optimal control methods of SARSA and Q-learning.



Including DP and other "RL-related" algorithms in the book allows the authors to show how closely related the concepts are. For example, there is little in practice that differentiates Dyna-Q (a planning algorithm closely related to Q-learning) from Q-learning with experience replay; a sketch of the two follows below. Calling one strictly "planning" and the other "reinforcement learning", and treating them as separate, can reduce insight into the topic. In many cases there are hybrid methods, or even a continuum, between what you may initially think of as "RL" and "not RL" approaches. Understanding this gives you a toolkit for modifying and inventing algorithms.
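
To make that comparison concrete, here is a hedged sketch of the two "extra update" loops (the `q_update` helper, the `model` dictionary and the `replay_buffer` list are hypothetical names, not taken from the book or this answer; `Q` is assumed to be a 2-D numpy array indexed by `[state, action]`). Both reuse the same one-step backup and differ only in whether the extra transitions come from a learned model or from stored memories:

```python
import random
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning backup; Q is a 2-D array indexed [state, action]."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def dyna_q_planning(Q, model, n_planning=10):
    """Dyna-Q style: extra backups generated from a learned model.
    `model` maps previously visited (s, a) pairs to an observed (r, s_next)."""
    for _ in range(n_planning):
        s, a = random.choice(list(model.keys()))
        r, s_next = model[(s, a)]                        # simulated transition from the model
        q_update(Q, s, a, r, s_next)

def replay_updates(Q, replay_buffer, n_replay=10):
    """Experience replay style: extra backups from stored real transitions."""
    for _ in range(n_replay):
        s, a, r, s_next = random.choice(replay_buffer)   # remembered transition
        q_update(Q, s, a, r, s_next)
```

In a deterministic tabular setting the two loops do essentially the same work, which is the point being made above.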



          Having said that, the book is not the sole arbiter of what is and isn't reinforcement learning. Ultimately this is just a classification issue, and it only matters if you are communicating with someone and there is a chance for misunderstanding. If you name which algorithm you are using, it doesn't really matter whether the person you are talking to thinks it is RL or not RL. It matters what the problem is and how you propose to solve it.





