
What algorithms are considered reinforcement learning algorithms?


What are the areas that belong to reinforcement learning?

TD(0), Q-learning and SARSA are all temporal-difference algorithms, which belong to the reinforcement learning area, but is there more to it?

Are dynamic programming methods, such as policy iteration and value iteration, considered part of reinforcement learning? Or are they just the basis for the temporal-difference algorithms, which would then be the only RL algorithms?










Tags: reinforcement-learning, terminology, definitions






asked Apr 30 at 14:26 by Miguel Saraiva, edited Apr 30 at 14:58 by nbro
2 Answers



















The dynamic programming algorithms (like policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Sutton and Barto) because they are closely related to reinforcement learning algorithms, like $Q$-learning. They are all based on the assumption that the environment can be modelled as an MDP.
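Concretely, in one common formulation, "the environment can be modelled as an MDP" means it is described by the tuple

$$\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma),$$

where $\mathcal{S}$ is the set of states, $\mathcal{A}$ the set of actions, $P(s' \mid s, a)$ the transition function, $R(s, a)$ the reward function and $\gamma \in [0, 1]$ the discount factor (notation varies slightly across textbooks).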



However, dynamic programming algorithms require that the underlying MDP (that is, the associated transition and reward functions) is known. Hence, they are often referred to as "planning" algorithms, because they can be used to find a policy (which can be thought of as a "plan") given the "dynamics" of the environment (which are represented by the MDP). They just exploit the given "physical rules" of the environment in order to find a policy, and this exploitation of a known model is what is meant by "planning".
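To make the "known MDP" requirement concrete, here is a minimal value-iteration sketch (an illustration assuming a finite MDP, not code from any particular source; `P` and `R` are hypothetical lookup tables for the transition and reward functions). Note that it never interacts with an environment; it only sweeps over the given model:

    # Value iteration: requires the full model.
    # P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the expected reward.
    def value_iteration(states, actions, P, R, gamma=0.99, tol=1e-6):
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                backup = max(
                    R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in actions
                )
                delta = max(delta, abs(backup - V[s]))
                V[s] = backup
            if delta < tol:  # stop once a full sweep barely changes V
                return V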



On the other hand, $Q$-learning and similar algorithms do not require that the MDP is known. They attempt to find a policy (or value function) by interacting with the environment. They eventually infer the "dynamics" of the underlying MDP from experience (that is, the interaction with the environment).
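By contrast, a minimal tabular $Q$-learning sketch (assuming a hypothetical `env` object with Gym-style `reset()` and `step(action)` methods; these names are illustrative, not from the answer above) only ever sees sampled transitions, never the transition or reward functions themselves:

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
        Q = defaultdict(float)                       # Q[(state, action)], defaults to 0
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # epsilon-greedy behaviour policy
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda b: Q[(s, b)])
                s2, r, done = env.step(a)            # sample one transition from the environment
                target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
        return Q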



If the MDP is not given, the problem is often referred to as the (full) "reinforcement learning problem". So, algorithms like $Q$-learning or SARSA are often considered reinforcement learning algorithms. The dynamic programming algorithms (like policy iteration) do not solve the "full RL problem", hence they are not always considered RL algorithms, but just planning algorithms.



There are several categories of RL algorithms: temporal-difference, Monte Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, policy-based or value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on a temporal-difference update rule (TD), it doesn't use a model of the environment (model-free), it uses a behavioural policy that is different from the policy it learns (off-policy), it is used to find a policy (control), and it attempts to approximate a value function rather than the policy directly (value-based).
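To see where the "off-policy" label comes from, it may help to compare the standard tabular update rules (as given in Sutton and Barto). $Q$-learning bootstraps from the greedy action, regardless of which action the behaviour policy actually takes next, whereas SARSA bootstraps from the action $a_{t+1}$ that the behaviour policy actually took:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big] \quad \text{(Q-learning, off-policy)}$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \big] \quad \text{(SARSA, on-policy)}$$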






answered Apr 30 at 14:54, edited Apr 30 at 15:25, by nbro












• Can you further explain your last sentence? Are you talking about variations like TD($\lambda$) or policy gradient methods, or something else? Also, do you have any recommendation of where I can read further about RL, knowing I have already explored the Sutton and Barto book? – Miguel Saraiva, Apr 30 at 14:59

• @MiguelSaraiva I have updated my answer. I would recommend that you read that book again and more carefully (and that you start implementing some of those algorithms to get a full understanding of those concepts). This is a decent book. However, RL comprises a lot of concepts and dense terminology, which often confuse beginners. Have a look also at this question: ai.stackexchange.com/q/6997/2444, in particular, my answer: ai.stackexchange.com/a/7005/2444. – nbro, Apr 30 at 15:19

• I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help. – Miguel Saraiva, Apr 30 at 15:54



















In Reinforcement Learning: An Introduction the authors suggest that the topic of reinforcement learning covers analysis and solutions to problems that can be framed in this way:

    Reinforcement learning, like many topics whose names end with “ing,” such as machine learning and mountaineering, is simultaneously a problem, a class of solution methods that work well on the problem, and the field that studies this problem and its solution methods. It is convenient to use a single name for all three things, but at the same time essential to keep the three conceptually separate. In particular, the distinction between problems and solution methods is very important in reinforcement learning; failing to make this distinction is the source of many confusions.

And:

    Markov decision processes are intended to include just these three aspects—sensation, action, and goal—in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method.




So, to answer your questions, the simplest take on this is: yes, there is more (much more) to RL than the classic value-based optimal control methods of SARSA and Q-learning.



Including DP and other "RL-related" algorithms in the book allows the authors to show how closely related the concepts are. For example, there is little in practice that differentiates Dyna-Q (a planning algorithm closely related to Q-learning) from experience replay. Calling one strictly "planning" and the other "reinforcement learning" and treating them as separate can reduce insight into the topic. In many cases there are hybrid methods, or even a continuum, between what you may initially think of as RL and "not RL" approaches. Understanding this gives you a toolkit to modify and invent algorithms.
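To make that comparison concrete, here is a minimal tabular sketch (my own illustration, not code from the book; it assumes a deterministic environment for the Dyna-Q model, and all names are hypothetical). The extra updates performed by Dyna-Q planning and by experience replay are the same Q-learning backup; the only difference is whether the replayed transition comes from a learned model or from a buffer of stored experience:

    import random
    from collections import defaultdict

    alpha, gamma = 0.1, 0.99
    Q = defaultdict(float)        # Q[(state, action)]
    model = {}                    # Dyna-Q model: (s, a) -> (r, s'), deterministic environment assumed
    replay_buffer = []            # experience replay: stored real transitions (s, a, r, s')

    def q_update(s, a, r, s_next, actions):
        """One Q-learning backup, reused for real, modelled and replayed transitions."""
        target = r + gamma * max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    def after_real_step(s, a, r, s_next, actions, n_extra=10):
        q_update(s, a, r, s_next, actions)        # learn from the real transition
        model[(s, a)] = (r, s_next)               # remember it in the model
        replay_buffer.append((s, a, r, s_next))   # and in the replay buffer

        # Dyna-Q planning: replay transitions predicted by the learned model
        for _ in range(n_extra):
            (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
            q_update(ps, pa, pr, ps_next, actions)

        # Experience replay: replay stored real transitions (a nearly identical loop)
        for _ in range(n_extra):
            rs, ra, rr, rs_next = random.choice(replay_buffer)
            q_update(rs, ra, rr, rs_next, actions)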



Having said that, the book is not the sole arbiter of what is and isn't reinforcement learning. Ultimately this is just a classification issue, and it only matters if you are communicating with someone and there is a chance for misunderstanding. If you name which algorithm you are using, it doesn't really matter whether the person you are talking to thinks it is RL or not RL. It matters what the problem is and how you propose to solve it.






answered Apr 30 at 16:06 by Neil Slater












