What algorithms are considered reinforcement learning algorithms?
What areas belong to reinforcement learning?
TD(0), Q-learning and SARSA are all temporal-difference algorithms, which belong to the area of reinforcement learning, but is there more to it?
Are the dynamic programming algorithms, policy iteration and value iteration, considered part of reinforcement learning? Or are they just the basis for the temporal-difference algorithms, which would then be the only RL algorithms?
Tags: reinforcement-learning, terminology, definitions
asked Apr 30 at 14:26 by Miguel Saraiva, edited Apr 30 at 14:58 by nbro
2 Answers
The dynamic programming algorithms (like policy iteration and value iteration) are often presented in the context of reinforcement learning (in particular, in the book Reinforcement Learning: An Introduction by Barto and Sutton) because they are closely related to reinforcement learning algorithms, like $Q$-learning. They are all based on the assumption that the environment can be modelled as an MDP.
However, dynamic programming algorithms require that the underlying MDP (that is, the associated transition and reward functions) is known. Hence, they are often referred to as "planning" algorithms, because they can be used to find a policy (which can be thought of as a "plan") given the "dynamics" of the environment (which are represented by the MDP). They simply exploit the given "physical rules" of the environment in order to find a policy, which is why algorithms of this kind are called "planning" algorithms.
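To make the distinction concrete, here is a minimal value-iteration sketch for a known tabular MDP (illustrative code, not from the original answer; the arrays P and R holding the transition probabilities and expected rewards are assumed to be given, which is exactly the planning setting described above):

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-6):
        """Value iteration for a known finite MDP.
        P[s, a, s'] holds transition probabilities, R[s, a] expected rewards.
        Because P and R are given, no interaction with an environment is needed."""
        n_states, n_actions, _ = P.shape
        V = np.zeros(n_states)
        while True:
            Q = R + gamma * P @ V               # Bellman optimality backup, shape (S, A)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        return V, Q.argmax(axis=1)              # greedy policy w.r.t. the converged values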
On the other hand, $Q$-learning and similar algorithms do not require that the MDP is known. They attempt to find a policy (or value function) by interacting with the environment. They eventually infer the "dynamics" of the underlying MDP from experience (that is, the interaction with the environment).
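Concretely, a tabular $Q$-learning sketch only needs the ability to step through the environment (a Gym-style reset/step interface, and the hyper-parameters below, are assumptions made for illustration):

    import numpy as np

    def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning: the agent never sees the transition or reward
        functions, only the (s, a, r, s') tuples produced by interaction."""
        Q = np.zeros((env.observation_space.n, env.action_space.n))
        for _ in range(n_episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy behaviour policy
                a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
                s_next, r, done, _ = env.step(a)
                # off-policy TD update: bootstrap on the greedy action in s_next
                Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
                s = s_next
        return Q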
If the MDP is not given, the problem is often referred to as the (full) "reinforcement learning problem". So, algorithms like $Q$-learning or SARSA are usually considered reinforcement learning algorithms. The dynamic programming algorithms (like policy iteration) do not solve the "full RL problem", hence they are not always considered RL algorithms, but just planning algorithms.
There are several categories of RL algorithms: temporal-difference, Monte Carlo, actor-critic, model-free, model-based, on-policy, off-policy, prediction, control, and policy-based or value-based algorithms. These categories can overlap. For example, $Q$-learning is a temporal-difference (TD), model-free, off-policy, control and value-based algorithm: it is based on a temporal-difference update rule (TD), it doesn't use a model of the environment (model-free), it uses a behaviour policy that is different from the policy it learns (off-policy), it is used to find a policy (control), and it approximates a value function rather than the policy directly (value-based).
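For instance, the off-policy/on-policy distinction can be read directly off the one-step update rules (standard textbook notation):
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \big] \quad \text{(Q-learning, off-policy)}$$
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \big] \quad \text{(SARSA, on-policy)}$$
Q-learning bootstraps on the greedy action in $s_{t+1}$ regardless of which behaviour policy generated the data, whereas SARSA bootstraps on the action $a_{t+1}$ that the behaviour policy actually took.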
answered Apr 30 at 14:54 by nbro, edited Apr 30 at 15:25
Comments:
– Miguel Saraiva (Apr 30 at 14:59): Can you further explain your last sentence? Are you talking about variations like TD($\lambda$) or policy gradient methods, or something else? Also, do you have any recommendation of where I can read further about RL, knowing that I have already explored the Barto and Sutton book?
– nbro (Apr 30 at 15:19): @MiguelSaraiva I have updated my answer. I would recommend that you read the book again, more carefully, and that you start implementing some of those algorithms to get a full understanding of the concepts. It is a decent book; however, RL comprises a lot of concepts and dense terminology, which often confuses beginners. Have a look at this question: ai.stackexchange.com/q/6997/2444, and in particular my answer: ai.stackexchange.com/a/7005/2444.
– Miguel Saraiva (Apr 30 at 15:54): I have seen some of the videos in that series by Silver already and they are indeed good. Thank you for your help.
In Reinforcement Learning: An Introduction the authors suggest that the topic of reinforcement learning covers analysis and solutions to problems that can be framed in this way:
Reinforcement learning, like many topics whose names end with "ing," such as machine learning and mountaineering, is simultaneously a problem, a class of solution methods that work well on the problem, and the field that studies this problem and its solution methods. It is convenient to use a single name for all three things, but at the same time essential to keep the three conceptually separate. In particular, the distinction between problems and solution methods is very important in reinforcement learning; failing to make this distinction is the source of many confusions.
And:
Markov decision processes are intended to include just these three aspects—sensation, action, and goal—in their simplest possible forms without trivializing any of them. Any method that is well suited to solving such problems we consider to be a reinforcement learning method.
So, to answer your questions: the simplest take on this is that yes, there is more (much more) to RL than the classic value-based optimal control methods of SARSA and Q-learning.
Including DP and other "RL-related" algorithms in the book allows the authors to show how closely related the concepts are. For example, there is little in practice that differentiates Dyna-Q (a planning algorithm closely related to Q-learning) from experience replay. Calling one strictly "planning" and the other "reinforcement learning" and treating them as separate can reduce insight into the topic. In many cases there are hybrid methods, or even a continuum, between what you may initially think of as RL and "not RL" approaches. Understanding this gives you a toolkit to modify and invent algorithms.
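To see how thin that line is, compare a Dyna-Q planning step with an experience-replay step (an illustrative sketch, assuming the same tabular Q array as in the first answer; the model and buffer structures, and both function names, are hypothetical):

    import random

    def dyna_q_planning_step(Q, model, alpha=0.1, gamma=0.99, n_planning=10):
        # model maps (s, a) -> (r, s_next), learned from observed transitions
        for _ in range(n_planning):
            s, a = random.choice(list(model.keys()))
            r, s_next = model[(s, a)]
            Q[s, a] += alpha * (r + gamma * max(Q[s_next]) - Q[s, a])

    def experience_replay_step(Q, buffer, alpha=0.1, gamma=0.99, n_updates=10):
        # buffer stores raw (s, a, r, s_next) transitions exactly as observed
        for s, a, r, s_next in random.sample(buffer, n_updates):
            Q[s, a] += alpha * (r + gamma * max(Q[s_next]) - Q[s, a])

The update inside both loops is identical; the only difference is whether the re-used transition comes from a learned model or from stored experience.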
Having said that, the book is not the sole arbiter of what is and isn't reinforcement learning. Ultimately this is just a classification issue, and it only matters if you are communicating with someone and there is a chance for misunderstanding. If you name which algorithm you are using, it doesn't really matter whether the person you are talking to thinks it is RL or not RL. It matters what the problem is and how you propose to solve it.
answered Apr 30 at 16:06 by Neil Slater