Is random forest for regression a 'true' regression?


Random forests are used for regression. However, from what I understand, they assign an average target value at each leaf. Since each tree has only a limited number of leaves, the model's prediction can take only specific values. Is it therefore not just a 'discrete' regression (like a step function), unlike linear regression, which is 'continuous'?

Am I understanding this correctly? If yes, what advantage does random forest offer in regression?

edited May 15 at 4:16

asked May 14 at 12:07

user110565

915

1

Related: Decision Trees and Regression - Can predicted values be outside range of training data?
– Stephan Kolassa
May 14 at 12:23

regression random-forest cart


2 Answers

This is correct: random forests discretize continuous variables since they are based on decision trees, which work by recursive binary partitioning. But with sufficient data and sufficient splits, a step function with many small steps can approximate a smooth function, so this need not be a problem. If you really want to capture a smooth response to a single predictor, you can calculate the partial effect of that variable and fit a smooth function to it (this does not affect the model itself, which retains its stepwise character).
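To make the step-function point concrete, here is a minimal sketch in plain Python. It is not a random forest: it only builds a piecewise-constant approximation of `sin` with equal-width steps and measures the worst-case error as the number of steps grows. The helper names (`step_approximation`, `max_error`) are made up for this illustration, and using the midpoint value for each step is a simplification (a tree leaf would store the mean of the training targets that fall in it).

```python
import math

def step_approximation(f, lo, hi, n_steps):
    """Piecewise-constant stand-in for f on [lo, hi] with n_steps equal-width steps.

    Each step takes the value of f at its midpoint (an assumption made for
    simplicity; a regression tree would use the mean target in the leaf).
    """
    width = (hi - lo) / n_steps
    levels = [f(lo + (i + 0.5) * width) for i in range(n_steps)]

    def g(x):
        # Clamp the index so that x == hi falls into the last step.
        i = min(int((x - lo) / width), n_steps - 1)
        return levels[i]

    return g

def max_error(f, g, lo, hi, n_grid=10_000):
    """Largest absolute gap between f and g on an evenly spaced grid."""
    grid = (lo + (hi - lo) * k / n_grid for k in range(n_grid + 1))
    return max(abs(f(x) - g(x)) for x in grid)

lo, hi = 0.0, 2 * math.pi
errors = {
    n: max_error(math.sin, step_approximation(math.sin, lo, hi, n), lo, hi)
    for n in (4, 16, 64, 256)
}
for n, err in errors.items():
    print(f"{n:4d} steps: max error {err:.4f}")
```

The worst-case error shrinks roughly in proportion to the step width, which is the sense in which many small steps approximate a smooth curve.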

Random forests offer quite a few advantages over standard regression techniques for some applications. To mention just three:

- They allow the use of arbitrarily many predictors (more predictors than data points is possible).

- They can approximate complex nonlinear shapes without a priori specification.

- They can capture complex interactions between predictors without a priori specification.
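As a rough illustration of the interaction point, the sketch below fits a single, tiny regression tree (not a forest, and not any particular library's implementation) to a target that is nonzero only when both features are large. The functions `fit_tree` and `predict` and the toy data are assumptions made for this example. The tree splits greedily on squared error and is never told that an interaction exists.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(rows, targets, depth):
    """Greedy regression tree: pick the split that minimises total SSE."""
    if depth == 0 or len(set(targets)) == 1:
        return {"value": mean(targets)}
    best = None
    for j in range(len(rows[0])):
        levels = sorted(set(r[j] for r in rows))
        for a, b in zip(levels, levels[1:]):
            t = (a + b) / 2  # candidate threshold between adjacent values
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:  # no feature has two distinct values
        return {"value": mean(targets)}
    _, j, t = best
    left = [(r, y) for r, y in zip(rows, targets) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([r for r, _ in left], [y for _, y in left], depth - 1),
        "right": fit_tree([r for r, _ in right], [y for _, y in right], depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's stored mean."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]

# Toy data: y is 10 only when BOTH features are at least 0.5 (a pure interaction).
rows = [(a / 10, b / 10) for a in range(10) for b in range(10)]
targets = [10.0 if a >= 5 and b >= 5 else 0.0 for a in range(10) for b in range(10)]

tree = fit_tree(rows, targets, depth=2)
predictions = [predict(tree, r) for r in rows]
print(max(abs(p - y) for p, y in zip(predictions, targets)))
```

Two nested splits are enough to reproduce this target on the training grid, whereas an additive linear model would need an explicit interaction term to do so.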

As for whether it is a 'true' regression, this is somewhat semantic. After all, piecewise regression is regression too, but is also not smooth.

edited May 14 at 12:28

answered May 14 at 12:23

mkt

3,89352066

7

Regression with only categorical features also wouldn't be smooth.
– Tim♦
May 14 at 12:41

3

Could a regression with even one categorical feature be smooth?
– Dave
May 14 at 19:59


It is discrete, but then any output in the form of a floating-point number with a fixed number of bits is discrete. If a tree has 100 leaves, it can give 100 different numbers. If you have 100 different trees with 100 leaves each, your random forest can theoretically produce 100^100 = 10^200 different values, which corresponds to 200 decimal digits of precision, or roughly 664 bits. Of course, there is going to be some overlap, so you will not actually see 100^100 different values.

The distribution tends to get more discrete toward the extremes. Each tree has some minimum leaf (a leaf whose output is less than or equal to that of every other leaf), and once you get the minimum leaf from every tree, you can't go any lower. So the forest has some minimum overall value, and near that value all but a few trees are at their minimum leaf, so small deviations from the minimum occur in discrete jumps. But decreased reliability at the extremes is a property of regressions in general, not just random forests.
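The counting argument can be checked on a toy scale. The sketch below assumes a hypothetical forest described only by the leaf values of each tree, enumerates every combination of one leaf per tree, and collects the distinct averages. The leaf values are invented so that one forest has no overlapping sums and the other has heavy overlap. Exact fractions are used to keep floating-point rounding from merging or splitting values.

```python
from fractions import Fraction
from itertools import product
import math

def forest_outputs(leaf_values_per_tree):
    """All values a forest could emit: the mean of one leaf value per tree."""
    n_trees = len(leaf_values_per_tree)
    return {
        Fraction(sum(combo), n_trees)
        for combo in product(*leaf_values_per_tree)
    }

# Three trees with four leaves each; leaf values are made up for illustration.
no_overlap = [[0, 1, 2, 3], [0, 4, 8, 12], [0, 16, 32, 48]]  # sums are all distinct
full_overlap = [[0, 1, 2, 3]] * 3                            # many sums coincide

distinct_no_overlap = forest_outputs(no_overlap)
distinct_full_overlap = forest_outputs(full_overlap)
print(len(distinct_no_overlap), len(distinct_full_overlap))

# The smallest forest output is the mean of the per-tree minimum leaves.
forest_min = min(distinct_no_overlap)
mean_of_tree_minima = Fraction(
    sum(min(leaves) for leaves in no_overlap), len(no_overlap)
)

# Upper bound from the text: 100 trees with 100 leaves each.
bits_upper_bound = math.log2(100 ** 100)
print(f"log2(100^100) is about {bits_upper_bound:.1f} bits")
```

With distinct sums the forest reaches the full 4^3 = 64 values, while identical trees collapse to 10, which is the overlap effect described above. The upper bound of 100^100 combinations is only that: an upper bound on how many different averages could occur.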

answered May 14 at 16:37

Acccumulation

1,76327

The leaves can store any value from the training data (so with the right training data, 100 trees of 100 leaves can store up to 10,000 distinct values). But the returned value is the mean of the chosen leaf from each tree. So the number of bits of precision of that value is the same whether you have 2 trees or 100 trees.
– Darren Cook
May 17 at 7:01
