Why is random forest an improvement of decision tree?

Let's assume that we have a binary classification problem, and we built a decision tree on our data set.

Assuming that we have 5 features, in the first step the decision tree will choose the best of the 5 features and, on that feature, the best threshold for splitting the data set. It then keeps growing the tree deeper in the same way. Here, "best" means the split with the lowest classification error.
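The greedy step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how any particular library implements it: numeric features, 0/1 labels, and "best" meaning the fewest misclassified samples, as the question defines it (scikit-learn's `DecisionTreeClassifier`, for instance, defaults to Gini impurity instead). The helper names `misclassified` and `best_split` and the toy data are hypothetical.

```python
# Hypothetical sketch of one greedy split: try every feature and every
# candidate threshold, keep the one with the fewest misclassified samples.

def misclassified(labels):
    """Errors made by predicting the majority class of `labels` (0/1)."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)


def best_split(X, y):
    """Return (feature_index, threshold, errors) for the best single split.

    X is a list of rows of floats and y a list of 0/1 labels. Candidate
    thresholds are midpoints between consecutive distinct feature values.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            errors = misclassified(left) + misclassified(right)
            if best is None or errors < best[2]:
                best = (j, t, errors)
    return best


# Toy data with 5 features; only feature 2 separates the classes cleanly.
X = [
    [0.1, 5.0, 1.0, 3.3, 0.0],
    [0.4, 4.0, 2.0, 1.1, 1.0],
    [0.3, 6.0, 3.0, 2.2, 0.0],
    [0.9, 5.5, 7.0, 3.1, 1.0],
    [0.2, 4.5, 8.0, 1.9, 0.0],
    [0.8, 6.5, 9.0, 2.8, 1.0],
]
y = [0, 0, 0, 1, 1, 1]
print(best_split(X, y))  # (2, 5.0, 0)
```

Growing the full tree repeats this search inside each child node, using only the rows that reached it. Nothing in this step looks beyond the current split, which is why the procedure is called greedy.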

My question is: since the decision tree chooses the best feature and the best threshold to split on at each step, why is a random forest (which is many decision trees) an improvement over a decision tree? Shouldn't a single decision tree be sufficient?

UPDATE

I mean it more like this: if you have a decision tree classifier and a random forest classifier with the same parameters where possible (max_depth, number of children, etc.), will the decision tree classifier score the same on the training set as the random forest classifier?

edited May 1 at 13:56 by Juan Esteban de la Calle

asked May 1 at 10:54 by quant

random-forest decision-trees


2 Answers

It comes down to overfitting as you scale. Decision trees tend to overfit as they grow deep. After every split, fewer samples remain for the next split to work with, and fewer samples means a higher risk of splitting on noise.
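To put rough numbers on the shrinking sample size, assume (unrealistically) that every split divides a node's samples exactly in half. The function name `samples_per_node` is illustrative.

```python
# Rough arithmetic, assuming every split divides a node's samples evenly
# in half; real trees rarely split this evenly.

def samples_per_node(n_samples, depth):
    """Approximate number of training samples left in a node at `depth`."""
    return n_samples / 2 ** depth


for depth in range(0, 11, 2):
    print(depth, samples_per_node(1000, depth))
```

Starting from 1,000 samples, a node at depth 10 would hold fewer than one sample on average, so the deepest splits are decided by a handful of points.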

A random forest avoids this overfitting problem by scaling through more trees rather than one bigger tree. Because the forest averages the outputs of its trees (or takes a majority vote, for classification), it matters much less that the individual trees overfit.
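Why combining trees helps can be illustrated with an idealized calculation, assuming each tree is an independent classifier that is correct with probability p on a given input. Real trees in a forest are trained on overlapping data, so their errors are correlated and the gain is typically smaller. `majority_vote_accuracy` is a hypothetical helper.

```python
from math import comb


def majority_vote_accuracy(n_trees, p):
    """Probability that a majority of n_trees independent voters is right.

    Each voter is assumed correct with probability p, independently of the
    others; n_trees should be odd so there are no ties.
    """
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With p = 0.6, one voter is right 60% of the time, while a majority of five independent voters is right about 68% of the time, and the figure keeps rising as voters are added.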

Regarding your update: no, they will not score the same. A random forest does not contain just one decision tree; it contains several. Each tree is trained on a random bootstrap sample of the rows and considers only a random subset of the features at each split. So even if each tree in the random forest were the same size as the single decision tree, the data and features it sees would not be the same.

But if you take a random forest, use only one tree, and train it on the same rows and features as a single decision tree of the same size (no bootstrap sampling, all features considered at every split), then yes, they would be essentially the same model.
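The two sources of randomness can be sketched with Python's standard `random` module. This assumes the usual Breiman-style scheme: one bootstrap sample of rows per tree, and a fresh random subset of features at each split. In scikit-learn these behaviours are controlled by the `bootstrap` and `max_features` parameters of `RandomForestClassifier`. The helper names below are hypothetical.

```python
import random


def bootstrap_rows(n_rows, rng):
    """Row indices drawn with replacement: one bootstrap sample per tree."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def split_candidates(n_features, max_features, rng):
    """Feature indices one split may consider, drawn without replacement."""
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(0)
n_rows, n_features = 10, 5
max_features = 2  # roughly sqrt(5); an assumed value for illustration
for tree in range(3):
    rows = bootstrap_rows(n_rows, rng)
    print(f"tree {tree}: rows {sorted(rows)}")
    for split in range(2):
        feats = split_candidates(n_features, max_features, rng)
        print(f"  split {split}: features {feats}")
```

Using every row once instead of a bootstrap sample, and setting `max_features` to the total number of features, removes both sources of randomness, which corresponds to the one-tree case described above.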

edited May 1 at 12:12

answered May 1 at 11:20 by Simon Larsson

So on the training set, would a single decision tree and a random forest model have the same accuracy?
– quant, May 1 at 11:21

Not necessarily. A decision tree will often perform better than a random forest on the training set but worse on the test set. Think of it like this: in a decision tree you can just keep splitting until everything in the training set is correctly classified.
– Simon Larsson, May 1 at 11:25
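The point that a tree can keep splitting until every training sample is classified correctly can be checked on labels that are pure noise. The sketch below grows a hypothetical one-feature tree, splitting on the fewest misclassifications, until every leaf is pure. Since the labels ignore x, any training accuracy above chance is memorization. All names and the data are illustrative.

```python
import random


def misclassified(labels):
    """Errors made by predicting the majority class of `labels` (0/1)."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)


def grow(points):
    """Grow a one-feature tree until every leaf is pure.

    points is a list of (x, label) pairs. A leaf is a 0/1 label; an internal
    node is a (threshold, left_subtree, right_subtree) tuple.
    """
    labels = [lab for _, lab in points]
    values = sorted({x for x, _ in points})
    if misclassified(labels) == 0 or len(values) == 1:
        return 1 if 2 * sum(labels) > len(labels) else 0  # majority label
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        err = misclassified([lab for _, lab in left]) + misclassified(
            [lab for _, lab in right]
        )
        if best is None or err < best[0]:
            best = (err, t, left, right)
    _, t, left, right = best
    # Split even if it does not reduce errors, so impure nodes keep growing.
    return (t, grow(left), grow(right))


def predict(tree, x):
    """Follow thresholds down to a leaf label."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


rng = random.Random(42)
xs = [rng.random() for _ in range(100)]
ys = [rng.randint(0, 1) for _ in range(100)]  # pure noise: labels ignore x
tree = grow(list(zip(xs, ys)))
train_acc = sum(predict(tree, x) == y for x, y in zip(xs, ys)) / len(xs)
print(train_acc)  # 1.0
```

On fresh noise labels the same tree can do no better than chance, which is the training/test gap described in the comment.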

I mean it more like this: if you have a decision tree classifier and a random forest classifier with the same parameters (max_depth, number of children, etc.), will the decision tree classifier score the same on the training set as the random forest classifier?
– quant, May 1 at 11:29

No, a random forest has a bit more to it. It builds several trees, and each would be the same size as your decision tree if you set those parameters to the same values. But each tree in the random forest is trained on a bootstrap sample of the rows and considers only a random subset of the features at each split, so they are not really comparable in that way.
– Simon Larsson, May 1 at 11:33


This is an interesting question, as there are multiple reasons why random forests work better than a single decision tree. I'll compare how each classifier/regressor behaves in the cases below.

So, we have a dataset with 5 features, as you said. Suppose our decision tree classifier overfits that data. Because the model is overfitted, any small change in the data can cause a large change in its classifications (the variance problem). A random forest, however, uses multiple decision trees and takes a majority vote of all of them to reach a decision, so a small change in the data will not cause dramatic changes in the classification. This reduces the overfitting (variance) problem.
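The variance-reduction argument can be made concrete with a standard identity, assuming the trees' predictions are identically distributed with variance σ² and pairwise correlation ρ: the variance of their average over B trees is ρσ² + (1 − ρ)σ²/B. It applies to averaging (for example, of predicted probabilities) rather than to hard majority votes, and the function name is illustrative.

```python
def variance_of_average(sigma2, rho, n_trees):
    """Variance of the mean of n_trees identically distributed predictions,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees


for rho in (0.0, 0.3, 0.9):
    print(rho, [round(variance_of_average(1.0, rho, b), 3) for b in (1, 10, 100)])
```

Adding trees only shrinks the second term; the first, ρσ², stays. That is why random forests also try to decorrelate their trees through row and feature sampling.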

Notice also that a random forest does not feed the entire dataset to each tree. Each tree is trained on a sample of rows drawn with replacement (a bootstrap sample), and at each split only a random subset of the columns, drawn without replacement, is considered. This typically lets the forest generalize much better than a single decision tree.
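A side effect of row sampling with replacement is that each tree sees only about 63.2% (1 − 1/e) of the distinct rows; the remaining rows are "out-of-bag" for that tree. A quick check, with hypothetical helper names:

```python
import math
import random


def expected_unique_fraction(n):
    """Expected fraction of distinct rows in a bootstrap sample of size n."""
    return 1 - (1 - 1 / n) ** n


def observed_unique_fraction(n, rng):
    """Fraction of distinct rows in one simulated bootstrap sample."""
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


rng = random.Random(0)
print(round(expected_unique_fraction(1000), 4), round(1 - math.exp(-1), 4))
print(observed_unique_fraction(1000, rng))
```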

Random forests are typically built from deep decision trees, each of which has low bias but high variance on its own; averaging many of them reduces the variance of the ensemble. With a single decision tree, by contrast, you usually treat depth as a hyperparameter to tune rather than deciding up front whether the tree should be shallow or deep.

Hope this helps!

answered May 1 at 11:59 by karthikeyan mg
