Random Forest different results for same observation


Hi, I am fairly new to random forest estimation, but I could not find a question similar to mine. I was surprised that the predictions differ for the same predictors; I would have expected them to be identical. I understand that the model will be different with each estimation, but why do I get different predictions for the same predictors?



library(randomForest)
set.seed(100)
df <- mtcars
rt.est <- randomForest(mpg ~ .,
                       data = df,
                       ntree = 1000)
predict(rt.est)
df.double <- rbind(df, df[32, ])
rt.est <- randomForest(mpg ~ .,
                       data = df.double,
                       ntree = 1000)
predict(rt.est)


The results for the last observation, the Volvo 142E, are similar but not the same. Why?










  • Once you have an answer to the question as stated, you shouldn't really change the substance of the question in a way that breaks the connection with the existing answers. If you want to ask another question as a fork of this, just start a new thread. You can link back for context, if you want. – gung, Apr 26 at 19:42










  • I did not alter the question substantially. It was always about getting a different prediction for the same predictors. – Max M, Apr 26 at 20:05

















r random-forest






edited Apr 26 at 19:40 by gung
asked Apr 26 at 13:59 by Max M











1 Answer



















If you supply the argument newdata the discrepancy disappears: predict(rt.est, newdata=df) gives



Volvo 142E 
22.15557

Volvo 142E1
22.15557


When you do not supply newdata, it reports the out-of-bag results (the documentation for predict.randomForest states that if newdata is not given, the out-of-bag prediction stored in the object is returned). Samples that are "in-bag" were included in a tree during training as a result of the bootstrap re-sampling procedure; out-of-bag samples were omitted.



We can verify that this is out-of-bag data by inspecting rt.est$predicted, which stores the out-of-bag predictions. The results match predict(rt.est).



Volvo 142E 
22.83609

Volvo 142E1
22.85975


One way to think of predict.randomForest is that it's a shortcut to the out-of-bag predictions unless you supply newdata.
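This distinction is quick to check in your own session. A minimal sketch, reusing the question's data (exact predicted values will depend on the random state):

```r
library(randomForest)

set.seed(100)
fit <- randomForest(mpg ~ ., data = mtcars, ntree = 1000)

# No newdata: out-of-bag predictions, where each row is predicted
# only by the trees that did not see it during training.
oob <- predict(fit)

# The OOB predictions are exactly what the fitted object stores.
all.equal(oob, fit$predicted)

# With newdata: every tree votes on every row, so rows with
# identical predictors receive identical predictions.
resub <- predict(fit, newdata = mtcars)
```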




OP originally asked about 2 different ensembles of random forests. This portion of the answer addresses why 2 random forest ensembles might make different predictions on the same data.



The trees are different.



First, randomForest is a random procedure: the samples chosen for each tree are different (bootstrap resampling), and the features considered at each split are chosen at random (randomized feature subspaces). Without fixing the random seed, we would expect two randomForest runs to produce different results with high probability, for the same reason that flipping a fair coin 1000 times will plausibly produce a different sequence of heads and tails each time. (You have fixed the seed, but the two calls to randomForest will still differ, because the random state advances after the first forest is fit.)
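The effect of the advancing random state is easy to demonstrate. A sketch: re-seeding immediately before each fit is what makes the forests match.

```r
library(randomForest)

set.seed(100)
f1 <- randomForest(mpg ~ ., data = mtcars, ntree = 1000)
f2 <- randomForest(mpg ~ ., data = mtcars, ntree = 1000)
# The second fit starts from a different random state, so the
# two forests almost surely disagree somewhere.
identical(predict(f1, newdata = mtcars), predict(f2, newdata = mtcars))

set.seed(100)
g1 <- randomForest(mpg ~ ., data = mtcars, ntree = 1000)
set.seed(100)
g2 <- randomForest(mpg ~ ., data = mtcars, ntree = 1000)
# Re-seeding before each fit reproduces the same forest: TRUE.
identical(predict(g1, newdata = mtcars), predict(g2, newdata = mtcars))
```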



Second, the data used to train the models is different: you've added an extra row. Different data makes for a different model, which makes for a different prediction. When randomForest draws its bootstrap samples, the probability that df[32,] is in-bag for any given tree is larger than for each non-duplicated sample. This change to the data also changes the trees, because choices about where to split are influenced by the increased weight of this sample.



Different trees make different predictions.



Having the same feature values is only half the battle. The other half is how the trees are constructed.



As an example, suppose I have 3 trees constructed with the random forest procedure, each with 1 split.



  1. This tree has a bootstrap resample and randomly samples cyl, disp and hp. It picks splitting on cyl at 5 as the best split.

  2. This tree has a bootstrap resample and randomly samples cyl, disp and hp. It picks splitting on cyl at 7 as the best split.

  3. This tree has a bootstrap resample and randomly samples wt, disp and hp. It picks splitting on hp at 123 as the best split.

Clearly the predictions will differ whenever the splits change the decision for a sample. A sample with cyl = 6 might go "right" in tree 1 but "left" in tree 2. And hp doesn't have a one-to-one relationship with cyl, so a split on hp won't generally match splits on cyl.
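You can inspect the splits directly to confirm that the trees disagree. A sketch using randomForest::getTree; the particular variables and cutpoints you see will vary with the random state:

```r
library(randomForest)

set.seed(100)
fit <- randomForest(mpg ~ ., data = mtcars, ntree = 1000)

# Root split of the first three trees: the split variable and
# split point typically differ from tree to tree.
for (k in 1:3) {
  print(getTree(fit, k = k, labelVar = TRUE)[1, c("split var", "split point")])
}
```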







  • That I understand. I added one observation, ran the model again, and would have expected the same result for observation #32 because it has the same predictors. – Max M, Apr 26 at 14:19











  • If you have two completely different decision trees, are they guaranteed to make the same predictions? – Sycorax, Apr 26 at 14:41











  • I edited my code to make it clearer what confuses me. I do understand that each run of the randomForest command yields a different model if I do not set a seed, but why is the prediction for my last two observations different? So you are saying that the prediction can slightly deviate at the cutoff point? But wouldn't this be more likely for categorical predictors then? – Max M, Apr 26 at 18:42











  • Oh, that's surprising to me. So is it using some kind of mixed, randomly drawn sample to get the predictions for the model? Isn't that contradicting your original answer then? – Max M, Apr 26 at 18:54











  • No, that's absolutely not what it's doing. Please read the explanation in my answer. The behavior of the function changes depending on what arguments you do or do not give it. Your original question asked about a comparison between 2 random forest ensembles. Your revised question asks about how the function predict works. – Sycorax, Apr 26 at 18:55













edited Apr 26 at 18:54
answered Apr 26 at 14:14 by Sycorax











  • That I understand. I added one observation, ran the model again, and would have expected the same result for observation #32 because it has the same predictors.
    – Max M
    Apr 26 at 14:19

  • If you have two completely different decision trees, are they guaranteed to make the same predictions?
    – Sycorax
    Apr 26 at 14:41

  • I edited my code to make it clearer what confuses me. I understand that each run of the randomForest command yields a different model if I do not set a seed, but why is the prediction for my last two observations different? So you are saying that the prediction can slightly deviate at the cutoff point? But wouldn't this be more likely for categorical predictors then?
    – Max M
    Apr 26 at 18:42

  • Oh, that's surprising to me. So is it using a kind of randomly drawn mixed sample to get the predictions for the model? Doesn't this contradict your original answer then?
    – Max M
    Apr 26 at 18:54

  • No, that's absolutely not what it's doing. Please read the explanation in my answer. The behavior of the function changes depending on what arguments you do or do not give it. Your original question asked about a comparison between 2 random forest ensembles; your revised question asks about how the function predict works.
    – Sycorax
    Apr 26 at 18:55
















