Is it OK to use the testing sample to compare algorithms?
I'm working on a small project where my dataset has about 6k rows and around 300 features, with a simple binary outcome.
Since I'm still learning ML, I want to try every algorithm I can manage to find and compare the results.
As I've read in tutorials, I split my dataset into a training sample (80%) and a testing sample (20%), and then trained my algorithms on the training sample with cross-validation (5 folds).
My plan is to train all my models this way and then measure their performance on the testing sample to choose the best algorithm.
Could this cause overfitting? If so, since I cannot compare several models inside model_selection.GridSearchCV, how can I prevent it from overfitting?
Tags: machine-learning, scikit-learn, sampling
asked Apr 21 at 16:15 – Dan Chaltiel
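For concreteness, here is a minimal sketch of the workflow described in the question; the data, estimators, and parameter grids are placeholders, not the asker's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data with roughly the shape described above: 6k rows, 300 features, binary outcome.
X, y = make_classification(n_samples=6000, n_features=300, random_state=0)

# 80% / 20% train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One 5-fold grid search per candidate algorithm, fit on the training sample only.
candidates = {
    "logreg": GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, cv=5),
    "forest": GridSearchCV(RandomForestClassifier(random_state=0), {"n_estimators": [100, 300]}, cv=5),
}

# The step the question asks about: picking the winner by its score on the held-out test sample.
for name, search in candidates.items():
    search.fit(X_train, y_train)
    print(name, search.score(X_test, y_test))
```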
Possible duplicate of Can I use the test dataset to select a model? – Ben Reiniger, Apr 21 at 18:26
@BenReiniger You are right, this is pretty much the same question, but I like Simon's answer better. – Dan Chaltiel, Apr 21 at 19:49
2 Answers
Basically, every time you use the results of a train/test split to make decisions about a model (whether that's tuning the hyperparameters of a single model or choosing the most effective of a number of different models), you cannot infer anything about the performance of the model after making those decisions until you have "frozen" your model and evaluated it on a portion of data that has not been touched.
The general concept addressing this issue is called nested cross-validation. If you use a train/test split to choose the best parameters for a model, that's fine. But if you want to estimate the performance of that model, you then need to evaluate it on a second held-out set.
If you then repeat the process for multiple models and choose the best-performing one, again, that's fine, but by choosing the best result the value of your performance metric is inherently biased, and you need to validate the entire procedure on yet another held-out set to get an unbiased estimate of how your model will perform on unseen data.
answered Apr 21 at 19:56 – Cameron King
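As an illustration of the nested cross-validation idea only (this code is not part of the original answer; the estimators, grids, and scoring metric are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Placeholder data standing in for the 6k-row, 300-feature dataset in the question.
X, y = make_classification(n_samples=6000, n_features=300, random_state=0)

# Candidate models and their (illustrative) hyperparameter grids.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "forest": (RandomForestClassifier(random_state=0), {"n_estimators": [100, 300]}),
}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # hyperparameter tuning
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # performance estimation

for name, (estimator, grid) in candidates.items():
    # The inner loop (GridSearchCV) picks hyperparameters; the outer loop estimates
    # how the whole tune-then-fit procedure generalises to unseen data.
    tuned = GridSearchCV(estimator, grid, cv=inner_cv, scoring="roc_auc")
    scores = cross_val_score(tuned, X, y, cv=outer_cv, scoring="roc_auc")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```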
Great answer too! So since this seems like an order-2 cross-validation (a cross-validation of cross-validations), should I pool my samples (70+15 according to Simon's answer) before I evaluate my final algorithm on the test sample? – Dan Chaltiel, Apr 21 at 20:01
With so few samples and features (relatively speaking), personally I would use multiple rounds of stratified k-fold cross-validation rather than holding a fixed set or sets out. The theoretical guarantees of evaluating on held-out data only hold in the limit of the number of samples. I'm working under the assumption that training and testing your model is not a huge deal in terms of time. I would do something like shuffling the samples within each class and setting aside each fold in turn, and for the inner loop, combine it all, split again, do your model selection, then yes, pool. – Cameron King, Apr 21 at 20:48
I forgot to mention: this means that you are setting aside, say, 1/k of the points to test on and dividing the remaining 1 - 1/k of the points into an inner cross-validation run, but repeating this so that you are training, validating, and testing on every single data point at least once. With larger datasets and more time-consuming models, deciding to split the data once into 3 partitions makes sense, but it will always be less robust than k-fold cross-validation. Developing with that method to save yourself time is a good idea, but when it comes to making decisions, don't cut corners. – Cameron King, Apr 21 at 20:59
This makes perfect sense, but have you ever heard of a standard way to automate this (like sklearn's GridSearchCV)? Doing it by hand seems quite tedious and error-prone to me. – Dan Chaltiel, Apr 22 at 5:22
Sklearn can absolutely do this, and their examples are pretty good for understanding how this is going to go for you. The link at the end uses KFold, and if you look at the docs for that class you can set shuffle=True to randomize the splits. scikit-learn.org/stable/auto_examples/model_selection/… – Cameron King, Apr 22 at 19:49
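For reference, the shuffle=True option mentioned in the last comment is an argument of KFold itself; a toy illustration (not from the thread):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy data, 10 rows

# shuffle=True randomizes which rows fall into which fold; random_state makes the shuffle reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(X)):
    print(f"fold {fold}: train={train_idx}, test={test_idx}")
```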
No, that is not the purpose of the test set. The test set is only for the final evaluation when your model is done. The problem is that if you include the test set in your decisions, your evaluation will no longer be reliable.
To compare algorithms, you instead set aside another chunk of your data called the validation set.
Here is some info about good splits depending on data size: Train / Dev / Test sets, from Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization by Prof. Andrew Ng. (Andrew uses the term "dev set" instead of "validation set".)
answered Apr 21 at 16:18, edited Apr 21 at 16:32 – Simon Larsson
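A minimal sketch of such a three-way split, using the 70/15/15 ratio suggested in the comments below; the data here is only a placeholder for the asker's dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the 6k-row, 300-feature dataset in the question.
X, y = make_classification(n_samples=6000, n_features=300, random_state=0)

# 15% held out as the final test set ...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0
)
# ... and 15% of the total (0.15 / 0.85 of the remainder) as the validation set,
# leaving roughly 70% for training.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.85, stratify=y_rest, random_state=0
)

# Train every candidate on (X_train, y_train), compare them on (X_val, y_val),
# and evaluate only the final chosen model, once, on (X_test, y_test).
```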
I thought so, but I couldn't find anything written about this. Could you provide some material from which I could learn, like what a common split would be (70/10/20)? – Dan Chaltiel, Apr 21 at 16:22
It depends on the size of your dataset, but I would say 70/15/15 would be good in your case. – Simon Larsson, Apr 21 at 16:25
I added a video on the subject. – Simon Larsson, Apr 21 at 16:28