Is it OK to use the testing sample to compare algorithms?


I'm working on a little project where my dataset has 6k rows and around 300 features, with a simple binary outcome.

Since I'm still learning ML, I want to try every algorithm I can manage to find and compare the results.

As I've read in tutorials, I split my dataset into a training sample (80%) and a testing sample (20%), and then trained my algorithms on the training sample with cross-validation (5 folds).

My plan is to train all my models this way, and then measure their performance on the testing sample to choose the best algorithm.

Could this cause overfitting? If so, since I cannot compare several models inside model_selection.GridSearchCV, how can I prevent it from overfitting?
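
For concreteness, here is a minimal sketch of the workflow described above, assuming scikit-learn with a synthetic stand-in dataset and a placeholder estimator and grid:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Synthetic stand-in for the 6k x 300 binary dataset in the question.
    X, y = make_classification(n_samples=6000, n_features=300, random_state=0)

    # 80% / 20% train/test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # 5-fold cross-validated grid search on the training sample only.
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          {"n_estimators": [100, 300]}, cv=5)
    search.fit(X_train, y_train)

    # The step the question is about: scoring each tuned model on the same
    # test sample and picking the best one.
    print(search.best_params_, search.score(X_test, y_test))
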










machine-learning scikit-learn sampling

asked Apr 21 at 16:15 by Dan Chaltiel

  • Possible duplicate of Can I use the test dataset to select a model? – Ben Reiniger, Apr 21 at 18:26

  • @BenReiniger You are right, this is quite the same question, but I like Simon's answer better. – Dan Chaltiel, Apr 21 at 19:49

2 Answers

Basically, every time you use the results of a train/test split to make decisions about a model (whether that's tuning the hyperparameters of a single model or choosing the most effective of several different models), you cannot infer anything about the model's performance after making those decisions until you have "frozen" your model and evaluated it on a portion of the data that has not been touched.

The general concept addressing this issue is called nested cross-validation. If you use a train/test split to choose the best parameters for a model, that's fine. But if you want to estimate the performance of that model, you then need to evaluate it on a second held-out set.

If you then repeat the process for multiple models and choose the best-performing one, again, that's fine, but by choosing the best result the value of your performance metric is inherently biased, and you need to validate the entire procedure on yet another held-out set to get an unbiased estimate of how your model will perform on unseen data.
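
A minimal sketch of nested cross-validation with scikit-learn, assuming the same kind of data as in the question; the candidate estimators and parameter grids are placeholders:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score

    X, y = make_classification(n_samples=6000, n_features=300, random_state=0)

    # Candidate models with (placeholder) hyperparameter grids.
    candidates = {
        "logreg": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
        "forest": (RandomForestClassifier(random_state=0),
                   {"n_estimators": [100, 300]}),
    }

    for name, (estimator, grid) in candidates.items():
        # Inner loop: hyperparameter tuning.
        inner = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
        # Outer loop: estimates the performance of the whole tuning procedure.
        outer_scores = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")
        print(name, outer_scores.mean())

    # Picking the best of these outer scores is itself a selection step, so a
    # final held-out check is still needed for an unbiased estimate.
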






answered Apr 21 at 19:56 by Cameron King
  • Great answer too! So since this seems like an order-2 cross-validation (a cross-validation of cross-validations), should I pool my samples (70+15 according to Simon's answer) before I evaluate my final algorithm on the test sample? – Dan Chaltiel, Apr 21 at 20:01

  • With so few samples and features (relatively speaking), personally I would use multiple rounds of stratified k-fold cross-validation rather than holding a fixed set or sets out. The theoretical guarantees of evaluating on held-out data only hold in the limit of the number of samples. I'm working under the assumption that training and testing your model is not a huge deal in terms of time. I would do something like shuffling the samples within each class and setting aside each fold in turn, and for the inner loop, combine it all, split again, do your model selection, then yes, pool. – Cameron King, Apr 21 at 20:48

  • I forgot to mention: this means that you are setting aside, say, 1/k of the points to test on and dividing the remaining 1 - 1/k of the points into an inner cross-validation run, but repeating this so that you train, validate, and test on every single data point at least once. With larger datasets and more time-consuming models, splitting the data once into 3 partitions makes sense, but it will always be less robust than k-fold cross-validation. Developing with that method to save yourself time is a good idea, but when it comes to making decisions, don't cut corners. – Cameron King, Apr 21 at 20:59

  • This makes perfect sense, but have you ever heard of a standard way to automate this (like sklearn's GridSearchCV)? Doing it by hand seems quite tedious and error-prone to me. – Dan Chaltiel, Apr 22 at 5:22

  • Sklearn can absolutely do this, and their examples are pretty good for understanding how this is going to go for you. The link at the end uses KFold, and if you look at the docs for that class you can set shuffle=True to randomize the splits (see the sketch after these comments). scikit-learn.org/stable/auto_examples/model_selection/… – Cameron King, Apr 22 at 19:49
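
A small sketch of the repeated, shuffled stratified k-fold idea from these comments, using scikit-learn's RepeatedStratifiedKFold as the outer loop around a placeholder GridSearchCV; this is an assumed illustration, not code from the thread:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                         cross_val_score)

    X, y = make_classification(n_samples=6000, n_features=300, random_state=0)

    # Inner loop: hyperparameter tuning with a placeholder grid.
    inner = GridSearchCV(RandomForestClassifier(random_state=0),
                         {"n_estimators": [100, 300]}, cv=5, scoring="roc_auc")

    # Outer loop: several shuffled rounds of stratified 5-fold CV, so every
    # point ends up in an outer test fold more than once.
    outer_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
    scores = cross_val_score(inner, X, y, cv=outer_cv, scoring="roc_auc")
    print(scores.mean(), scores.std())
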



















No, that is not the purpose of the test set. The test set is only for the final evaluation, once your model is done. The problem is that if you include the test set in your decisions, your evaluation will no longer be reliable.

To compare algorithms, you instead set aside another chunk of your data, called the validation set.

Here is some info about good splits depending on data size:

Train / Dev / Test sets from Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization by Prof. Andrew Ng.

(Andrew uses the term dev set instead of validation set.)
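
A minimal sketch of such a three-way split with scikit-learn, using the 70/15/15 ratios suggested in the comments below on a synthetic stand-in dataset (train_test_split is called twice because it only produces two partitions):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=6000, n_features=300, random_state=0)

    # First carve out the 15% test set, then split the remainder into
    # 70% train and 15% validation (0.15 / 0.85 of what is left).
    X_tmp, X_test, y_tmp, y_test = train_test_split(
        X, y, test_size=0.15, stratify=y, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(
        X_tmp, y_tmp, test_size=0.15 / 0.85, stratify=y_tmp, random_state=0)

    # Compare algorithms on (X_val, y_val); touch (X_test, y_test) only once,
    # for the final evaluation of the chosen model.
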






answered Apr 21 at 16:18 (edited Apr 21 at 16:32) by Simon Larsson
  • I thought so, but I couldn't find anything about this. Could you provide some material I could learn from? For example, what would a common split be (70/10/20)? – Dan Chaltiel, Apr 21 at 16:22

  • It depends on the size of your dataset, but I would say 70/15/15 would be good in your case. – Simon Larsson, Apr 21 at 16:25

  • I added a video on the subject. – Simon Larsson, Apr 21 at 16:28










