AIC for increasing sample size















$begingroup$


I am using AIC as a model selection criterion in one of my projects. However, since the AIC penalty doesn't depend on the number of points sampled, for large n the log likelihood term rapidly outscales the parameter penalty.

I was wondering why the parameter penalty doesn't scale with the number of points, as the log likelihood generally does. It's getting to where the log likelihood is on the order of tens of thousands and the AIC penalty for having ~10 extra parameters in the model doesn't matter. But it feels like it really should. Am I misunderstanding something?

















$endgroup$
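To make the scaling concern concrete, here is a minimal simulation sketch. It assumes a Gaussian linear model fitted by least squares, and the helper names (`gaussian_loglik`, `compare`) and the 10 pure-noise predictors are illustrative assumptions, not anything from the question. It contrasts the total log likelihood, which grows with n, with the gain in log likelihood from adding parameters that carry no signal.

```python
import numpy as np


def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of a least-squares fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n  # maximum-likelihood estimate of the noise variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)


def aic(loglik, k):
    """AIC = 2k - 2 log L."""
    return 2 * k - 2 * loglik


def compare(n, extra=10, seed=0):
    """Fit a small model and a nested larger one with `extra` noise predictors."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    noise_cols = rng.normal(size=(n, extra))  # predictors unrelated to y
    y = 1.0 + 2.0 * x + rng.normal(size=n)    # hypothetical true model
    X_small = np.column_stack([np.ones(n), x])
    X_big = np.column_stack([X_small, noise_cols])
    ll_small = gaussian_loglik(y, X_small)
    ll_big = gaussian_loglik(y, X_big)
    # k counts the regression coefficients plus the variance parameter
    k_small = X_small.shape[1] + 1
    k_big = X_big.shape[1] + 1
    return ll_small, ll_big, aic(ll_small, k_small), aic(ll_big, k_big)


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        ll_s, ll_b, aic_s, aic_b = compare(n)
        print(f"n={n:>7,}  logL={ll_s:12.1f}  gain={ll_b - ll_s:6.2f}  "
              f"AIC small={aic_s:12.1f}  AIC big={aic_b:12.1f}")
```

In this setup the total log likelihood scales with n, but the gain from the 10 uninformative parameters does not: twice the gain behaves roughly like a chi-squared variable with 10 degrees of freedom at any sample size. AIC compares models through differences, so the fixed penalty is set against that gain rather than against the total.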











  • $begingroup$
    Why would having 10 extra parameters matter if you have enough data to estimate them rather precisely? AIC/n (AIC per data point) estimates the log-likelihood of a new data point from the same population; when you have enough data, this is approximately equal to the average sample log-likelihood (log-likelihood/n) as the estimation error for the parameters is negligible.
    $endgroup$
    – Richard Hardy
    Apr 5 at 13:19











  • $begingroup$
    Sorry, I don't think I articulated my question very well. Let's say you have many points of somewhat noisy data. Adding a decent number of parameters (let's say 10) to your model will likely be very beneficial to your log likelihood. However, the 2k part of the AIC calculation will barely penalize the model for it. It just seems to me that the AIC doesn't appropriately penalize for extra params.
    $endgroup$
    – Jason
    Apr 5 at 13:51










  • $begingroup$
    In my comment above, it should be negative likelihood, not raw likelihood.
    $endgroup$
    – Richard Hardy
    Apr 5 at 15:22
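A short worked equation may help with the per-observation view in the first comment. It is only a sketch, assuming the number of parameters $k$ is held fixed while $n$ grows. Dividing the standard definition of AIC by $n$ gives

$$ \frac{\text{AIC}}{n} = -\frac{2}{n} \log \mathcal{L} + \frac{2k}{n}, $$

so the penalty contributes only $2k/n$ per observation and vanishes as $n \to \infty$, while the first term is an average over data points and typically stays of order one.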

















model-selection aic asymptotics log-likelihood






edited Apr 5 at 12:37









Richard Hardy

asked Apr 5 at 12:20









Jason

1 Answer












$begingroup$

It's a known criticism of AIC.



The BIC scales the penalty on the number of model parameters by the logarithm of n, so the penalty grows with the sample size:

$$ \text{BIC} = \log(n)\, k - 2 \log \mathcal{L}, $$

though you will still tend to find that BIC favors models with more parameters in larger samples. In either case, it's a desirable trait of a model selection criterion that it tends to select more parameters in larger sample sizes. It all boils down to how many parameters you want to enter into a particular model for a particular sample size. When that's a finite number, there's no reason to use information criteria at all.
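As a rough illustration of how the two penalties compare, the following sketch evaluates the penalty terms for a given number of extra parameters. It uses the standard definition AIC = 2k - 2 log L together with the BIC formula above. The function names and the sample sizes are assumptions chosen for the example.

```python
import math


def aic(loglik: float, k: int) -> float:
    """AIC = 2k - 2 log L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """BIC = log(n) k - 2 log L."""
    return math.log(n) * k - 2 * loglik


def extra_penalty(k_extra: int, n: int) -> tuple[float, float]:
    """Penalty added by k_extra more parameters under (AIC, BIC)."""
    return 2.0 * k_extra, math.log(n) * k_extra


if __name__ == "__main__":
    for n in (7, 8, 100, 10_000, 1_000_000):
        a, b = extra_penalty(10, n)
        print(f"n={n:>9,}: AIC penalty={a:6.1f}  BIC penalty={b:6.1f}")
```

The BIC penalty exceeds the AIC penalty once log(n) > 2, that is from n = 8 onward. For 10 extra parameters at n = 10,000 it is about 92 versus 20 for AIC, which is still small next to a total log likelihood in the tens of thousands.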



Shibata's work on AIC relies on the concept of "mean efficiency". That is: ICs work under the condition that you know or assume that the number of variables in an ideal model is infinite, and that in larger samples you will tend to favor models with more variables.






$endgroup$












  • $begingroup$
    You can criticize a hammer if your problem does not look like a nail, but I wonder if there is any ground for criticizing the design of AIC taking into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast increasing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense?
    $endgroup$
    – Richard Hardy
    Apr 5 at 15:19











  • $begingroup$
    OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be, does my problem look like one where this assumption may hold or not? If so, AIC is fine; if not, go look for another information criterion.
    $endgroup$
    – Richard Hardy
    Apr 5 at 15:26











  • $begingroup$
    @RichardHardy We agree on all points. The revelation that AIC only works in some very contrived situations won't stop people from asking whether it functions well in other situations. The answer, aside from "it wasn't meant to do that" is "it doesn't do that very well". It's a revelation that another inappropriate tool (BIC) "does it a bit better". There are much, much better tools for data reduction if OP wants a "sparse number of predictors in a reasonably large sample", but it wasn't the question that was asked.
    $endgroup$
    – AdamO
    Apr 5 at 15:41











  • $begingroup$
    Good. I would contest, however, your use of "very contrived situations", or even "contrived situations". A large (the largest?) part of real world phenomena are results of infinitely complex data generating processes which require an infinite number of parameters to be fully characterized, which is exactly what the premise of AIC is. Hence, as long as the goal is optimal prediction, AIC strikes me as the most reasonable choice, or at least a solid baseline. When prediction is not the goal while, say, finding a sparse number of predictors is, we need other tools.
    $endgroup$
    – Richard Hardy
    Apr 5 at 16:35











edited Apr 5 at 15:11

























answered Apr 5 at 14:46









AdamO

34.5k264142




34.5k264142











  • $begingroup$
    You can criticize a hammer if your problem does not look like a nail, but I wonder if there is any ground for criticizing the design of AIC taking into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast increasing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense?
    $endgroup$
    – Richard Hardy
    Apr 5 at 15:19











  • $begingroup$
    OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be, does my problem look like one where this assumption may hold or not? If so, AIC is fine, if not, go look for another information criterion.
    $endgroup$
    – Richard Hardy
    Apr 5 at 15:26











  • $begingroup$
    @RichardHardy We agree on all points. The revelation that AIC only works in some very contrived situations won't stop people from asking whether it functions well in other situations. The answer, aside from "it wasn't meant to do that" is "it doesn't do that very well". It's a revelation that another inappropriate tool (BIC) "does it a bit better". There are much, much better tools for data reduction if OP wants a "sparse number of predictors in a reasonably large sample", but it wasn't the question that was asked.
    $endgroup$
    – AdamO
    Apr 5 at 15:41











  • $begingroup$
    Good. I would contest, however, your use of "very contrived situations", or even "contrived situations". A large (the largest?) part of real world phenomena are results of infinitely complex data generating processes which require an infinite amount of parameters to be fully charaterized, which is exactly what the premise of AIC is. Hence, as long as the goal is optimal prediction, AIC strikes me as the most reasonable choice, or at least a solid baseline. When prediction is not the goal while, say, finding a sparse number of predictors is, we need other tools.
    $endgroup$
    – Richard Hardy
    Apr 5 at 16:35
















  • $begingroup$
    You can criticize a hammer if your problem does not look like a nail, but I wonder if there is any ground for criticizing the design of AIC taking into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast increasing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense?
    $endgroup$
    – Richard Hardy
    Apr 5 at 15:19











  • $begingroup$
    OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be, does my problem look like one where this assumption may hold or not? If so, AIC is fine, if not, go look for another information criterion.
    $endgroup$
    – Richard Hardy
    Apr 5 at 15:26











  • $begingroup$
    @RichardHardy We agree on all points. The revelation that AIC only works in some very contrived situations won't stop people from asking whether it functions well in other situations. The answer, aside from "it wasn't meant to do that" is "it doesn't do that very well". It's a revelation that another inappropriate tool (BIC) "does it a bit better". There are much, much better tools for data reduction if OP wants a "sparse number of predictors in a reasonably large sample", but it wasn't the question that was asked.
    $endgroup$
    – AdamO
    Apr 5 at 15:41











  • Good. I would contest, however, your use of "very contrived situations", or even "contrived situations". A large (the largest?) share of real-world phenomena result from infinitely complex data-generating processes that would require infinitely many parameters to be fully characterized, which is exactly the premise of AIC. Hence, as long as the goal is optimal prediction, AIC strikes me as the most reasonable choice, or at least a solid baseline. When the goal is not prediction but, say, finding a sparse set of predictors, we need other tools.
    – Richard Hardy
    Apr 5 at 16:35
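
For readers following the comments above, the contrast between AIC and a criterion with a faster-growing penalty such as BIC can be made concrete with the standard textbook definitions. This is only a minimal sketch: the function names `aic`, `bic`, and `penalty_per_parameter` are made up for illustration, and the log-likelihood value in the usage lines is invented, not taken from the question.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik


def penalty_per_parameter(n: int) -> dict:
    """Penalty each extra parameter adds under AIC and BIC at sample size n."""
    return {"aic": 2.0, "bic": math.log(n)}


# Hypothetical fit: maximized log-likelihood of -100 with 3 parameters.
print(aic(-100.0, 3))
print(bic(-100.0, 3, n=100))

# The AIC penalty stays fixed while the BIC penalty grows with n.
for n in (7, 8, 100, 10_000):
    print(n, penalty_per_parameter(n))
```

The per-parameter penalty of AIC is constant at 2, while that of BIC is ln(n), which exceeds 2 once n is at least 8. That is the sense in which BIC's penalty increases faster with sample size.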

























Jason is a new contributor. Be nice, and check out our Code of Conduct.















































