AIC for increasing sample size
I am using AIC as a model selection criterion in one of my projects. However, since the AIC penalty doesn't depend on the number of points sampled, for large n the log-likelihood term rapidly outscales the parameter penalty.
I was wondering why the parameter penalty doesn't scale with the number of points, as the log likelihood generally does. It's getting to where the log likelihood is on the order of tens of thousands and the AIC penalty for having ~10 extra parameters in the model doesn't matter. But it feels like it really should. Am I misunderstanding something?
model-selection aic asymptotics log-likelihood
edited Apr 5 at 12:37
Richard Hardy
asked Apr 5 at 12:20
Jason
Why would having 10 extra parameters matter if you have enough data to estimate them rather precisely? AIC/n (AIC per data point) estimates the log-likelihood of a new data point from the same population; when you have enough data, this is approximately equal to the average sample likelihood (log-likelihood/n), as the estimation error for the parameters is negligible.
– Richard Hardy Apr 5 at 13:19
Sorry, I don't think I articulated my question very well. Let's say you have many points of somewhat noisy data. Adding a decent number of parameters (let's say 10) to your model will likely be very beneficial to your log likelihood. However, the 2k part of the AIC calculation will barely penalize the model for it. It just seems to me that the AIC doesn't appropriately penalize for extra parameters.
– Jason Apr 5 at 13:51
In my comment above, it should be the negative log-likelihood, not the raw log-likelihood.
– Richard Hardy Apr 5 at 15:22
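The per-data-point argument in the comments above can be written out explicitly. This is only a sketch under the standard definition of AIC, which the thread itself never spells out; $k$ (number of estimated parameters), $n$ (sample size) and $\hat{\mathcal{L}}$ (maximized likelihood) are notation assumed here.
$$ \mathrm{AIC} = 2k - 2\log\hat{\mathcal{L}} \quad\Longrightarrow\quad \frac{\mathrm{AIC}}{n} = -\frac{2}{n}\log\hat{\mathcal{L}} + \frac{2k}{n}. $$
For independent observations, $\log\hat{\mathcal{L}}$ grows roughly linearly in $n$, so the first term settles near a constant (minus twice the average log-likelihood per observation), while the second term shrinks like $1/n$. With $k$ fixed, the penalty's share of AIC therefore vanishes as $n$ grows, which is the behavior the question describes.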
1 Answer
It's a known criticism of AIC.
The BIC scales the penalty on the number of model parameters by the log of n:
$$ \text{BIC} = \log(n)\, k - 2 \log \mathcal{L}, $$
though even in larger samples you will still tend to find that BIC favors models with more parameters. In either case, it's a desirable trait of a model selection criterion that it tends to select more parameters in larger sample sizes. It all boils down to how many parameters you want to enter into a particular model for a particular sample size. When that's a finite number, there's no reason to use information criteria at all.
Shibata's work on AIC operates under the concept of "mean efficiency". That is: ICs work under the condition that you know or assume that the number of variables in an ideal model is infinite, and that in larger samples you will tend to favor models with more variables.
edited Apr 5 at 15:11
answered Apr 5 at 14:46
AdamO
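To put numbers on the two penalties discussed in the answer, here is a minimal Python sketch. It assumes the standard definitions AIC = 2k − 2 log L and BIC = log(n) k − 2 log L; the function names `aic`, `bic` and `extra_penalty` are hypothetical helpers introduced for illustration, not part of any library.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: log(n) k - 2 log L."""
    return math.log(n) * k - 2 * log_lik


def extra_penalty(extra_params: int, n: int) -> tuple[float, float]:
    """Penalty added by `extra_params` more parameters, as (AIC, BIC)."""
    return 2.0 * extra_params, math.log(n) * extra_params


if __name__ == "__main__":
    # Ten extra parameters, as in the question, at several sample sizes.
    for n in (100, 10_000, 1_000_000):
        aic_pen, bic_pen = extra_penalty(10, n)
        print(f"n={n:>9,}: AIC penalty={aic_pen:.1f}, BIC penalty={bic_pen:.1f}")
```

For ten extra parameters the AIC penalty is 20 at every sample size, while the BIC penalty grows only from about 46 at n = 100 to about 138 at n = 1,000,000. Against a log-likelihood on the order of tens of thousands, both are small, which is consistent with the remark that BIC still tends to favor larger models in larger samples.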
You can criticize a hammer if your problem does not look like a nail, but I wonder if there is any ground for criticizing the design of AIC, taking into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast-increasing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense?
– Richard Hardy Apr 5 at 15:19
OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be: does my problem look like one where this assumption may hold or not? If so, AIC is fine; if not, go look for another information criterion.
– Richard Hardy Apr 5 at 15:26
@RichardHardy We agree on all points. The revelation that AIC only works in some very contrived situations won't stop people from asking whether it functions well in other situations. The answer, aside from "it wasn't meant to do that", is "it doesn't do that very well". It's a revelation that another inappropriate tool (BIC) "does it a bit better". There are much, much better tools for data reduction if OP wants a "sparse number of predictors in a reasonably large sample", but that wasn't the question that was asked.
– AdamO Apr 5 at 15:41
Good. I would contest, however, your use of "very contrived situations", or even "contrived situations". A large (the largest?) part of real-world phenomena are results of infinitely complex data-generating processes which require an infinite number of parameters to be fully characterized, which is exactly the premise of AIC. Hence, as long as the goal is optimal prediction, AIC strikes me as the most reasonable choice, or at least a solid baseline. When prediction is not the goal while, say, finding a sparse number of predictors is, we need other tools.
– Richard Hardy Apr 5 at 16:35
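The contrast drawn in these comments, between a fixed penalty and one that grows with n, can be illustrated for the sparse, finite-truth case. The Python sketch below rests on assumptions that are not in the thread: two nested models that differ by one parameter whose true value is zero, and regularity conditions under which twice the log-likelihood gain is asymptotically chi-square with 1 degree of freedom (Wilks' theorem). The function names are hypothetical.

```python
import math


def prob_chi2_1_exceeds(c: float) -> float:
    """P(X > c) for X ~ chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(c / 2.0))


def prob_pick_superfluous(n: int) -> tuple[float, float]:
    """Asymptotic chance that (AIC, BIC) prefer the model with one useless
    extra parameter: the larger model wins when 2 * (gain in log L)
    exceeds the extra penalty, 2 for AIC and log(n) for BIC."""
    return prob_chi2_1_exceeds(2.0), prob_chi2_1_exceeds(math.log(n))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        p_aic, p_bic = prob_pick_superfluous(n)
        print(f"n={n:>9,}: P(AIC overfits)={p_aic:.4f}, P(BIC overfits)={p_bic:.4f}")
```

Under these assumptions AIC keeps the useless parameter with probability of about 0.157 regardless of n, whereas the corresponding probability for BIC shrinks toward zero as n grows. This only covers the setting where the true model is finite and among the candidates; it says nothing about predictive efficiency, which is the property AIC is credited with above.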
add a comment |
$begingroup$
You can criticize a hammer if your problem does not look like a nail, but I wonder if there is any ground for criticizing the design of AIC taking into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast increasing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense?
$endgroup$
– Richard Hardy
Apr 5 at 15:19
$begingroup$
OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be, does my problem look like one where this assumption may hold or not? If so, AIC is fine, if not, go look for another information criterion.
$endgroup$
– Richard Hardy
Apr 5 at 15:26
$begingroup$
@RichardHardy We agree on all points. The revelation that AIC only works in some very contrived situations won't stop people from asking whether it functions well in other situations. The answer, aside from "it wasn't meant to do that" is "it doesn't do that very well". It's a revelation that another inappropriate tool (BIC) "does it a bit better". There are much, much better tools for data reduction if OP wants a "sparse number of predictors in a reasonably large sample", but it wasn't the question that was asked.
$endgroup$
– AdamO
Apr 5 at 15:41
$begingroup$
Good. I would contest, however, your use of "very contrived situations", or even "contrived situations". A large (the largest?) part of real world phenomena are results of infinitely complex data generating processes which require an infinite amount of parameters to be fully charaterized, which is exactly what the premise of AIC is. Hence, as long as the goal is optimal prediction, AIC strikes me as the most reasonable choice, or at least a solid baseline. When prediction is not the goal while, say, finding a sparse number of predictors is, we need other tools.
$endgroup$
– Richard Hardy
Apr 5 at 16:35
$begingroup$
You can criticize a hammer if your problem does not look like a nail, but I wonder if there is any ground for criticizing the design of AIC taking into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast increasing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense?
$endgroup$
– Richard Hardy
Apr 5 at 15:19
$begingroup$
You can criticize a hammer if your problem does not look like a nail, but I wonder if there is any ground for criticizing the design of AIC taking into account what it actually aims for. After all, AIC is an efficient model selection criterion, which BIC and other criteria with relatively fast increasing penalties are not. So if your goal is optimal prediction (optimal in terms of maximizing the likelihood of a new observation), AIC will do it for you. If your goal is not prediction, why would you be considering AIC to begin with? Does that make sense?
$endgroup$
– Richard Hardy
Apr 5 at 15:19
$begingroup$
OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be, does my problem look like one where this assumption may hold or not? If so, AIC is fine, if not, go look for another information criterion.
$endgroup$
– Richard Hardy
Apr 5 at 15:26
$begingroup$
OK, I guess you can justify your criticism of the assumption of infinitely many parameters in the "ideal model", as you mention in your last paragraph. So then the question would be, does my problem look like one where this assumption may hold or not? If so, AIC is fine, if not, go look for another information criterion.
$endgroup$
– Richard Hardy
Apr 5 at 15:26
$begingroup$
@RichardHardy We agree on all points. The revelation that AIC only works in some very contrived situations won't stop people from asking whether it functions well in other situations. The answer, aside from "it wasn't meant to do that" is "it doesn't do that very well". It's a revelation that another inappropriate tool (BIC) "does it a bit better". There are much, much better tools for data reduction if OP wants a "sparse number of predictors in a reasonably large sample", but it wasn't the question that was asked.
$endgroup$
– AdamO
Apr 5 at 15:41
$begingroup$
Good. I would contest, however, your use of "very contrived situations", or even "contrived situations". A large (the largest?) part of real-world phenomena results from infinitely complex data generating processes that require an infinite number of parameters to be fully characterized, which is exactly the premise of AIC. Hence, as long as the goal is optimal prediction, AIC strikes me as the most reasonable choice, or at least a solid baseline. When the goal is not prediction but, say, finding a sparse set of predictors, we need other tools.
$endgroup$
– Richard Hardy
Apr 5 at 16:35
Jason is a new contributor. Be nice, and check out our Code of Conduct.
$begingroup$
Why would having 10 extra parameters matter if you have enough data to estimate them rather precisely? AIC/n (AIC per datapoint) estimates the log-likelihood of a new data point from the same population; when you have enough data, this is approximately equal to the average sample likelihood (log-likelihood/n) as the estimation error for the parameters is negligible.
$endgroup$
– Richard Hardy
Apr 5 at 13:19
$begingroup$
Sorry, I don't think I articulated my question very well. Let's say you have many points of somewhat noisy data. Adding a decent number of parameters (let's say 10) to your model will likely be very beneficial to your log-likelihood. However, the 2k part of the AIC calculation will barely penalize the model for it. It just seems to me that the AIC doesn't appropriately penalize for extra parameters.
$endgroup$
– Jason
Apr 5 at 13:51
$begingroup$
In my comment above, it should be negative likelihood, not raw likelihood.
$endgroup$
– Richard Hardy
Apr 5 at 15:22
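The concern raised in these comments can be made concrete with a small sketch. It uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. Everything else is an assumption made purely for illustration: the helper name `delta_criteria`, the baseline of 5 parameters, the baseline log-likelihood of -1.5 per observation, and the premise that the 10 extra parameters raise the log-likelihood by a fixed 0.002 per observation. None of these numbers come from the thread.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Standard AIC: 2k - 2 * maximized log-likelihood."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Standard BIC: k * ln(n) - 2 * maximized log-likelihood."""
    return k * math.log(n) - 2 * loglik


def delta_criteria(n: int, extra_params: int = 10, gain_per_obs: float = 0.002):
    """Return (AIC_big - AIC_small, BIC_big - BIC_small) for sample size n.

    Hypothetical setup: the larger model improves the log-likelihood by
    gain_per_obs * n. A negative difference means the criterion prefers
    the larger model.
    """
    k_small = 5                      # arbitrary baseline parameter count
    k_big = k_small + extra_params
    ll_small = -1.5 * n              # arbitrary baseline; cancels in the difference
    ll_big = ll_small + gain_per_obs * n
    d_aic = aic(ll_big, k_big) - aic(ll_small, k_small)
    d_bic = bic(ll_big, k_big, n) - bic(ll_small, k_small, n)
    return d_aic, d_bic


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        d_aic, d_bic = delta_criteria(n)
        print(f"n={n:>7}: dAIC={d_aic:9.2f}  dBIC={d_bic:9.2f}")
```

Under these assumptions the AIC penalty for the extra parameters stays fixed at 20 while the likelihood gain grows linearly in n, so AIC switches to the larger model at a much smaller sample size than BIC, whose penalty of 10 ln n keeps growing. Whether that behavior is a flaw or the intended design depends on whether the goal is prediction or recovering a sparse model, which is the point debated in the comments.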