Distribution normality check


I cannot solve this problem from my homework.

We conducted two experiments. In the first, there were 400 patients,
and in the second, 250. In these experiments, the effects of various
drugs were evaluated. The average weight of people in the two groups
was compared using the t-test. The test of normality for the first
group gave a p-value below the significance threshold, and for the
second above the significance threshold (but the histogram was
bell-shaped). Variance in groups differed by more than 30%. Is it
possible to say that the experiments were compared incorrectly?

There are four answers to choose from, and only one of them is correct:

  1. everything is bad, the distribution is not normal, we need a Wilcoxon test

  2. everything is bad, the samples have different variance and sizes, you cannot use the t-test

  3. the samples are large enough so that everything is fine

  4. everything is fine, p-value of one of the groups is larger than the threshold, so the result of the first group could be random, then the distribution is still normal

Personally, I think that the correct answer is the second, because the t-test requires homogeneity of variance, and in these experiments the variances are very different. But I'm not sure that this is the right answer.











  • Although one of these answers may be correct, it is impossible to determine that from the information given, and it is conceivable that any one of the answers could be "the" correct one, depending on the details of the data and the experimental protocol.
    – whuber, May 2 at 14:16

















hypothesis-testing self-study t-test














asked May 1 at 11:01 by Anastasiia Zakarian (edited May 1 at 11:10 by COOLSerdash)





















2 Answers




























The question is very poorly constructed and contains some serious flaws. Its context may help: looking at the 'rules' and 'guidelines' for two-sample t tests that were presented just before the question may help you figure out what the author means.



Major flaws are as follows:



  • "[T]he second [P-value is] above the significance threshold (but the histogram was bell-shaped)." I agree with @statsandr (+1) that this seems self-contradictory.


  • "Variance in groups differed by more than 30%." The appropriate way to judge whether sample variances indicate that the population variances may be unequal is to look at their ratio, not their difference.


  • Nothing is said about the difference in sample means and no clue is given how large a difference would be of practical importance. So, against what standard are we to judge an "incorrect" comparison?


Also, we don't know whether the two-sample t test under discussion is a 'pooled' or a 'Welch' test. A Welch test should take care of a difference in variances. The DF of a Welch test can't be below $\min[(n_1 - 1),(n_2 - 1)] = 249,$ so the t statistic must be nearly normal.
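To make the degrees-of-freedom bound concrete, here is a minimal Python sketch of the Welch-Satterthwaite approximation. The group sizes 400 and 250 come from the question; the two sample variances are hypothetical (the question gives none), and the function names `variance_ratio` and `welch_df` are only illustrative.

```python
def variance_ratio(var1, var2):
    """Ratio of the larger to the smaller sample variance (always >= 1)."""
    return max(var1, var2) / min(var1, var2)


def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


n1, n2 = 400, 250      # group sizes from the question
var1, var2 = 1.0, 1.3  # hypothetical sample variances, 30% apart

ratio = variance_ratio(var1, var2)
df = welch_df(var1, n1, var2, n2)

# The approximation always lies between min(n1 - 1, n2 - 1) and n1 + n2 - 2.
print(f"variance ratio = {ratio:.2f}, Welch df = {df:.1f}")
print(f"bounds: {min(n1 - 1, n2 - 1)} <= df <= {n1 + n2 - 2}")
```

Whatever variances are plugged in, the result stays at or above 249 for these group sizes, which is why the t statistic is compared against a nearly normal reference distribution.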



If a real-life situation, using a Welch t test, is described here, my guess is that everything is OK. But the exposition of the question is so foggy that my crystal ball doesn't say which answer its author expects.















answered May 1 at 15:37 by BruceET (edited May 1 at 16:04)











  • (+1) for mentioning the capability of the Welch test (which I left out) and pointing out that "The question is very poorly constructed". Every option begins with a general statement ("everything is bad/fine"), which I find misleading, too. And I agree that it is not clear which option is correct.
    – stats.and.r, May 1 at 16:23




























@1. The importance of normality decreases as N increases. See here or here. "With large enough sample sizes (> 30 or 40), the violation of the normality assumption should not cause major problems. This implies that we can use parametric procedures even when the data are not normally distributed" (from first source).

@2. This may be a problem: when sample sizes are unequal, unequal variances can either increase or decrease the Type 1 error rate of the t-test. Still, you may want to run Levene's test or simply check how much the variances differ.

@3. To my understanding, big sample sizes don't guarantee that everything will be fine. OK, we said under @1 that the normality assumption becomes less important with increasing sample size, but I am not sure that the same is true for unequal variances.

@4. I don't understand that sentence. In fact, I find this sentence confusing, too: "for the second above the significance threshold (but the histogram was bell-shaped)". Why does it say "but"? A significant test for normality would indicate that the data are not normal, not the other way around. At least this is the case for the tests I know, like the Shapiro-Wilk test, where "The null-hypothesis of this test is that the population is normally distributed. Thus, on the one hand, if the p value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not normally distributed." (source)

EDIT

Thanks to @Glen_b, who pointed out that the quote under "@1" should be qualified because "while the t-test may end up having a nice normal-looking null distribution in many cases if n is large enough, its performance under the null isn't really what people care most about -- it's performance under the alternative -- and there it may not be so great, if you care about rejecting the null in the cases where the effect is not so easy to pick up." (quote from Glen_b)
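The claim under @2 can be checked with a rough large-sample calculation. The sketch below is my own illustration, not taken from the linked sources: it approximates the pooled (equal-variance) t statistic by a normal distribution and asks how often it would exceed the usual critical value when the true variances differ. The group sizes are those of the question, the variances (30% apart) are hypothetical, and `pooled_t_type1_rate` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t_type1_rate(n1, var1, n2, var2, alpha=0.05):
    """Approximate two-sided Type 1 error rate of the pooled t-test when the
    true group variances are var1 and var2 (large-sample normal sketch)."""
    # True variance of the difference between the two sample means.
    true_var = var1 / n1 + var2 / n2
    # Variance the pooled test assumes, based on the expected pooled variance.
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    assumed_var = pooled * (1 / n1 + 1 / n2)
    # Under the null the statistic is roughly N(0, true_var / assumed_var).
    scale = sqrt(true_var / assumed_var)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z_crit / scale))


# Larger variance in the smaller group: the test rejects too often.
inflated = pooled_t_type1_rate(400, 1.0, 250, 1.3)
# Larger variance in the larger group: the test rejects too rarely.
deflated = pooled_t_type1_rate(400, 1.3, 250, 1.0)
# Equal variances: the nominal 5% rate is recovered.
nominal = pooled_t_type1_rate(400, 1.0, 250, 1.0)

print(f"inflated={inflated:.4f} deflated={deflated:.4f} nominal={nominal:.4f}")
```

Under these assumptions the distortion goes in either direction depending on which group has the larger variance, and for a 30% difference with these group sizes it stays fairly close to the nominal level.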








  • (+1) for pointing out the self-contradictory phrase and for the useful links.
    – BruceET, May 1 at 15:38










  • Re your "@1": I don't see that there's clear agreement at the source with the quoted part. Some of the other answers and comments to that question (and clear demonstrations in answers to other questions on site) should cause us to attach very strong doubt to the claim unless it is very carefully qualified. In particular, (a) while the importance of normality to the significance level decreases with increasing sample size, that isn't true for power (as several people there take the trouble to point out), and (b) claims relating to any specific $n$ are easily shown to be false in general.
    – Glen_b, May 2 at 4:26











  • @Glen_b: Thank you. The quote is from the first link. I changed that. And indeed, I made the mistake of reading only the most upvoted answer, where it is said that "The t-test is invalid for small samples from non-normal distributions, but it is valid for large samples from non-normal distributions." But you're right that point 1 of my answer is too general if one considers the power problems. However, I don't understand what you mean by (b). For example, if the claim is that the standard error in a t test decreases when n increases, why would that be shown to be wrong?
    – stats.and.r, May 2 at 11:15










  • In relation to (b), I said "a specific $n$" -- that is, mention of specific sample sizes, as in your quote which says "> 30 or 40". That claim - that if $n$ exceeds 40, violation of the normality assumption will not cause problems - is not true. By contrast, you're just now talking about "what happens as $n$ increases", which is not at all the same thing -- and in particular isn't claiming something is necessarily 'close enough' by some particular $n$. That is a very different kind of statement from the one I am responding to.
    – Glen_b, May 2 at 13:02











  • @Glen_b: Now I understand what you mean by (b). Thank you!
    – stats.and.r, May 2 at 13:09
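Point (b) in the comments above can be illustrated with a small Monte Carlo sketch. It uses a one-sample t-test at n = 50 for simplicity. The lognormal distribution with sigma = 2 is a deliberately extreme choice made here for illustration, not something taken from the question, and `rejection_rate` is a hypothetical helper name.

```python
import random
from math import exp, sqrt
from statistics import mean, stdev

N = 50                # sample size, above the quoted "> 30 or 40"
T_CRIT_DF49 = 2.0096  # two-sided 5% critical value of Student's t with 49 df


def rejection_rate(draw, true_mean, reps=2000, seed=1):
    """Share of simulated samples of size N in which a one-sample t-test of
    the true mean rejects at the 5% level (about 0.05 if well calibrated)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N)]
        t = (mean(sample) - true_mean) / (stdev(sample) / sqrt(N))
        if abs(t) > T_CRIT_DF49:
            rejections += 1
    return rejections / reps


# Normal data: the assumptions of the t-test hold.
normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
# Lognormal data (mu = 0, sigma = 2): very right-skewed, mean exp(sigma**2 / 2).
skewed_rate = rejection_rate(lambda rng: rng.lognormvariate(0.0, 2.0),
                             true_mean=exp(2.0))

print(f"normal: {normal_rate:.3f}, lognormal: {skewed_rate:.3f}")
```

In runs of this kind the normal case stays near the nominal 5%, while the heavily skewed case rejects a true null far more often, even though the sample size exceeds the quoted thresholds. How large n has to be depends on the distribution, which is the point of comment (b).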















2












$begingroup$

@1. The importance normality decsreases while N increases. See here or here. "With large enough sample sizes (> 30 or 40), the violation of the normality assumption should not cause major problems. This implies that we can use parametric procedures even when the data are not normally distributed" (from first source).



@2. This may be a problem since if sample sizes are unequal, unequal variances can influence the Type 1 error rate of the t-test by either increasing or decreasing the Type 1. Still, you may want run a Levene's test or to see how much the varianes differ.



@3: To my understanding big sample sizes don't guarantee that everything will be fine. Ok, we sad that with increasing sample size the normality assumption becomes less inportant in 1. I am not too sure about the same being true for unequal variance.



@4. I don't understand that sentence. In fact, I find this sentence confusing, too: "for the second above the significance threshold (but the histogram was bell-shaped)". Why does it say "but"? A significant test for normality would indicate that the data are not normal, not the other way around. At least this is the case for the tests I know, like the Shapiro-Wilk test, where "The null-hypothesis of this test is that the population is normally distributed. Thus, on the one hand, if the p value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not normally distributed." (source)
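
The direction of that decision rule can be sketched in a few lines of Python. Note that this is not the Shapiro-Wilk test: to stay within the standard library I use the Jarque-Bera test instead, which has the same null hypothesis (the population is normal) and a simple asymptotic p-value. The function name `jarque_bera_p`, the sample size of 500, and the two simulated samples are my own assumptions for illustration.

```python
import math
import random


def jarque_bera_p(xs):
    """Return the asymptotic p-value of the Jarque-Bera normality test."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Under H0 the statistic is asymptotically chi-square with 2 df,
    # whose survival function is exp(-x / 2).
    return math.exp(-jb / 2)


rng = random.Random(1)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

alpha = 0.05
for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    p = jarque_bera_p(sample)
    # A small p-value is evidence AGAINST normality, never for it.
    verdict = "reject normality" if p < alpha else "no evidence against normality"
    print(f"{name}: p = {p:.4f} -> {verdict}")
```

A p-value above the threshold only means that the test found no evidence against normality, so a bell-shaped histogram would be consistent with it rather than in contrast to it.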



EDIT



Thanks to @Glen_b, who pointed out that the quote under "@1" should be qualified because "while the t-test may end up having a nice normal-looking null distribution in many cases if n is large enough, its performance under the null isn't really what people care most about -- it's performance under the alternative -- and there it may not be so great, if you care about rejecting the null in the cases where the effect is not so easy to pick up." (quote from Glen_b)
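
Whether a specific threshold such as "> 30 or 40" is enough also depends on the distribution, and one way to probe that is a small simulation. The sketch below is only an illustration under my own assumptions: a strongly skewed lognormal population (sigma = 1.5), n = 50, a hypothetical helper `t_rejection_rate`, and an approximate two-sided 5% critical value of 2.01 for 49 degrees of freedom. It estimates how often a one-sample t-test rejects a null hypothesis that is actually true.

```python
import math
import random


def t_rejection_rate(n, crit, sigma=1.5, reps=2000, seed=1):
    """Estimate the Type I error of a one-sample t-test on lognormal data.

    The null hypothesis (mean equals the true lognormal mean) is true,
    so every rejection is a false positive.
    """
    rng = random.Random(seed)
    true_mean = math.exp(sigma ** 2 / 2)  # mean of lognormal(0, sigma)
    hits = 0
    for _ in range(reps):
        xs = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
        t = (m - true_mean) / math.sqrt(s2 / n)
        hits += abs(t) > crit
    return hits / reps


# crit=2.01 approximates the 97.5% quantile of t with 49 df (assumption).
rate_n50 = t_rejection_rate(50, crit=2.01)
print("estimated Type I error at n = 50:", rate_n50)
```

If the estimated rate comes out well above 0.05, a sample size of 50 was not enough for this particular distribution; milder distributions will behave better, which is exactly why no single n works in general.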

















$endgroup$








  • $begingroup$
    (+1) for pointing out the self-contradictory phrase and for the useful links.
    $endgroup$
    – BruceET
    May 1 at 15:38










  • $begingroup$
    re your "@1" ... I don't see that there's clear agreement at the source with the quoted part. Some of the other answers and comments to that question (and clear demonstrations in answers to other questions on site) should cause us to attach very strong doubt to the claim unless very carefully qualified. In particular (a) while the importance of normality to the significance level decreases with increasing sample size, it isn't true for power (as several people there take the trouble to point out), and (b) claims relating to any specific $n$ are easily shown to be false in general.
    $endgroup$
    – Glen_b
    May 2 at 4:26











  • $begingroup$
    @Glen_b: Thank you. The quote is from the first link. I changed that. And indeed, I made the mistake of only reading the most upvoted answer, where it is said that "The t-test is invalid for small samples from non-normal distributions, but it is valid for large samples from non-normal distributions." But you're right that point 1 of my answer is too general if one considers the power problems. However, I don't understand what you mean by (b). For example, if the claim is that the standard error in a t-test decreases when n increases, why would that be shown to be wrong?
    $endgroup$
    – stats.and.r
    May 2 at 11:15










  • $begingroup$
    In relation to (b), I said "a specific $n$" -- that is, mention of specific sample sizes, as in your quote which says "> 30 or 40". That claim - that if n exceeds 40, violation of the normality assumption will not cause problems - is not true. By contrast, you're just now talking about "what happens as $n$ increases", which is not at all the same thing -- and in particular isn't claiming something is necessarily 'close enough' by some particular $n$. That is a very different kind of statement from the thing I am responding to.
    $endgroup$
    – Glen_b
    May 2 at 13:02











  • $begingroup$
    @Glen_b: Now I understand what you mean by (b). Thank you!
    $endgroup$
    – stats.and.r
    May 2 at 13:09





















edited May 2 at 14:07

























answered May 1 at 12:10









stats.and.r






























