Distribution normality check


I cannot solve this problem from my homework.

We conducted two experiments. In the first, there were 400 patients,
and in the second, 250. In these experiments, the effects of various
drugs were evaluated. The average weight of people in the two groups
was compared using the t-test. The test of normality for the first
group gave a p-value below the significance threshold, and for the
second above the significance threshold (but the histogram was
bell-shaped). Variance in groups differed by more than 30%. Is it
possible to say that the experiments were compared incorrectly?

There are four answers to choose from, and only one of them is correct:

  1. everything is bad, the distribution is not normal, we need a Wilcoxon test
  2. everything is bad, the samples have different variance and sizes, you cannot use the t-test
  3. the samples are large enough so that everything is fine
  4. everything is fine, p-value of one of the groups is larger than the threshold, so the result of the first group could be random, then the distribution is still normal

Personally, I think the correct answer is the second, because the t-test requires homogeneity of variance, and in these experiments the variances are very different. But I'm not sure that this is the right answer.







hypothesis-testing self-study t-test














edited May 1 at 11:10 by COOLSerdash
asked May 1 at 11:01 by Anastasiia Zakarian











  • Although one of these answers may be correct, it is impossible to determine that from the information given, and it is conceivable that any one of the answers could be "the" correct one, depending on the details of the data and the experimental protocol.
    – whuber, May 2 at 14:16


























2 Answers
The question is very poorly constructed and contains some serious flaws. The context of the question may help: looking at the 'rules' and 'guidelines' for two-sample t tests that were given just before the question may help you figure out what the author means.



Major flaws are as follows:



  • "[T]he second [P-value is] above the significance threshold (but the histogram was bell-shaped)." I agree with @stats.and.r (+1) that this seems self-contradictory.


  • "Variance in groups differed by more than 30%." The appropriate way to judge whether sample variances indicate the population variances may be unequal is to look at their ratio, not their difference.


  • Nothing is said about the difference in sample means and no clue is given how large a difference would be of practical importance. So, against what standard are we to judge an "incorrect" comparison?


Also, we don't know whether the two-sample t test under discussion is a 'pooled' or a 'Welch' test. A Welch test should take care of a difference in variances. The DF of a Welch test can't be below $\min[(n_1 - 1),(n_2 - 1)] = 249$, so the t statistic must be nearly normal.
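
The lower bound on the Welch DF can be checked numerically. The following is only a sketch, assuming the usual Welch-Satterthwaite approximation for the degrees of freedom. The function name `welch_df` and the variance ratios tried are illustrative assumptions; the question gives the group sizes (400 and 250) but no actual variances.

```python
def welch_df(var1: float, n1: int, var2: float, n2: int) -> float:
    """Welch-Satterthwaite approximate degrees of freedom for two samples."""
    a = var1 / n1  # squared standard error contributed by group 1
    b = var2 / n2  # squared standard error contributed by group 2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


n1, n2 = 400, 250            # group sizes from the question
lower = min(n1 - 1, n2 - 1)  # smallest value the Welch DF can take
upper = n1 + n2 - 2          # largest value, equal to the pooled-test DF

# Hypothetical variance ratios var1/var2; the question only says "more than 30%".
for ratio in (0.01, 0.5, 1.0, 1.3, 2.0, 100.0):
    df = welch_df(ratio, n1, 1.0, n2)
    assert lower <= df <= upper
    print(f"variance ratio {ratio:>6}: Welch df = {df:.1f}")
```

Whatever the variance ratio, the DF stays between 249 and 648 for these group sizes, which is why the reference distribution is close to normal here.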



If a real-life situation, using a Welch t test, is described here, my guess is that everything is OK. But the exposition of the question is so foggy that my crystal ball doesn't say which answer its author expects.















edited May 1 at 16:04
answered May 1 at 15:37 by BruceET











  • (+1) for mentioning the capability of the Welch test (which I left out) and for pointing out that "The question is very poorly constructed". Every option begins with a general statement ("everything is bad/fine"), which I find misleading, too. And I agree that it is not clear which option is correct.
    – stats.and.r, May 1 at 16:23





























@1. The importance of normality decreases as N increases. See here or here. "With large enough sample sizes (> 30 or 40), the violation of the normality assumption should not cause major problems. This implies that we can use parametric procedures even when the data are not normally distributed" (from the first source).

@2. This may be a problem: if sample sizes are unequal, unequal variances can distort the Type I error rate of the t-test, either increasing or decreasing it. Still, you may want to run Levene's test or check how much the variances differ.

@3. To my understanding, big sample sizes don't guarantee that everything will be fine. OK, we said in @1 that the normality assumption becomes less important with increasing sample size. I am not too sure that the same is true for unequal variances.

@4. I don't understand that sentence. In fact, I find this sentence confusing, too: "for the second above the significance threshold (but the histogram was bell-shaped)". Why does it say "but"? A significant test for normality would indicate that the data are not normal, not the other way around. At least this is the case for the tests I know, like the Shapiro-Wilk test, where "The null-hypothesis of this test is that the population is normally distributed. Thus, on the one hand, if the p value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not normally distributed." (source)

EDIT

Thanks to @Glen_b, who pointed out that the quote under "@1" should be qualified because "while the t-test may end up having a nice normal-looking null distribution in many cases if n is large enough, its performance under the null isn't really what people care most about -- it's performance under the alternative -- and there it may not be so great, if you care about rejecting the null in the cases where the effect is not so easy to pick up." (quote from Glen_b)
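
Points @1 to @3 can be probed with a small simulation of the null rejection rate at the question's group sizes. This is only a sketch under stated assumptions. The standard deviations (1 and 2), the exponential distribution, the number of replications and the seed are all illustrative choices, not values from the question. The normal critical value stands in for the t critical value, which should be a close approximation with several hundred degrees of freedom. It looks at the significance level only and says nothing about power.

```python
import math
import random
from statistics import NormalDist


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def rejection_rates(draw1, draw2, n1=400, n2=250, reps=2000, alpha=0.05, seed=1):
    """Share of simulated null data sets rejected by the pooled and Welch statistics.

    draw1 and draw2 take a random.Random and return one observation; both
    must have the same population mean so that the null hypothesis holds.
    """
    rng = random.Random(seed)
    # Normal quantile used as an approximation to the t critical value.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    pooled_hits = welch_hits = 0
    for _ in range(reps):
        m1, v1 = mean_var([draw1(rng) for _ in range(n1)])
        m2, v2 = mean_var([draw2(rng) for _ in range(n2)])
        diff = abs(m1 - m2)
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
        pooled_hits += diff / math.sqrt(sp2 * (1 / n1 + 1 / n2)) > crit
        welch_hits += diff / math.sqrt(v1 / n1 + v2 / n2) > crit
    return pooled_hits / reps, welch_hits / reps


# Scenario A: normal data, the smaller group has four times the variance.
pooled_a, welch_a = rejection_rates(lambda r: r.gauss(0.0, 1.0),
                                    lambda r: r.gauss(0.0, 2.0))
# Scenario B: skewed (exponential) data, same distribution in both groups.
pooled_b, welch_b = rejection_rates(lambda r: r.expovariate(1.0),
                                    lambda r: r.expovariate(1.0))

print(f"A (unequal variances): pooled {pooled_a:.3f}, Welch {welch_a:.3f}")
print(f"B (skewed data):       pooled {pooled_b:.3f}, Welch {welch_b:.3f}")
```

Under these assumptions one would expect the pooled rate in scenario A to land noticeably above the nominal 5%, and the other three rates to stay near 5%. The distortion in scenario A shrinks as the variance ratio approaches 1, so a difference of only about 30% would matter much less than the factor of four used here.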








  • (+1) for pointing out the self-contradictory phrase and for the useful links.
    – BruceET, May 1 at 15:38










  • re your "@1" ... I don't see that there's clear agreement at the source with the quoted part. Some of the other answers and comments to that question (and clear demonstrations in answers to other questions on site) should cause us to attach very strong doubt to the claim unless it is very carefully qualified. In particular, (a) while the importance of normality to the significance level decreases with increasing sample size, that isn't true for power (as several people there take the trouble to point out), and (b) claims relating to any specific $n$ are easily shown to be false in general.
    – Glen_b, May 2 at 4:26











  • @Glen_b: Thank you. The quote is from the first link. I changed that. And indeed, I made the mistake of reading only the most upvoted answer, where it is said that "The t-test is invalid for small samples from non-normal distributions, but it is valid for large samples from non-normal distributions." But you're right that point 1 of my answer is too general if one considers the power problems. However, I don't understand what you mean by (b). For example, if the claim is that the standard error in a t test decreases when n increases, why would that be shown wrong?
    – stats.and.r, May 2 at 11:15










  • In relation to (b), I said "a specific $n$" -- that is, mention of specific sample sizes, as in your quote which says "> 30 or 40". That claim - that if n exceeds 40, violation of the normality assumption will not cause problems - is not true. By contrast, you're just now talking about "what happens as $n$ increases", which is not at all the same thing -- and in particular isn't claiming something is necessarily 'close enough' by some particular $n$. That is a very different kind of statement from the one I am responding to.
    – Glen_b, May 2 at 13:02











  • @Glen_b: Now I understand what you mean by (b). Thank you!
    – stats.and.r, May 2 at 13:09















2












$begingroup$

@1. The importance normality decsreases while N increases. See here or here. "With large enough sample sizes (> 30 or 40), the violation of the normality assumption should not cause major problems. This implies that we can use parametric procedures even when the data are not normally distributed" (from first source).



@2. This may be a problem since if sample sizes are unequal, unequal variances can influence the Type 1 error rate of the t-test by either increasing or decreasing the Type 1. Still, you may want run a Levene's test or to see how much the varianes differ.



@3: To my understanding big sample sizes don't guarantee that everything will be fine. Ok, we sad that with increasing sample size the normality assumption becomes less inportant in 1. I am not too sure about the same being true for unequal variance.



@4. I don't understand that sentence. In fact, I find this sentence confusing, too: "for the second above the significance threshold (but the histogram was bell-shaped)". Why does it say "but"? A significant test for normality would indicate that the data are not normal, not the other way around. At least this is the case for the tests I know, like the Shapiro-Wilk test, where "The null-hypothesis of this test is that the population is normally distributed. Thus, on the one hand, if the p value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not normally distributed." (source)



EDIT



Thanks to @Glen_b, who pointed out that the quote under "@1" should be qualified because "while the t-test may end up having a nice normal-looking null distribution in many cases if n is large enough, its performance under the null isn't really what people care most about -- it's performance under the alternative -- and there it may not be so great, if you care about rejecting the null in the cases where the effect is not so easy to pick up." (quote from Glen_b)
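The point about power can also be sketched by simulation. Everything below is my own illustrative setup, not something from the thread: heavy-tailed data from a contaminated normal (90% sd 1, 10% sd 10), a true shift of 1, n = 40, and the exact sign test as a simple comparison procedure, chosen only because it is easy to compute without extra libraries. The t critical value 2.0227 for 39 degrees of freedom is taken from standard tables.

```python
import math
import random

N = 40
T_CRIT_DF39 = 2.0227  # two-sided 5% critical value of t with 39 df (from tables)


def sign_test_cutoff(n, alpha=0.05):
    """Smallest k with P(X >= k) <= alpha / 2 for X ~ Binomial(n, 1/2)."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += math.comb(n, k) / 2 ** n
        if tail > alpha / 2:
            return k + 1
    return 0


def draw_contaminated(rng, shift):
    """Mostly N(shift, 1), with a 10% chance of a wide N(shift, 10) draw."""
    sd = 10.0 if rng.random() < 0.1 else 1.0
    return rng.gauss(shift, sd)


def power(shift, reps=4000, seed=2):
    """Rejection rates of H0: centre == 0 for the t-test and the sign test."""
    rng = random.Random(seed)
    cutoff = sign_test_cutoff(N)
    t_rejects = sign_rejects = 0
    for _ in range(reps):
        sample = [draw_contaminated(rng, shift) for _ in range(N)]
        mean = sum(sample) / N
        var = sum((v - mean) ** 2 for v in sample) / (N - 1)
        if abs(mean / math.sqrt(var / N)) > T_CRIT_DF39:
            t_rejects += 1
        positives = sum(1 for v in sample if v > 0)
        if positives >= cutoff or positives <= N - cutoff:
            sign_rejects += 1
    return t_rejects / reps, sign_rejects / reps


t_power, sign_power = power(shift=1.0)
print(f"t-test power:    {t_power:.3f}")
print(f"sign test power: {sign_power:.3f}")
```

In this particular setup the t-test should detect the shift noticeably less often than the sign test, even though n = 40. This shows only that good behaviour under the null does not imply good power. It is not a general ranking of the two tests.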

















$endgroup$








  • 1




    $begingroup$
    (+1) for pointing out self-contradictory phrase and for useful links.
    $endgroup$
    – BruceET
    May 1 at 15:38










  • $begingroup$
    Re your "@1": I don't see clear agreement at the source with the quoted part. Some of the other answers and comments to that question (and clear demonstrations in answers to other questions on this site) should cause us to attach very strong doubt to the claim unless it is very carefully qualified. In particular, (a) while the importance of normality to the significance level decreases with increasing sample size, that isn't true for power (as several people there take the trouble to point out), and (b) claims relating to any specific $n$ are easily shown to be false in general.
    $endgroup$
    – Glen_b
    May 2 at 4:26











  • $begingroup$
    @Glen_b: Thank you. The quote is from the first link; I changed that. Indeed, I made the mistake of reading only the most upvoted answer, which says that "The t-test is invalid for small samples from non-normal distributions, but it is valid for large samples from non-normal distributions.". But you're right that point 1 of my answer is too general if one considers the power problems. However, I don't understand what you mean by (b). For example, if the claim is that the standard error in a t-test decreases when n increases, why would that be shown to be wrong?
    $endgroup$
    – stats.and.r
    May 2 at 11:15










  • $begingroup$
    In relation to (b), I said "a specific $n$" -- that is, mention of specific sample sizes, as in your quote which says "> 30 or 40". That claim - that if n exceeds 40, violation of the normality assumption will not cause problems - is not true. By contrast, you're now talking about "what happens as $n$ increases", which is not at all the same thing -- and in particular isn't claiming that something is necessarily 'close enough' by some particular $n$. That is a very different kind of statement from the one I am responding to.
    $endgroup$
    – Glen_b
    May 2 at 13:02











  • $begingroup$
    @Glen_b: Now I understand what you mean by (b). Thank you!
    $endgroup$
    – stats.and.r
    May 2 at 13:09





















edited May 2 at 14:07

























answered May 1 at 12:10









stats.and.r
























