Why is there a change in the number of degrees of freedom when the following modification is made?


In the notes that I'm working through it says the following:

"Let $X_1,\ldots,X_n$ be a random sample from $N(\mu,\sigma)$. Then
$$\sum^n_{i=1}\Bigg[\frac{(X_i-\mu)}{\sigma}\Bigg]^2$$
has a $\chi^2$ distribution with $n$ degrees of freedom.

Now if we modify this by replacing $\mu$ with $\overline{X}$, the distribution changes and we obtain:
$$\sum^n_{i=1}\Bigg[\frac{(X_i-\overline{X})}{\sigma}\Bigg]^2$$
has a $\chi^2$ distribution with $n-1$ degrees of freedom."

My question is: why does the number of degrees of freedom change?

My understanding is that the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. So surely, since there are $n$ values $X_i$, even when we introduce $\overline{X}$ the number of values that are free to vary in the calculation of the statistic is still the same?










Tags: distributions, normal-distribution, chi-squared, degrees-of-freedom

asked Apr 29 at 22:34 by stochasticmrfox
  • Try it with $n = 2$: once you know $\bar X$ and $X_1$, then you know $X_2$. The glib, supposedly intuitive 'explanation' is that you "lose one degree of freedom estimating the mean." More rigorously, $\sum_i (X_i - \mu)^2$ can be decomposed into $\sum_i (X_i - \bar X)^2$ plus the square of one other normal random variable. Some people are 'convinced' by a simulation of each, fitting the relevant chi-squared random variables with $n$ and $n-1$ degrees of freedom, which I will attempt. – BruceET, Apr 30 at 1:45

  • Consider that $\bar X$ is both closer to the data than $\mu$ is, and dependent on it. – Glen_b, Apr 30 at 1:58

  • The concept of degrees of freedom is thoroughly discussed in our thread at stats.stackexchange.com/questions/16921. – whuber, Apr 30 at 12:38
















3 Answers






Suppose we have a random sample from $\mathsf{Norm}(\mu, \sigma)$.

Let $V_1 = S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ be the estimate of the population variance $\sigma^2$ when $\mu$ is unknown and estimated by $\bar X$. Then $Q_1 = \frac{(n-1)V_1}{\sigma^2} \sim \mathsf{Chisq}(n-1)$.

Let $V_2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2$ be the estimate of the population variance $\sigma^2$ when $\mu$ is known. Then $Q_2 = \frac{nV_2}{\sigma^2} \sim \mathsf{Chisq}(n)$.

In the R code below, we choose $m = 10^6$ samples of size $n = 5$ from $\mathsf{Norm}(\mu = 50, \sigma = 7)$. Then we make histograms of $Q_1$ and $Q_2$. Chi-squared densities with $4$ and $5$ degrees of freedom, respectively, fit the histograms. The density curve for $\mathsf{Chisq}(5)$ does not fit the histogram for $Q_1$.

set.seed(2019)
m = 10^6; n = 5; mu = 50; sg = 7
x = rnorm(m*n, mu, sg)
MAT = matrix(x, nrow=m)        # each row a sample of n
v1 = apply(MAT, 1, var)        # uses sample mean
v2 = rowSums((MAT - mu)^2)/n   # uses population mean
q1 = (n-1)*v1 / sg^2
q2 = n*v2 / sg^2

mean(q1); var(q1)
[1] 3.997226  # approx E(Q1) = 4
[1] 8.00637   # approx Var(Q1) = 8
mean(q2); var(q2)
[1] 4.997005  # approx E(Q2) = 5
[1] 9.98925   # approx Var(Q2) = 10

par(mfrow=c(1,2))              # enables 2 panels per plot
hist(q1, prob=T, br=50, col="skyblue2", ylim=c(0,.2))
curve(dchisq(x, n-1), add=T, lwd=2)
curve(dchisq(x, n), add=T, col="red", lwd=2, lty="dotted")
hist(q2, prob=T, br=50, col="skyblue2")
curve(dchisq(x, n), add=T, lwd=2)
par(mfrow=c(1,1))

[Figure: histograms of $Q_1$ (left) and $Q_2$ (right) with overlaid chi-squared density curves for $4$ and $5$ degrees of freedom.]
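The simulation reflects a standard algebraic decomposition (a sketch of my own, not part of the original answer, but consistent with BruceET's comment about "the square of one other normal random variable"): the known-mean sum of squares splits into the sample-mean sum of squares plus one extra squared standard normal term.

```latex
% Expanding (X_i - \mu) = (X_i - \bar X) + (\bar X - \mu); the cross
% terms vanish because \sum_i (X_i - \bar X) = 0, leaving:
\sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma}\right)^2
  = \sum_{i=1}^n\left(\frac{X_i-\bar X}{\sigma}\right)^2
  + \left(\frac{\bar X-\mu}{\sigma/\sqrt{n}}\right)^2 .
```

Since $\bar X \sim N(\mu, \sigma^2/n)$, the last term is the square of a standard normal variable, so $Q_2$ equals $Q_1$ plus an independent $\chi^2_1$ term: that is exactly the one degree of freedom lost when $\mu$ is replaced by $\bar X$.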






answered Apr 30 at 2:32, edited Apr 30 at 2:51 – BruceET
Your understanding of degrees of freedom is correct for one of the two senses of the term. The vector $\left(X_1-\overline X,\ldots,X_n-\overline X\right)$ has $n-1$ degrees of freedom because it is subject to the constraint that the sum of its components must be $0$; if you know $n-1$ of them plus that constraint, then you know all of them.

The other sense of the term degrees of freedom is used when one speaks of a chi-square distribution with a specified number of degrees of freedom. Consider any orthonormal basis of the $(n-1)$-dimensional space of $n$-tuples in which the sum of the components is $0$. Let $U_1,\ldots,U_{n-1}$ be the scalar components of $\left( X_1-\overline X, \ldots, X_n-\overline X \right)$ with respect to that basis. Then
\begin{align}
& \left( X_1-\overline X\right)^2 + \cdots + \left( X_n - \overline X\right)^2 = U_1^2 + \cdots + U_{n-1}^2 \\[5pt]
& \text{and } U_1,\ldots,U_{n-1} \sim \text{i.i.d. } N(0, \sigma^2).
\end{align}
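As a concrete illustration of the basis construction above (my example, not part of the original answer), take $n=2$, where the constrained space is one-dimensional:

```latex
% The space {(a, b) : a + b = 0} is spanned by the unit vector
% (1, -1)/\sqrt{2}, so there is a single scalar component:
U_1 = \frac{X_1 - X_2}{\sqrt{2}}, \qquad
(X_1-\overline X)^2 + (X_2-\overline X)^2
  = 2\left(\frac{X_1-X_2}{2}\right)^2
  = \frac{(X_1-X_2)^2}{2} = U_1^2 .
```

Here $X_1 - X_2 \sim N(0, 2\sigma^2)$, so $U_1 \sim N(0, \sigma^2)$, and the sum of two squared centered terms collapses to one squared normal variable: one degree of freedom, not two.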






answered Apr 30 at 5:21 – Michael Hardy
Your understanding of degrees of freedom is correct. The difference essentially boils down to a subtle difference between $\mu$ and $\bar X$.

The sample mean $\bar X$ is determined by the values of the observed samples. As a result, there's some redundancy between $\bar X$ and the individual values $x_i$. Suppose we knew $\bar X$ and the first $N-1$ values. That's enough, because we can write out the equation for $\bar X$ as
$$\bar X=\frac{1}{N}x_1 +\frac{1}{N}x_2 + \frac{1}{N}x_3+ \ldots + \frac{1}{N}x_{N-1} + \frac{1}{N}x_N$$

A little bit of rearrangement lets us find that missing value:
$$x_N = N\bar X - (x_1+x_2+\ldots +x_{N-1})$$

This isn't the case when the population mean $\mu$ is used, since it is not dependent on the observed data; it's known (or assumed to be known) ahead of time. You therefore need to use all $N$ values to calculate your test statistic.
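A quick numeric check of that rearrangement (my own made-up values, not from the original answer):

```latex
% N = 3 with observed x_1 = 2, x_2 = 4 and reported mean \bar X = 4;
% the "missing" third value is forced:
x_3 = N\bar X - (x_1 + x_2) = 3 \cdot 4 - (2 + 4) = 6 .
```

So once $\bar X$ is fixed, only $N-1$ of the $x_i$ can vary freely, which is exactly the degree of freedom lost in the $\bar X$ version of the statistic.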






      share|cite|improve this answer









      $endgroup$













        Your Answer








        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "65"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: false,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: "",
        imageUploader:
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        ,
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );













        draft saved

        draft discarded


















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f405756%2fwhy-is-there-a-change-in-the-number-of-degrees-of-freedom-when-the-following-mod%23new-answer', 'question_page');

        );

        Post as a guest















        Required, but never shown

























        3 Answers
        3






        active

        oldest

        votes








        3 Answers
        3






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        3












        $begingroup$

        Suppose we have a random sample from $mathsfNorm(mu, sigma).$



        Let the $V_1 = S^2 = frac1n-1sum_i=1^n (X_i - bar X)^2$ be the estimate of the population variance $sigma^2$ when $mu$ is unknown and estimated by $bar X.$ Then $Q_1 = frac(n-1)V_1sigma^2 sim mathsfChisq(n-1).$



        Let the $V_2 = frac1nsum_i=1^n (X_i - mu)^2$ be the estimate of the population variance $sigma^2$ when $mu$ is known. Then $Q_2 = fracnV_2sigma^2 sim mathsfChisq(n).$



        In the R code below, we choose $m = 10^6$ samples of size $n = 5$ from $mathsfNorm(mu = 50, sigma = 7).$
        Then we make histograms of $Q_1$ and $Q_2.$ Chi-squared densities with degrees of freedom $4$ and $5,$ respectively, fit the histograms. The density curve
        for $mathsfChisq(5)$ does not fit the histogram
        for $Q_1.$



        set.seed(2019)
        m = 10^6; n = 5; mu = 50; sg = 7
        x = rnorm(m*n, mu, sg)
        MAT = matrix(x, nrow=m) # each row a sample of n
        v1 = apply(MAT, 1, var) # uses sample mean
        v2 = rowSums((MAT - mu)^2)/n # uses population mean
        q1 = (n-1)*v1 /sg^2
        q2 = n*v2 / sg^2

        mean(q1); var(q1)
        [1] 3.997226 # aprx E(Q1) = 4
        [1] 8.00637 # aprx Var(Q1) = 8
        mean(q2); var(q2)
        [1] 4.997005 # aprx E(Q2) = 5
        [1] 9.98925 # aprx Var(Q2) = 10

        par(mfrow=c(1,2)) # enables 2 panels per plot
        hist(q1, prob=T, br=50, col="skyblue2", ylim=c(0,.2))
        curve(dchisq(x, n-1), add=T, lwd=2)
        curve(dchisq(x, n), add=T, col="red", lwd=2, lty="dotted")
        hist(q2, prob=T, br=50, col="skyblue2")
        curve(dchisq(x, n), add=T, lwd=2)
        par(mfrow=c(1,1))


        enter image description here






        share|cite|improve this answer











        $endgroup$

















          3












          $begingroup$

          Suppose we have a random sample from $mathsfNorm(mu, sigma).$



          Let the $V_1 = S^2 = frac1n-1sum_i=1^n (X_i - bar X)^2$ be the estimate of the population variance $sigma^2$ when $mu$ is unknown and estimated by $bar X.$ Then $Q_1 = frac(n-1)V_1sigma^2 sim mathsfChisq(n-1).$



          Let the $V_2 = frac1nsum_i=1^n (X_i - mu)^2$ be the estimate of the population variance $sigma^2$ when $mu$ is known. Then $Q_2 = fracnV_2sigma^2 sim mathsfChisq(n).$



          In the R code below, we choose $m = 10^6$ samples of size $n = 5$ from $mathsfNorm(mu = 50, sigma = 7).$
          Then we make histograms of $Q_1$ and $Q_2.$ Chi-squared densities with degrees of freedom $4$ and $5,$ respectively, fit the histograms. The density curve
          for $mathsfChisq(5)$ does not fit the histogram
          for $Q_1.$



          set.seed(2019)
          m = 10^6; n = 5; mu = 50; sg = 7
          x = rnorm(m*n, mu, sg)
          MAT = matrix(x, nrow=m) # each row a sample of n
          v1 = apply(MAT, 1, var) # uses sample mean
          v2 = rowSums((MAT - mu)^2)/n # uses population mean
          q1 = (n-1)*v1 /sg^2
          q2 = n*v2 / sg^2

          mean(q1); var(q1)
          [1] 3.997226 # aprx E(Q1) = 4
          [1] 8.00637 # aprx Var(Q1) = 8
          mean(q2); var(q2)
          [1] 4.997005 # aprx E(Q2) = 5
          [1] 9.98925 # aprx Var(Q2) = 10

          par(mfrow=c(1,2)) # enables 2 panels per plot
          hist(q1, prob=T, br=50, col="skyblue2", ylim=c(0,.2))
          curve(dchisq(x, n-1), add=T, lwd=2)
          curve(dchisq(x, n), add=T, col="red", lwd=2, lty="dotted")
          hist(q2, prob=T, br=50, col="skyblue2")
          curve(dchisq(x, n), add=T, lwd=2)
          par(mfrow=c(1,1))


          enter image description here






          share|cite|improve this answer











          $endgroup$















            3












            3








            3





            $begingroup$

            Suppose we have a random sample from $mathsfNorm(mu, sigma).$



            Let the $V_1 = S^2 = frac1n-1sum_i=1^n (X_i - bar X)^2$ be the estimate of the population variance $sigma^2$ when $mu$ is unknown and estimated by $bar X.$ Then $Q_1 = frac(n-1)V_1sigma^2 sim mathsfChisq(n-1).$



            Let the $V_2 = frac1nsum_i=1^n (X_i - mu)^2$ be the estimate of the population variance $sigma^2$ when $mu$ is known. Then $Q_2 = fracnV_2sigma^2 sim mathsfChisq(n).$



            In the R code below, we choose $m = 10^6$ samples of size $n = 5$ from $mathsfNorm(mu = 50, sigma = 7).$
            Then we make histograms of $Q_1$ and $Q_2.$ Chi-squared densities with degrees of freedom $4$ and $5,$ respectively, fit the histograms. The density curve
            for $mathsfChisq(5)$ does not fit the histogram
            for $Q_1.$



            set.seed(2019)
            m = 10^6; n = 5; mu = 50; sg = 7
            x = rnorm(m*n, mu, sg)
            MAT = matrix(x, nrow=m) # each row a sample of n
            v1 = apply(MAT, 1, var) # uses sample mean
            v2 = rowSums((MAT - mu)^2)/n # uses population mean
            q1 = (n-1)*v1 /sg^2
            q2 = n*v2 / sg^2

            mean(q1); var(q1)
            [1] 3.997226 # aprx E(Q1) = 4
            [1] 8.00637 # aprx Var(Q1) = 8
            mean(q2); var(q2)
            [1] 4.997005 # aprx E(Q2) = 5
            [1] 9.98925 # aprx Var(Q2) = 10

            par(mfrow=c(1,2)) # enables 2 panels per plot
            hist(q1, prob=T, br=50, col="skyblue2", ylim=c(0,.2))
            curve(dchisq(x, n-1), add=T, lwd=2)
            curve(dchisq(x, n), add=T, col="red", lwd=2, lty="dotted")
            hist(q2, prob=T, br=50, col="skyblue2")
            curve(dchisq(x, n), add=T, lwd=2)
            par(mfrow=c(1,1))


            enter image description here






            share|cite|improve this answer











            $endgroup$



            Suppose we have a random sample from $mathsfNorm(mu, sigma).$



            Let the $V_1 = S^2 = frac1n-1sum_i=1^n (X_i - bar X)^2$ be the estimate of the population variance $sigma^2$ when $mu$ is unknown and estimated by $bar X.$ Then $Q_1 = frac(n-1)V_1sigma^2 sim mathsfChisq(n-1).$



            Let the $V_2 = frac1nsum_i=1^n (X_i - mu)^2$ be the estimate of the population variance $sigma^2$ when $mu$ is known. Then $Q_2 = fracnV_2sigma^2 sim mathsfChisq(n).$



            In the R code below, we choose $m = 10^6$ samples of size $n = 5$ from $mathsfNorm(mu = 50, sigma = 7).$
            Then we make histograms of $Q_1$ and $Q_2.$ Chi-squared densities with degrees of freedom $4$ and $5,$ respectively, fit the histograms. The density curve
            for $mathsfChisq(5)$ does not fit the histogram
            for $Q_1.$



            set.seed(2019)
            m = 10^6; n = 5; mu = 50; sg = 7
            x = rnorm(m*n, mu, sg)
            MAT = matrix(x, nrow=m) # each row a sample of n
            v1 = apply(MAT, 1, var) # uses sample mean
            v2 = rowSums((MAT - mu)^2)/n # uses population mean
            q1 = (n-1)*v1 /sg^2
            q2 = n*v2 / sg^2

            mean(q1); var(q1)
            [1] 3.997226 # aprx E(Q1) = 4
            [1] 8.00637 # aprx Var(Q1) = 8
            mean(q2); var(q2)
            [1] 4.997005 # aprx E(Q2) = 5
            [1] 9.98925 # aprx Var(Q2) = 10

            par(mfrow=c(1,2)) # enables 2 panels per plot
            hist(q1, prob=T, br=50, col="skyblue2", ylim=c(0,.2))
            curve(dchisq(x, n-1), add=T, lwd=2)
            curve(dchisq(x, n), add=T, col="red", lwd=2, lty="dotted")
            hist(q2, prob=T, br=50, col="skyblue2")
            curve(dchisq(x, n), add=T, lwd=2)
            par(mfrow=c(1,1))


            enter image description here







            share|cite|improve this answer














            share|cite|improve this answer



            share|cite|improve this answer








            edited Apr 30 at 2:51

























            answered Apr 30 at 2:32









            BruceETBruceET

            7,6811721




            7,6811721























                2












                $begingroup$

                Your understanding of degrees of freedom is correct for one of the two senses of the term. The vector $left(X_1-overline X,ldots,X_n-overline Xright)$ has $n-1$ degrees of freedom because it is subject to the constraint that the sum of the components must be $0,$ so if you know $n-1$ of them plus that constraint, then you know all of them.



                The other sense of the term degrees of freedom is used when one speaks of a chi-square distribution with a specified number of degrees of freedom. Consider any orthonormal basis of the $(n-1)$-dimensional space of $n$-tuples in which the sum of the components is $0.$ Let $U_1,ldots,U_n-1$ be the scalar components of $left( X_1-overline X, ldots, X_n-overline X right)$ with respect to that basis. Then
                beginalign
                & left( X_1-overline Xright)^2 + cdots + left( X_n - overline Xright)^2 = U_1^2 + cdots + U_n-1^2 \[5pt]
                & textand U_1,ldots,U_n-1 sim texti.i.d. N(0, sigma^2).
                endalign






                share|cite|improve this answer









                $endgroup$

















                  2












                  $begingroup$

                  Your understanding of degrees of freedom is correct for one of the two senses of the term. The vector $left(X_1-overline X,ldots,X_n-overline Xright)$ has $n-1$ degrees of freedom because it is subject to the constraint that the sum of the components must be $0,$ so if you know $n-1$ of them plus that constraint, then you know all of them.



                  The other sense of the term degrees of freedom is used when one speaks of a chi-square distribution with a specified number of degrees of freedom. Consider any orthonormal basis of the $(n-1)$-dimensional space of $n$-tuples in which the sum of the components is $0.$ Let $U_1,ldots,U_n-1$ be the scalar components of $left( X_1-overline X, ldots, X_n-overline X right)$ with respect to that basis. Then
                  beginalign
                  & left( X_1-overline Xright)^2 + cdots + left( X_n - overline Xright)^2 = U_1^2 + cdots + U_n-1^2 \[5pt]
                  & textand U_1,ldots,U_n-1 sim texti.i.d. N(0, sigma^2).
                  endalign






                  share|cite|improve this answer









                  $endgroup$















                    2












                    2








                    2





                    $begingroup$

                    Your understanding of degrees of freedom is correct for one of the two senses of the term. The vector $left(X_1-overline X,ldots,X_n-overline Xright)$ has $n-1$ degrees of freedom because it is subject to the constraint that the sum of the components must be $0,$ so if you know $n-1$ of them plus that constraint, then you know all of them.



                    The other sense of the term degrees of freedom is used when one speaks of a chi-square distribution with a specified number of degrees of freedom. Consider any orthonormal basis of the $(n-1)$-dimensional space of $n$-tuples in which the sum of the components is $0.$ Let $U_1,ldots,U_n-1$ be the scalar components of $left( X_1-overline X, ldots, X_n-overline X right)$ with respect to that basis. Then
                    beginalign
                    & left( X_1-overline Xright)^2 + cdots + left( X_n - overline Xright)^2 = U_1^2 + cdots + U_n-1^2 \[5pt]
                    & textand U_1,ldots,U_n-1 sim texti.i.d. N(0, sigma^2).
                    endalign






                    share|cite|improve this answer









                    $endgroup$



                    Your understanding of degrees of freedom is correct for one of the two senses of the term. The vector $left(X_1-overline X,ldots,X_n-overline Xright)$ has $n-1$ degrees of freedom because it is subject to the constraint that the sum of the components must be $0,$ so if you know $n-1$ of them plus that constraint, then you know all of them.



                    The other sense of the term degrees of freedom is used when one speaks of a chi-square distribution with a specified number of degrees of freedom. Consider any orthonormal basis of the $(n-1)$-dimensional space of $n$-tuples in which the sum of the components is $0.$ Let $U_1,ldots,U_n-1$ be the scalar components of $left( X_1-overline X, ldots, X_n-overline X right)$ with respect to that basis. Then
                    beginalign
                    & left( X_1-overline Xright)^2 + cdots + left( X_n - overline Xright)^2 = U_1^2 + cdots + U_n-1^2 \[5pt]
                    & textand U_1,ldots,U_n-1 sim texti.i.d. N(0, sigma^2).
                    endalign







                    share|cite|improve this answer












                    share|cite|improve this answer



                    share|cite|improve this answer










                    answered Apr 30 at 5:21









                    Michael HardyMichael Hardy

                    4,2351430




                    4,2351430





















                        1












                        $begingroup$

Your understanding of degrees of freedom is correct. The difference essentially boils down to a subtle difference between $\mu$ and $\bar X$.



The sample mean $\bar X$ is determined by the values of the observed samples. As a result, there's some redundancy between $\bar X$ and the individual values of $x_i$. Suppose we knew $\bar X$ and the first $N-1$ values. That's enough, because we can write out the equation for $\bar X$ as
$$\bar X=\frac{1}{N}x_1 +\frac{1}{N}x_2 + \frac{1}{N}x_3+ \ldots + \frac{1}{N}x_{N-1} + \frac{1}{N}x_N$$



A little bit of rearrangement lets us find that missing value: $$x_N = N\bar X - (x_1+x_2+\ldots +x_{N-1})$$



This isn't the case when the population mean ($\mu$) is used, since it is not dependent on the observed data; it's known (or assumed to be known) ahead of time. You therefore need to use all $N$ values to calculate your test statistic.
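The redundancy described above can be verified in a few lines. This is a minimal sketch (variable names are illustrative): given $\bar X$ and the first $N-1$ observations, the last observation is forced, so it carries no additional freedom.

```python
import random

random.seed(1)
N = 5
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
xbar = sum(xs) / N                     # sample mean from all N values

# Rearranged mean equation: x_N = N*xbar - (x_1 + ... + x_{N-1})
x_last = N * xbar - sum(xs[:N - 1])
print(abs(x_last - xs[-1]) < 1e-12)    # True: x_N is determined by the rest
```

With $\mu$ known in advance, no such reconstruction is possible, which is why all $N$ squared terms remain free.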






                            answered Apr 30 at 3:26









Matt Krause



























