Why does the logistic function use e rather than 2?










7












The sigmoid function can be used as an activation function in machine learning.

$$\displaystyle S(x)=\frac{1}{1+e^{-x}}=\frac{e^x}{e^x+1}.$$

If I substitute $e$ with 2,

    import numpy as np
    import matplotlib.pyplot as plt

    def sigmoid2(z):
        # Same form as the sigmoid, but with base 2 instead of e
        return 1 / (1 + 2 ** (-z))

    x = np.arange(-9, 9, dtype=float)
    y = sigmoid2(x)
    plt.scatter(x, y)

the plot looks similar.

Why does the logistic function use $e$ rather than 2?


















machine-learning deep-learning

edited Jun 6 at 14:48 by Ethan
asked Jun 6 at 7:55 by baojieqh




















3 Answers


















11

Since you will later minimize the negative log-likelihood, there is actually no big difference between $\log 2^x = x \cdot \log 2$ and $\log e^x = x$. The difference is simply a constant factor.

Nevertheless, one could argue for using $2^x$ instead of $e^x$ and also using $\log_2$ instead of $\log$ in the optimization step. In fact, it is possible to use $2^x$ and also many other functions that have some desired properties, which are:

• $\lim\limits_{x \rightarrow \infty} f(x)=1$

• $\lim\limits_{x \rightarrow -\infty} f(x)=0$

• $f(x) = -f(-x) + 1$ (point symmetry about $(0, 0.5)$)

Examples of suitable functions can be found on Wikipedia.

answered Jun 6 at 8:18 by Andreas Look
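The "difference is simply a constant" point can be checked numerically. The following is a minimal sketch, assuming NumPy is available; `sigmoid_base` and `checks` are hypothetical names introduced here for illustration, not part of the question or the answer. It shows that the base-2 curve is the ordinary sigmoid with its input rescaled by $\ln 2$, and that both bases satisfy the three properties listed above.

```python
import numpy as np

def sigmoid_base(z, base=np.e):
    """Sigmoid-like curve 1 / (1 + base**(-z)); base=e gives the logistic function."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.power(base, -z))

x = np.linspace(-9.0, 9.0, 181)

# 2**(-x) == e**(-x * ln 2), so changing the base only rescales the input.
same_curve = bool(np.allclose(sigmoid_base(x, 2.0), sigmoid_base(x * np.log(2.0))))

# Check the limits and the symmetry f(x) = 1 - f(-x) for both bases.
checks = {}
for base in (2.0, np.e):
    checks[base] = {
        "upper_limit": float(sigmoid_base(60.0, base)),    # should be close to 1
        "lower_limit": float(sigmoid_base(-60.0, base)),   # should be close to 0
        "symmetric": bool(np.allclose(sigmoid_base(x, base),
                                      1.0 - sigmoid_base(-x, base))),
    }

print(same_curve)
print(checks)
```

In other words, a model trained with base 2 can represent the same curves as one trained with base $e$; the learned weights would simply absorb the factor $\ln 2$.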








• 8
  I think it's also worth pointing out that one nice reason to use $e$ as the base is that the derivative of $\sigma(x)=\frac{1}{1+e^{-x}}$ is $\sigma'(x)=\sigma(x)(1-\sigma(x))$. Without doing the actual computation, I think if the base was different the formula would only differ by a constant again, but it's a nice property that is specific to $e$.
  – Calvin Godfrey, Jun 6 at 16:55

• Same goes for $2^x$ when using $\log_2$.
  – Andreas Look, Jun 6 at 20:44

• @AndreasLook I'm not sure what you mean. If you use $2^{-x}$ then there's an extra factor of $\ln(2)$ in the derivative (like Calvin Godfrey said).
  – sfmiller940, Jun 12 at 20:08

• No, check out binary logarithm. $\log_2 (2^x)=x$.
  – Andreas Look, Jun 13 at 19:32
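The computation that the first comment skips is short. For a general base $a > 0$, $a \neq 1$, differentiating $\sigma_a(x)=\frac{1}{1+a^{-x}}$ gives $\sigma_a'(x)=\ln(a)\,\sigma_a(x)\,(1-\sigma_a(x))$, so the factor is exactly 1 only for $a=e$. The sketch below compares that closed form with a finite-difference estimate; it assumes NumPy, and `sigmoid_base`, `numerical_derivative` and `max_error` are hypothetical names used only for this illustration.

```python
import numpy as np

def sigmoid_base(x, base):
    """Sigmoid-like curve 1 / (1 + base**(-x))."""
    return 1.0 / (1.0 + np.power(base, -np.asarray(x, dtype=float)))

def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = np.linspace(-5.0, 5.0, 101)
max_error = {}
for base in (np.e, 2.0):
    s = sigmoid_base(x, base)
    closed_form = np.log(base) * s * (1.0 - s)  # ln(a) * sigma * (1 - sigma)
    approx = numerical_derivative(lambda t: sigmoid_base(t, base), x)
    max_error[base] = float(np.max(np.abs(approx - closed_form)))

print(max_error)
```

The identity $\log_2(2^x)=x$ concerns the logarithm applied in the loss; it does not remove the $\ln 2$ factor from the derivative of the activation itself.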



















4

There are many functions that look sigmoid, including the two you mentioned, but there are reasons why $e$ is special. The main reason is that the logistic function was originally used to model population growth, and populations, much like interest, can compound over time. This makes $e$ a very natural choice. In addition, for theoretical reasons concerning the canonical link function of a GLM, the logistic is one of the simplest objects to work with, which makes it easy to prove things about.

edited Jun 6 at 15:00 by Ethan
answered Jun 6 at 8:12 by Anonymous Emu
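For context on the population-growth remark, the logistic growth model is usually written as the differential equation below. The symbols $P$ (population), $r$ (growth rate), $K$ (carrying capacity) and $P_0$ (initial population) are the conventional ones and are not taken from the answer.

$$\frac{dP}{dt}=rP\left(1-\frac{P}{K}\right), \qquad P(0)=P_0$$

$$P(t)=\frac{K}{1+\frac{K-P_0}{P_0}\,e^{-rt}}$$

With $K=1$, $r=1$ and $P_0=\tfrac12$ this reduces to $\frac{1}{1+e^{-t}}$. The $e$ appears because it is the base for which the growth rate $r$ enters the exponent directly. The same solution can be written with base 2, since $e^{-rt}=2^{-rt/\ln 2}$, at the cost of carrying the factor $1/\ln 2$ around.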








• 2
  Thanks for your answer. What does "canonical link function of a glm" mean?
  – baojieqh, Jun 6 at 9:22

• @baojieqh For all generalized linear models, one needs to specify a member of the exponential family of distributions. These distributions can all be written so that a function of the distribution's mean parameter sits "by itself" in an exponent (and it is a function of that parameter only). This function is what people refer to as the canonical link function. For the Bernoulli/binomial distribution, where that parameter is $p$, the function turns out to be $\ln(p/(1-p))$, which is the logit link function.
  – aranglol, Jun 6 at 23:55

• Hence, the canonical link function for logistic regression, which assumes a Bernoulli distribution for each row, is the logit link. There are other, more theoretical properties that make the canonical link function desirable, but it is technically not necessary to use it; you could use the probit, for example.
  – aranglol, Jun 6 at 23:58

• @aranglol Thanks for your comments. Would you please take a look at this link: math.stackexchange.com/q/3253634/656371
  – baojieqh, Jun 7 at 0:37

• This seems to be just a hand-waving appeal to the claim that "$e$ is special", without giving any justification about why $e$ is special. Really, the only specialness is the convenience that $\tfrac{d}{dx}a^x=a^x\ln a$, which means that $\tfrac{d}{dx}e^x=e^x$.
  – David Richerby, Jun 7 at 9:21
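The canonical-link remark in the comments above can be made concrete for the Bernoulli case. The following is a short worked form; the symbol $\eta$ for the quantity in the exponent is introduced here for illustration.

$$P(Y=y)=p^{y}(1-p)^{1-y}=\exp\!\left(y\,\ln\frac{p}{1-p}+\ln(1-p)\right), \qquad y\in\{0,1\}$$

$$\eta=\ln\frac{p}{1-p} \quad\Longleftrightarrow\quad p=\frac{1}{1+e^{-\eta}}$$

The term multiplying $y$ in the exponent is the logit, and inverting it gives the logistic function with base $e$. Measuring the log-odds in base 2 instead, $\eta=\log_2\frac{p}{1-p}$, would give $p=\frac{1}{1+2^{-\eta}}$, which is the same family of curves up to a rescaling of $\eta$ by $\ln 2$.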


















0

It comes from the basic assumption of the model that there exists a continuous/latent/unobservable $Y^*$ that relates somehow to the observed values of $Y$. The model further assumes that $Y=1$ if the signal of $Y^*$ is above some threshold, and otherwise $Y=0$. The third and last assumption is that the underlying distribution of $Y^*$ is the logistic distribution. Once you have these assumptions, it is only a matter of algebra to derive the model.

You can read more details at my blog.

answered by Yossi Levy (new contributor)
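The latent-variable argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the answer author's derivation: the threshold is taken to be 0, the latent variable is modeled as $Y^* = \mu + \varepsilon$ with standard logistic noise $\varepsilon$, and `mu`, `n_samples` and `empirical` are hypothetical names. Under those assumptions, $P(Y=1)=P(\varepsilon>-\mu)=\frac{1}{1+e^{-\mu}}$, which the sample frequencies should approximate.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def sigmoid(z):
    """Standard logistic function 1 / (1 + e**(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

n_samples = 200_000
empirical = {}
for mu in (-2.0, 0.0, 1.5):
    noise = rng.logistic(loc=0.0, scale=1.0, size=n_samples)
    y_star = mu + noise   # latent, unobserved variable
    y = y_star > 0.0      # observed binary outcome (assumed threshold: 0)
    empirical[mu] = float(y.mean())
    print(mu, empirical[mu], float(sigmoid(mu)))
```

The printed frequencies should sit close to the sigmoid values, up to sampling noise.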





























            • 2




              $begingroup$
              thanks for your answer. what does "canonical link function of a glm" mean?
              $endgroup$
              – baojieqh
              Jun 6 at 9:22










            • $begingroup$
              @baojieqh For all generalized linear models, one needs to specify a member of the exponential family of distributions. These distributions all share a property where they can be written in such a way so that a function of the scale parameter of the distribution sits "by itself" in an exponent (and the function is only a function of the scale parameter). This function is what people refer to as the canonical link function. For the bernoulli/binomial distribution, where the scale parameter is p, it turns out that this function is ln(p/(1-p)) which is the logit link function.
              $endgroup$
              – aranglol
              Jun 6 at 23:55










            • $begingroup$
              Hence, the canonical link function for logistic regression, which assumes a Bernoulli distribution for each row, is the logit link. The canonical link has other theoretical properties that make it desirable, but it is technically not necessary to use it; you could use the probit link, for example.
              $endgroup$
              – aranglol
              Jun 6 at 23:58











            • $begingroup$
              @aranglol Thanks for your comments. Would you please take a look at this question: math.stackexchange.com/q/3253634/656371
              $endgroup$
              – baojieqh
              Jun 7 at 0:37










            • $begingroup$
              This seems to be just a hand-waving appeal to the claim that "$e$ is special", without giving any justification about why $e$ is special. Really, the only specialness is the convenience that $\tfrac{d}{dx}a^x=a^x\ln a$, which means that $\tfrac{d}{dx}e^x=e^x$.
              $endgroup$
              – David Richerby
              Jun 7 at 9:21
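
To make the canonical-link comments above concrete, here is a short worked derivation for the Bernoulli case. The symbol $\theta$ for the natural parameter is introduced here for illustration and does not appear in the comments.

$$P(Y=y \mid p) = p^{y}(1-p)^{1-y} = \exp\!\left( y \ln\frac{p}{1-p} + \ln(1-p) \right), \quad y \in \{0,1\}.$$

The quantity multiplying $y$ in the exponent is $\theta = \ln\frac{p}{1-p}$, the logit. Solving for $p$ gives

$$p = \frac{1}{1+e^{-\theta}},$$

so the base $e$ appears because the exponential family is conventionally written with $\exp$. Writing the same density with base 2 would only rescale $\theta$ by a constant factor of $\ln 2$.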























            0












            $begingroup$

            It comes from the basic assumption of the model that there exists a continuous, latent (unobservable) variable $Y^*$ that is related to the observed values of $Y$. The model further assumes that $Y=1$ if $Y^*$ is above some threshold, and $Y=0$ otherwise. The third and last assumption is that the underlying distribution of $Y^*$ is the logistic distribution. Once you have these assumptions, deriving the model is only a matter of algebra.



            You can read more details at my blog.
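
The three assumptions above can be checked with a small simulation. This is only a sketch under assumptions that are not in the answer itself: the latent variable is taken to be $Y^* = \eta + \varepsilon$ with a fixed value $\eta$ standing in for the linear predictor, the threshold is 0, and $\varepsilon$ follows the standard logistic distribution. The names `simulate_rate` and `logistic_noise` are hypothetical.

```python
import math
import random

def logistic(x):
    """Standard logistic function, base e."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_noise(rng):
    """Draw from the standard logistic distribution via its inverse CDF."""
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulate_rate(eta, n=200_000, seed=0):
    """Fraction of draws where Y* = eta + noise exceeds the threshold 0."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if eta + logistic_noise(rng) > 0.0)
    return hits / n

for eta in (-2.0, 0.0, 1.5):
    print(f"eta={eta:+.1f}  simulated P(Y=1): {simulate_rate(eta):.4f}  "
          f"logistic(eta): {logistic(eta):.4f}")
```

Under these assumptions the simulated frequency of $Y=1$ should track $1/(1+e^{-\eta})$ up to sampling noise, which is the algebraic result the answer refers to: $P(Y=1) = P(\varepsilon > -\eta) = 1 - F(-\eta)$, where $F$ is the logistic CDF.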











            $endgroup$

























                answered Jun 16 at 11:50 by Yossi Levy (edited Jun 16 at 13:03 by Stephen Rauch)



































