Why does the logistic function use e rather than 2?
The sigmoid function can be used as an activation function in machine learning.
$$\displaystyle S(x)=\frac{1}{1+e^{-x}}=\frac{e^{x}}{e^{x}+1}.$$
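If I substitute $e$ with 2,

    import numpy as np
    import matplotlib.pyplot as plt

    def sigmoid2(z):
        # Logistic-shaped curve with base 2 instead of e
        return 1 / (1 + 2 ** (-z))

    x = np.arange(-9, 9, dtype=float)
    y = sigmoid2(x)
    plt.scatter(x, y)

the plot looks similar.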
Why does the logistic function use $e$ rather than 2?
machine-learning deep-learning
edited Jun 6 at 14:48 by Ethan
asked Jun 6 at 7:55 by baojieqh
3 Answers
Since you will later optimize the log-likelihood, there is no big difference between $\log 2^x = x \cdot \log 2$ and $\log e^x = x$: the two differ only by a constant factor.
Nevertheless, one could argue for using $2^x$ instead of $e^x$, and also $\log_2$ instead of $\log$, in the optimization step. In fact, it is possible to use $2^x$ and also many other functions that have the desired properties, which are:
- $\lim\limits_{x \rightarrow \infty} f(x)=1$
- $\lim\limits_{x \rightarrow -\infty} f(x)=0$
- $f(x) = -f(-x) + 1$ (point-symmetric about $(0, 0.5)$)
Examples of suitable functions can be found on Wikipedia.
answered Jun 6 at 8:18 by Andreas Look
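The "only a constant" point can be checked numerically. The sketch below is an illustration, not part of the original answer; `sigmoid_base` is a hypothetical helper name. It relies on the identity $2^{-x} = e^{-x \ln 2}$, so the base-2 curve should coincide with the base-$e$ curve evaluated at a rescaled input.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic function with base e."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_base(z, base):
    """Logistic-shaped curve with an arbitrary base > 1 (hypothetical helper)."""
    return 1.0 / (1.0 + base ** (-z))

x = np.linspace(-9.0, 9.0, 181)

# 2**(-x) == exp(-x * ln 2), so the base-2 curve is the base-e curve
# evaluated at the rescaled input x * ln 2.
rescaled = sigmoid(x * np.log(2.0))
max_gap = np.max(np.abs(sigmoid_base(x, 2.0) - rescaled))
print(max_gap)  # largest pointwise difference between the two curves
```

In a model with learned weights, this rescaling can be absorbed into the weights, since $\sigma_2(w \cdot x) = \sigma((\ln 2)\, w \cdot x)$.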
I think it's also worth pointing out that one nice reason to use $e$ as the base is that the derivative of $\sigma(x)=\frac{1}{1+e^{-x}}$ is $\sigma'(x)=\sigma(x)(1-\sigma(x))$. Without doing the actual computation, I think if the base was different the formula would only differ by a constant again, but it's a nice property that is specific to $e$.
– Calvin Godfrey, Jun 6 at 16:55
Same goes for $2^x$ when using $\log_2$.
– Andreas Look, Jun 6 at 20:44
@AndreasLook I'm not sure what you mean. If you use $2^{-x}$ then there's an extra factor of $\ln(2)$ in the derivative (like Calvin Godfrey said).
– sfmiller940, Jun 12 at 20:08
No, check out binary logarithm. $\log_2 (2^x)=x$.
– Andreas Look, Jun 13 at 19:32
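The derivative question raised in these comments can be tested directly. The following is a minimal sketch, assuming a central finite difference is accurate enough for the comparison; the helper names are hypothetical. For $\sigma_a(x) = \frac{1}{1+a^{-x}}$, the chain rule gives $\sigma_a'(x) = \ln(a)\,\sigma_a(x)\,(1-\sigma_a(x))$, so the extra factor is 1 only when $a = e$.

```python
import numpy as np

def sigmoid_base(z, base):
    """Logistic-shaped curve 1 / (1 + base**(-z)) (hypothetical helper)."""
    return 1.0 / (1.0 + base ** (-z))

def numerical_derivative(f, z, h=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

x = np.linspace(-5.0, 5.0, 101)
errors = {}
for base in (np.e, 2.0):
    s = sigmoid_base(x, base)
    numeric = numerical_derivative(lambda z: sigmoid_base(z, base), x)
    # Closed form: ln(base) * s * (1 - s); ln(e) = 1 recovers s * (1 - s).
    analytic = np.log(base) * s * (1.0 - s)
    errors[base] = np.max(np.abs(numeric - analytic))
    print(base, errors[base])
```

The identity $\log_2(2^x) = x$ also holds, but it concerns the logarithm of the exponential rather than the derivative of the sigmoid with respect to $x$.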
There are many functions with a sigmoid shape, including the two you mentioned, but there are reasons why $e$ is special. The main reason is that the logistic function was originally used to model population growth, and populations, much like interest, compound over time, which makes $e$ a very natural choice. In addition, for theoretical reasons concerning the canonical link function of a GLM, the logistic function is one of the simplest objects to work with, which makes it easy to prove things about models that use it.
edited Jun 6 at 15:00 by Ethan
answered Jun 6 at 8:12 by Anonymous Emu
Thanks for your answer. What does "canonical link function of a GLM" mean?
– baojieqh, Jun 6 at 9:22
@baojieqh For all generalized linear models, one needs to specify a member of the exponential family of distributions. These distributions all share the property that they can be written so that a function of the mean parameter of the distribution sits "by itself" in an exponent (and the function depends only on that parameter). This function is what people refer to as the canonical link function. For the Bernoulli/binomial distribution, where the mean parameter is $p$, it turns out that this function is $\ln(p/(1-p))$, which is the logit link function.
– aranglol, Jun 6 at 23:55
Hence, the canonical link function for logistic regression, which assumes a Bernoulli distribution for each row, is the logit link. There are other, more theoretical properties as well that make the canonical link function desirable. But it is technically not necessary to use it; you could use the probit, for example.
– aranglol, Jun 6 at 23:58
@aranglol Thanks for your comments. Would you please take a look at this link: math.stackexchange.com/q/3253634/656371
– baojieqh, Jun 7 at 0:37
This seems to be just a hand-waving appeal to the claim that "$e$ is special", without giving any justification about why $e$ is special. Really, the only specialness is the convenience that $\tfrac{d}{dx}a^x=a^x\ln a$, which means that $\tfrac{d}{dx}e^x=e^x$.
– David Richerby, Jun 7 at 9:21
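The canonical-link remark can be made concrete with a short derivation. This is a standard rewriting of the Bernoulli probability mass function, added here as a sketch; the symbol $\eta$ for the natural parameter is notation introduced for illustration and does not appear in the comments.

$$P(Y=y) = p^{y}(1-p)^{1-y} = \exp\!\left( y \ln\frac{p}{1-p} + \ln(1-p) \right), \qquad y \in \{0, 1\}.$$

The term multiplying $y$ in the exponent is $\eta = \ln\frac{p}{1-p}$, the logit. Solving for $p$ gives

$$p = \frac{1}{1+e^{-\eta}},$$

which is the logistic function with base $e$. The base enters through the natural exponential in the exponential-family form.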
It comes from the basic assumption of the model that there exists a continuous, latent (unobservable) variable $Y^*$ that is related to the observed values of $Y$. The model further assumes that $Y=1$ if $Y^*$ is above some threshold, and $Y=0$ otherwise. The third and last assumption is that the underlying distribution of $Y^*$ is the logistic distribution. Once you have these assumptions, it is only a matter of algebra to derive the model.
You can read more details at my blog.
edited Jun 16 at 13:03 by Stephen Rauch♦
answered Jun 16 at 11:50 by Yossi Levy
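The latent-variable argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not the derivation from the blog: the threshold is taken to be 0, the latent variable is written as $Y^* = \eta + \varepsilon$ with standard logistic noise $\varepsilon$, and the values of $\eta$ are arbitrary. Under those assumptions, $P(Y=1) = P(\varepsilon > -\eta) = \frac{1}{1+e^{-\eta}}$.

```python
import numpy as np

def sigmoid(z):
    """Standard logistic function with base e."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 200_000
results = {}
for eta in (-2.0, 0.0, 1.5):  # illustrative values of the linear predictor
    # Latent variable: linear predictor plus standard logistic noise.
    y_star = eta + rng.logistic(loc=0.0, scale=1.0, size=n)
    # Observed outcome: 1 when the latent variable exceeds the threshold 0.
    y = y_star > 0.0
    results[eta] = (y.mean(), sigmoid(eta))
    print(eta, y.mean(), sigmoid(eta))
```

The empirical frequency of $Y=1$ should be close to the sigmoid of $\eta$ in each case, up to sampling noise.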