What loss function to use when labels are probabilities?



What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, \dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.

It seems like something like cross-entropy doesn't make sense here, since it assumes that a single target is the correct label.

Would something like MSE (after applying softmax) make sense, or is there a better loss function?
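(For concreteness, a minimal NumPy sketch of the setup being asked about; the logits and the 3-class numbers are purely illustrative:)

```python
import numpy as np

# Hypothetical raw model outputs (logits) for one example, and a soft target.
logits = np.array([0.5, 1.0, 2.0])
y = np.array([0.2, 0.3, 0.5])   # the target is itself a probability vector

# Softmax turns the logits into a predicted distribution q.
q = np.exp(logits - logits.max())
q /= q.sum()

# The "MSE after softmax" loss contemplated above.
mse_loss = np.mean((q - y) ** 2)
print(mse_loss)
```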










Tags: neural-networks, machine-learning, loss-functions, probability-distribution
asked Apr 14 at 22:13 by Thomas Johnson (new contributor)
edited Apr 15 at 10:11 by nbro




1 Answer

Actually, the cross-entropy loss function would be appropriate here, since it measures the "distance" between a distribution $q$ and the "true" distribution $p$.

You are right, though, that using a loss function called "cross_entropy" in many APIs would be a mistake. This is because these functions, as you said, assume a one-hot label. You would need to use the general cross-entropy function,

$$H(p,q) = -\sum_{x \in X} p(x) \log q(x).$$
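As a concrete illustration of this general form, here is a minimal NumPy sketch that evaluates $H(p,q)$ directly on soft labels (the function name, the example numbers, and the epsilon clipping are illustrative additions, not part of the question or any particular API):

```python
import numpy as np

def soft_cross_entropy(p, q, eps=1e-12):
    """General cross-entropy H(p, q) = -sum_x p(x) * log q(x).

    p: target distribution (may be soft, e.g. [0.2, 0.3, 0.5])
    q: predicted distribution (e.g. a softmax output), same shape as p
    """
    q = np.clip(q, eps, 1.0)          # avoid log(0)
    return -np.sum(p * np.log(q))

p = np.array([0.2, 0.3, 0.5])         # the soft target from the question
q = np.array([0.25, 0.25, 0.5])       # a hypothetical softmax output
print(soft_cross_entropy(p, q))
```

In a deep-learning framework you would typically compute the same quantity from log-probabilities (e.g. a log-softmax layer) and average it over the batch.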



Note that one-hot labels would mean that
$$
p(x) =
\begin{cases}
1 & \text{if } x \text{ is the true label}\\
0 & \text{otherwise}
\end{cases}
$$

which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:

$$H(p,q) = -\log q(x_{\text{label}})$$
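A quick numerical check of this reduction (again only a sketch with made-up numbers): with a one-hot $p$, the general formula returns exactly $-\log q$ of the true class.

```python
import numpy as np

q = np.array([0.25, 0.25, 0.5])        # hypothetical predicted distribution
p_onehot = np.array([0.0, 0.0, 1.0])   # one-hot target: class 2 is the true label

general = -np.sum(p_onehot * np.log(q))   # H(p, q) via the general formula
familiar = -np.log(q[2])                  # the familiar -log q(x_label)

print(general, familiar)   # both print 0.6931... (= log 2)
```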






answered Apr 14 at 22:38 by Philip Raeisghasem