
aleatoric uncertainty loss term #1

Closed
jin8 opened this issue May 21, 2019 · 10 comments
Labels
question Further information is requested

Comments

jin8 commented May 21, 2019

Hi, is there a reason why you did not put an activation function on mu and logvar at the end of the decoder?
I am getting negative values for the loss. Is it okay for the model to have a negative loss?

pmorerio commented May 21, 2019

Hi.

Concerning the mean branch of the decoder, I am not using any activation because I am generating images in the range [-1, 1] (check the load_mnist function), so I want the decoder to be able to generate at least in that range. In principle you could also use a tanh activation for that, but it was giving numerical issues.

Concerning the log_var branch, you should allow for any real number (negative values occur whenever the variance is smaller than 1). This is why, again, no activation should be needed, in my opinion.
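For illustration, the two branches could then end like this (a minimal TF 1.x-style sketch with hypothetical layer names and sizes, not the exact code of this repo):

```python
import tensorflow as tf  # assuming TF 1.x, as used in this repo

# `h` stands in for the last shared decoder feature map (hypothetical shape).
h = tf.placeholder(tf.float32, [None, 14, 14, 32])

# No activation on either branch:
# - `mean` must cover at least [-1, 1], the range of the input images;
# - `log_var` must cover all reals (negative whenever the variance is < 1).
mean = tf.layers.conv2d(h, filters=1, kernel_size=3, padding='same',
                        activation=None, name='mean')
log_var = tf.layers.conv2d(h, filters=1, kernel_size=3, padding='same',
                           activation=None, name='log_var')
```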

Eq. (8) in the arXiv version of the paper suggests that a negative loss is indeed allowed:

$\mathcal{L}(\theta) = \frac{1}{D} \sum_i \frac{1}{2} \exp(-s_i) \, \lVert y_i - \hat{y}_i \rVert^2 + \frac{1}{2} s_i$, with $s_i := \log \hat{\sigma}_i^2$

However, the loss has a lower bound, provided $\lVert y_i - \hat{y}_i \rVert$ is not zero: setting the derivative of each term with respect to $s_i$ to zero gives a minimum at $s_i = \log \lVert y_i - \hat{y}_i \rVert^2$, where the term equals $\frac{1}{2}\left(1 + \log \lVert y_i - \hat{y}_i \rVert^2\right)$. This is negative whenever $\lVert y_i - \hat{y}_i \rVert^2 < 1/e$, so the minimum can indeed be negative.
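You can check this bound numerically (plain numpy, with a hypothetical squared residual below $1/e$):

```python
import numpy as np

r2 = 0.1                                # hypothetical ||y - y_hat||^2, below 1/e
s = np.linspace(-10.0, 10.0, 100001)    # log-variance, any real number

term = 0.5 * np.exp(-s) * r2 + 0.5 * s  # per-element term of eq. (8)

print(term.min())                       # ~ -0.651, attained near s = log(r2)
print(0.5 * (1.0 + np.log(r2)))         # analytic minimum, negative since r2 < 1/e
```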

However, this is my personal understanding. Please feel free to find any fault in it.

Best,

P.

pmorerio added the question label on May 27, 2019
pmorerio pinned this issue on May 27, 2019
sdsy888 commented Jun 28, 2019

Hi @pmorerio, does the variable log_var in your code mean logits_variable or log-variance (i.e., the variance passed through the log function)?

As indicated in the paper, the variance $\sigma_i^2$ should be passed through the log function to become $s_i$:

$s_i := \log \hat{\sigma}_i^2$

To my understanding, the logits from the conv2d function aren't processed by the log function.

Is there any reason why you didn't apply tf.log to the variance explicitly?

Thank you!

pmorerio commented Jun 28, 2019

Hi @sdsy888, thanks for your question.
The variable log_var is precisely $s_i$, and I named it after the log function. The network regresses $s_i$ directly, instead of predicting $\sigma_i^2$, in order to avoid the possible numerical issues caused by applying the log function to the variance (from the paper: 'In practice, we train the network to predict the log variance [...] This is because it is more numerically stable than regressing the variance'). The loss is then written as in eq. (8), with log_var in place of $s_i$.

The variable log_var is later on exponentiated in order to save the actual variance in the tf.summary.
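In code this amounts to something like the following (a TF 1.x-style sketch with hypothetical tensor shapes, not the exact code of the repo):

```python
import tensorflow as tf  # assuming TF 1.x

# Hypothetical tensors standing in for the targets and the two network outputs.
y = tf.placeholder(tf.float32, [None, 28, 28, 1])
mean = tf.placeholder(tf.float32, [None, 28, 28, 1])
log_var = tf.placeholder(tf.float32, [None, 28, 28, 1])  # s_i, regressed directly

# Eq. (8) with log_var in place of s_i; note that tf.log is never applied.
loss = tf.reduce_mean(
    0.5 * tf.exp(-log_var) * tf.square(y - mean) + 0.5 * log_var)

# The variance is only recovered (exponentiated) for logging purposes.
tf.summary.scalar('avg_variance', tf.reduce_mean(tf.exp(log_var)))
```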

Hope this clarifies.

sdsy888 commented Jun 28, 2019


Oh, I was thinking about it in the wrong direction. I thought eq. (8) would first predict the variance and then get the real value using exp. Thank you!

ShellingFord221 commented

It seems that in your code for aleatoric uncertainty, mean and log_var are two separate outputs trained in the model. I thought that they should come from the output of the model (i.e., mean is the average of the output y, and log_var is the log variance of the output y). Why model them as two variables rather than derive them from y? Thanks!

pmorerio commented Dec 6, 2019

Hi, from my understanding what you describe is epistemic uncertainty, which is practically implemented by sampling different networks via dropout at test time and computing the average and variance of their predictions.
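For illustration, such a Monte Carlo dropout estimate could be sketched as follows (plain numpy, with a toy stand-in model; everything here is hypothetical and only meant to show the sampling):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 1))            # toy weights standing in for a trained net

def model(x, keep_prob=0.5):
    # Toy forward pass with dropout kept ON at test time.
    mask = (rng.random(W.shape) < keep_prob).astype(float)
    return x @ (W * mask) / keep_prob

x = rng.normal(size=(4, 16))            # hypothetical batch of inputs
T = 50                                  # number of stochastic forward passes
preds = np.stack([model(x) for _ in range(T)])

epistemic_mean = preds.mean(axis=0)     # average prediction
epistemic_var = preds.var(axis=0)       # spread across sampled nets = epistemic uncertainty
```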

To model aleatoric uncertainty I follow equation 6 in the paper:
$[\hat{y}, \hat{\sigma}^2] = f^{\hat{W}}(x)$

Both the mean and the variance are outputs of the net $f$ ('head is split to predict both mean and covariance', reads the text below the equation).

Well at least this is my understanding.

ShellingFord221 commented

Thank you for your answer! I'm a little bit confused about how to 'use a single network to transform the input with its head split to predict both mean as well as variance'. Do they just replace the single output y in the model? Is there any supporting theory? Thanks!

pmorerio commented Dec 6, 2019

Sorry, I did not really get your question.

ShellingFord221 commented

Sorry for the ambiguous description. I mean that in your code, it seems that you first pass the image through each layer of the network, and then decode the output by passing it through the same layers of the network to get the mean and variance. Why is that?

pmorerio commented Dec 9, 2019

Hi,
the network has an encoder-decoder structure, since I want to regress images.
Images go through the encoder and one decoding layer. After that, the decoder 'splits' the output to produce a prediction for the 'mean' and a prediction for the 'log_var'. Both predictions contribute to the loss.
I guess you could 'split' earlier or later; that would not make much difference. I have no intuition or supporting theory for this design choice, but I guess you need at least a couple of layers for each branch. See the sketch below.
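Schematically, the forward pass I describe could look like this (hypothetical layer sizes, TF 1.x style; not the exact code of the repo):

```python
import tensorflow as tf  # assuming TF 1.x

images = tf.placeholder(tf.float32, [None, 28, 28, 1])

# Encoder (hypothetical sizes).
h = tf.layers.conv2d(images, 32, 3, strides=2, padding='same', activation=tf.nn.relu)
h = tf.layers.conv2d(h, 64, 3, strides=2, padding='same', activation=tf.nn.relu)

# One shared decoding layer...
d = tf.layers.conv2d_transpose(h, 32, 3, strides=2, padding='same',
                               activation=tf.nn.relu)

# ...then the decoder splits into two branches, both without activation.
mean = tf.layers.conv2d_transpose(d, 1, 3, strides=2, padding='same', activation=None)
log_var = tf.layers.conv2d_transpose(d, 1, 3, strides=2, padding='same', activation=None)
```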

I hope this answers your question.
(For any other question, please open a new issue, so that different topics are not mixed in a single thread.)
