
aleatoric uncertainty loss term #1

Closed
jin8 opened this issue May 21, 2019 · 10 comments
Labels
question Further information is requested

Comments

jin8 commented May 21, 2019

Hi, is there a reason why you did not put an activation function on mu and logvar at the end of the decoder?
I am getting negative values for the loss. Is it okay for the model to have a negative loss?

pmorerio commented May 21, 2019

Hi.

Concerning the mean branch of the decoder, I am not using any activation because I am generating images in the range [-1, 1] (check the load_mnist function), so I want the decoder to be able to generate at least in that range. In principle you could also use a tanh activation for that, but it was giving numerical issues.

Concerning the log_var branch, you should allow for any real number (negative values occur whenever the variance is smaller than 1). This is why, again, no activation should be needed, in my opinion.
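For illustration, the two branches could then end like this (a minimal TF 1.x-style sketch with hypothetical layer names and sizes, not the exact code of this repo):

```python
import tensorflow as tf  # assuming TF 1.x, as used in this repo

# `h` stands in for the last shared decoder feature map (hypothetical shape).
h = tf.placeholder(tf.float32, [None, 14, 14, 32])

# No activation on either branch:
# - `mean` must cover at least [-1, 1], the range of the input images;
# - `log_var` must cover all reals (negative whenever the variance is < 1).
mean = tf.layers.conv2d(h, filters=1, kernel_size=3, padding='same',
                        activation=None, name='mean')
log_var = tf.layers.conv2d(h, filters=1, kernel_size=3, padding='same',
                           activation=None, name='log_var')
```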

Eq. (8) in the arXiv version of the paper suggests that a negative loss is indeed allowed:

$\mathcal{L}(\theta) = \frac{1}{D} \sum_i \frac{1}{2} \exp(-s_i) \, \lVert y_i - \hat{y}_i \rVert^2 + \frac{1}{2} s_i$, with $s_i := \log \hat{\sigma}_i^2$

However, the loss has a lower bound, provided $\lVert y_i - \hat{y}_i \rVert$ is not zero: setting the derivative of each term with respect to $s_i$ to zero gives a minimum at $s_i = \log \lVert y_i - \hat{y}_i \rVert^2$, where the term equals $\frac{1}{2}\left(1 + \log \lVert y_i - \hat{y}_i \rVert^2\right)$. This is negative whenever $\lVert y_i - \hat{y}_i \rVert^2 < 1/e$, so the minimum can indeed be negative.
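You can check this bound numerically (plain numpy, with a hypothetical squared residual below $1/e$):

```python
import numpy as np

r2 = 0.1                                # hypothetical ||y - y_hat||^2, below 1/e
s = np.linspace(-10.0, 10.0, 100001)    # log-variance, any real number

term = 0.5 * np.exp(-s) * r2 + 0.5 * s  # per-element term of eq. (8)

print(term.min())                       # ~ -0.651, attained near s = log(r2)
print(0.5 * (1.0 + np.log(r2)))         # analytic minimum, negative since r2 < 1/e
```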

However, this is my personal understanding. Please feel free to find any fault in it.

Best,

P.

pmorerio added the question label on May 27, 2019
pmorerio pinned this issue on May 27, 2019
sdsy888 commented Jun 28, 2019

Hi @pmorerio, does the variable log_var in your code mean logits_variable or log-variance (i.e., the variance passed through the log function)?

As indicated in the paper, the variance $\sigma_i^2$ should be passed through the log function to become $s_i$:

$s_i := \log \hat{\sigma}_i^2$

To my understanding, the logits from the conv2d function aren't processed by the log function.

Is there any reason why you didn't apply tf.log to the variance explicitly?

Thank you!

pmorerio commented Jun 28, 2019

Hi @sdsy888, thanks for your question.
The variable log_var is precisely $s_i$, and I named it after the log function. The network regresses $s_i$ directly, instead of predicting $\sigma_i^2$, in order to avoid the possible numerical issues caused by applying the log function to the variance (from the paper: 'In practice, we train the network to predict the log variance [...] This is because it is more numerically stable than regressing the variance'). The loss is then written as in eq. (8), with log_var in place of $s_i$.

The variable log_var is later on exponentiated in order to save the actual variance in the tf.summary.
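In code this amounts to something like the following (a TF 1.x-style sketch with hypothetical tensor shapes, not the exact code of the repo):

```python
import tensorflow as tf  # assuming TF 1.x

# Hypothetical tensors standing in for the targets and the two network outputs.
y = tf.placeholder(tf.float32, [None, 28, 28, 1])
mean = tf.placeholder(tf.float32, [None, 28, 28, 1])
log_var = tf.placeholder(tf.float32, [None, 28, 28, 1])  # s_i, regressed directly

# Eq. (8) with log_var in place of s_i; note that tf.log is never applied.
loss = tf.reduce_mean(
    0.5 * tf.exp(-log_var) * tf.square(y - mean) + 0.5 * log_var)

# The variance is only recovered (exponentiated) for logging purposes.
tf.summary.scalar('avg_variance', tf.reduce_mean(tf.exp(log_var)))
```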

Hope this clarifies.

sdsy888 commented Jun 28, 2019


Oh, I was thinking about it in the wrong direction. I thought eq. (8) would first predict the variance and then get the real value using exp. Thank you!

ShellingFord221 commented

It seems that in your code for aleatoric uncertainty, mean and log_var are two separate outputs trained in the model. I thought that they should come from the output of the model (i.e., mean is the average of the output y, and log_var is the log variance of the output y). Why model them as two variables rather than derive them from y? Thanks!

pmorerio commented Dec 6, 2019

Hi, from my understanding what you describe is epistemic uncertainty, which is practically implemented by sampling different networks via dropout at test time and computing the average and variance of their predictions.
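For illustration, such a Monte Carlo dropout estimate could be sketched as follows (plain numpy, with a toy stand-in model; everything here is hypothetical and only meant to show the sampling):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 1))            # toy weights standing in for a trained net

def model(x, keep_prob=0.5):
    # Toy forward pass with dropout kept ON at test time.
    mask = (rng.random(W.shape) < keep_prob).astype(float)
    return x @ (W * mask) / keep_prob

x = rng.normal(size=(4, 16))            # hypothetical batch of inputs
T = 50                                  # number of stochastic forward passes
preds = np.stack([model(x) for _ in range(T)])

epistemic_mean = preds.mean(axis=0)     # average prediction
epistemic_var = preds.var(axis=0)       # spread across sampled nets = epistemic uncertainty
```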

To model aleatoric uncertainty I follow equation 6 in the paper:
$[\hat{y}, \hat{\sigma}^2] = f^{\hat{W}}(x)$

Both the mean and the variance are outputs of the net $f$ ('head is split to predict both mean and covariance', reads the text below the equation).

Well at least this is my understanding.

ShellingFord221 commented

Thank you for your answer! I'm a little bit confused about how to 'use a single network to transform the input with its head split to predict both mean as well as variance'. Do they just replace the single output y in the model? Is there any supporting theory? Thanks!

pmorerio commented Dec 6, 2019

Sorry, I did not really get your question.

ShellingFord221 commented

Sorry for the ambiguous description. I mean that in your code, it seems that you first pass the image through each layer of the network, and then decode the output by passing it through the same layers of the network to get the mean and variance. Why is that?

pmorerio commented Dec 9, 2019

Hi,
the network has an encoder-decoder structure, since I want to regress images.
Images go through the encoder and one decoding layer. After that, the decoder 'splits' the output to produce a prediction for the 'mean' and a prediction for the 'log_var'. Both predictions contribute to the loss.
I guess you could 'split' earlier or later; that would not make much difference. I have no intuition or supporting theory for this design choice, but I guess you need at least a couple of layers for each branch. See the sketch below.
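Schematically, the forward pass I describe could look like this (hypothetical layer sizes, TF 1.x style; not the exact code of the repo):

```python
import tensorflow as tf  # assuming TF 1.x

images = tf.placeholder(tf.float32, [None, 28, 28, 1])

# Encoder (hypothetical sizes).
h = tf.layers.conv2d(images, 32, 3, strides=2, padding='same', activation=tf.nn.relu)
h = tf.layers.conv2d(h, 64, 3, strides=2, padding='same', activation=tf.nn.relu)

# One shared decoding layer...
d = tf.layers.conv2d_transpose(h, 32, 3, strides=2, padding='same',
                               activation=tf.nn.relu)

# ...then the decoder splits into two branches, both without activation.
mean = tf.layers.conv2d_transpose(d, 1, 3, strides=2, padding='same', activation=None)
log_var = tf.layers.conv2d_transpose(d, 1, 3, strides=2, padding='same', activation=None)
```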

I hope this answers your question.
(For any other question, please open a new issue, so that different topics are not mixed in a single thread.)
