
Renyi divergence #769

Open · jbregli wants to merge 40 commits into master

Conversation

@jbregli (Author) commented Sep 27, 2017

Here is an implementation of Renyi divergence variational inference.
There's also an example with VAEs.

Here is a link to the edward forum with some more info:
https://discourse.edwardlib.org/t/renyi-divergence-variational-inference/366/3

PS: Sorry for the rather messy commit history.

@dustinvtran (Member) left a comment

Have to go catch a flight but some preliminary comments:

.gitignore Outdated
@@ -100,3 +100,9 @@ docs/*.html
# IDE related
.idea/
.vscode/

dustinvtran (Member):

Can you remove changes that aren't relevant for this PR? This includes changes to .gitignore here as well as deletion of CSVs.

from edward.util import copy

try:
    from edward.models import Normal
dustinvtran (Member):

As convention, we use 2-space indent.

from __future__ import print_function

import six
import numpy as np
dustinvtran (Member):

As convention, we alphabetize the ordering of the import libraries.

"{0}. Your TensorFlow version is not supported.".format(e))


class Renyi_divergence(VariationalInference):
dustinvtran (Member):

As convention, we use CamelCase for class names.

To perform the optimization, this class uses the techniques from
Renyi Divergence Variational Inference (Y. Li & al, 2016)

# Notes:
dustinvtran (Member):

Docstrings are parsed as Markdown and formatted in a somewhat specific way as they appear on the API docs. I recommend following the other classes, where you would denote a subsection as #### Notes and when writing bullet points, do, e.g.,

#### Notes

+ bullet 1
+ bullet 2
  + maybe bulleted list in a bullet

@dustinvtran (Member) left a comment

Great work! Some comments below. The code looks correct and only minor suggestions with respect to formatting are laid out.

Can you include a unit test? See, e.g., how KLpq is tested under the file tests/inferences/test_klpq.py.
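
For reference, here is a minimal sketch of what such a test could look like, modeled on the normal-normal pattern in tests/inferences/test_klpq.py. The top-level registration as ed.RenyiDivergence and the n_samples/alpha values are assumptions for illustration, not the PR's actual test code:

import edward as ed
import numpy as np
import tensorflow as tf

from edward.models import Normal


class test_renyi_divergence_class(tf.test.TestCase):

  def test_normal_normal(self):
    with self.test_session():
      x_data = np.array([0.0] * 50, dtype=np.float32)

      # Model: mu ~ N(0, 1), x | mu ~ N(mu, 1).
      mu = Normal(loc=0.0, scale=1.0)
      x = Normal(loc=mu, scale=1.0, sample_shape=50)

      # Variational approximation q(mu).
      qmu = Normal(loc=tf.Variable(0.0),
                   scale=tf.nn.softplus(tf.Variable(0.0)))

      inference = ed.RenyiDivergence({mu: qmu}, data={x: x_data})
      inference.run(n_samples=25, alpha=0.5, n_iter=200)

      # The exact posterior is N(0, 1/51); check the fitted mean is close.
      self.assertAllClose(qmu.mean().eval(), 0.0, rtol=0.1, atol=0.1)


if __name__ == '__main__':
  ed.set_seed(42)
  tf.test.main()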

$ \text{D}_{R}^{(\alpha)}(q(z)||p(z \mid x))
= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz $

To perform the optimization, this class uses the techniques from
dustinvtran (Member):

Periods at the end of sentences. (If you'd like to look at the generated API for the class, I recommend compiling the website following the instructions in docs/.)

= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz $

To perform the optimization, this class uses the techniques from
Renyi Divergence Variational Inference (Y. Li & al, 2016)
dustinvtran (Member):

We use bibtex for handling references in docstrings. This is handled by adding the appropriate bib entry to docs/tex/bib.bib; make sure it's also written in the right order: we sort bib entries by their year, then alphabetically according to their citekey within each year.

When using references, you can produce (Li et al., 2016) and Li et al. (2016) by writing `[@li2016renyi]` and `@li2016renyi` respectively, assuming that `li2016renyi` is the citekey.


# Notes:
- Renyi divergence does not have any analytic version.
- Renyi divergence does not have any version for non reparametrizable
dustinvtran (Member):

It does, but the gradient estimator in @li2016variational doesn't. I recommend just stating that this inference algorithm is restricted to variational approximations whose random variables all satisfy rv.reparameterization_type == tf.contrib.distributions.FULLY_REPARAMETERIZED.

Also, instead of checking this during build_loss_and_gradients, I recommend checking it during __init__. This sort of check is done statically, before any graph construction, similar to how we check for compatible shapes in all latent variables and data during __init__.
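
A rough sketch of the kind of __init__-time check being suggested; the helper name is hypothetical, and the latent_vars dict of priors to variational random variables is assumed from VariationalInference:

import six
import tensorflow as tf


def _check_reparameterizable(latent_vars):
  """Raise if any variational random variable is not fully reparameterized."""
  for qz in six.itervalues(latent_vars):
    if (qz.reparameterization_type !=
            tf.contrib.distributions.FULLY_REPARAMETERIZED):
      raise NotImplementedError(
          "Variational Renyi inference only works with reparameterizable "
          "variational approximations.")

Called once from __init__, this fails before any graph construction rather than inside build_loss_and_gradients.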


  def initialize(self,
                 n_samples=32,
                 alpha=1.,
dustinvtran (Member):

As convention, we append all numerics with 0, e.g., 1.0.

Number of samples from variational model for calculating
stochastic gradients.
alpha: float, optional.
Renyi divergence coefficient.
dustinvtran (Member):

Could be useful to specify the domain of the coefficient, e.g., "Must be greater than 0." or similar.

"Variational Renyi inference only works with reparameterizable"
" models")

#########
dustinvtran (Member):

This function is only used in one location and is a one-liner; could you write that line instead of defining a new function?

scale=Dense(d, activation='softplus')(hidden))

# Bind p(x, z) and q(z | x) to the same TensorFlow placeholder for x.
inference = Renyi_divergence({z: qz}, data={x: x_ph})
dustinvtran (Member):

This code looks exactly the same as an older version of vae.py but only differs in this line. To keep the VAE versions better synced, could you add a comment suggesting that this is also an alternative in the existing vae.py?

Ideally, we'd like a specific application where ed.RenyiDivergence produces better results by some metric than alternatives. IIRC, the paper had some interesting results for a Bayesian neural net on some specific UCI data sets. It would be great to have that here and reproduce some of their results.

If you don't have time for this, we can leave it off for now and raise it as a Github issue post-merging this PR.
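
Concretely, the suggested edit to examples/vae.py might look like the following (a sketch; z, qz, x, and x_ph are the names from that example, and the comment wording is just a suggestion):

import edward as ed

# ... model p(x, z) and variational q(z | x) defined as in examples/vae.py ...

# Bind p(x, z) and q(z | x) to the same TensorFlow placeholder for x.
inference = ed.KLqp({z: qz}, data={x: x_ph})
# Alternative: Renyi divergence variational inference [@li2016renyi].
# inference = ed.RenyiDivergence({z: qz}, data={x: x_ph})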

self.scale.get(x, 1.0)
* x_copy.log_prob(dict_swap[x]))

logF = [p - q for p, q in zip(p_log_prob, q_log_prob)]
dustinvtran (Member):

Instead of logF, what about something like log_ratios, which is more Pythonic in snake_case and also more semantically meaningful?

@jbregli (Author) commented Sep 27, 2017

Thanks for the suggestion and the very informative feedback.

> Can you include a unit test? See, e.g., how KLpq is tested under the file tests/inferences/test_klpq.py.

Will do later today.

@jbregli (Author) commented Sep 28, 2017

I've added some tests in a similar way to KLqp (both the normal-normal and the Bernoulli models).
For each, I've tested most of the possible cases of the Renyi VI:
KL, VR-max, VR-min, alpha < 0, alpha > 0.
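
For example, inside the test class the coverage described above might amount to calls like these (a sketch; whether n_samples, alpha, and backward_pass are all passed through this way is an assumption):

# Illustrative invocations of the PR's helper covering the cases above.
self._test_normal_normal(ed.RenyiDivergence, n_samples=25, alpha=1.0)            # KL (alpha -> 1)
self._test_normal_normal(ed.RenyiDivergence, n_samples=25, alpha=0.5)            # 0 < alpha < 1
self._test_normal_normal(ed.RenyiDivergence, n_samples=25, alpha=-0.5)           # alpha < 0
self._test_normal_normal(ed.RenyiDivergence, n_samples=25, backward_pass='max')  # VR-max
self._test_normal_normal(ed.RenyiDivergence, n_samples=25, backward_pass='min')  # VR-min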

import tensorflow as tf

from edward.models import Bernoulli, Normal
from edward.inferences.renyi_divergence import RenyiDivergence
dustinvtran (Member) on Sep 28, 2017:

We should check the import works by instead using ed.RenyiDivergence in the test.

jbregli (Author):

I had some issues with this, but it should be working now.

[@li2016renyi].

#### Notes
+ The gradient estimator used here does not have any analytic version.
dustinvtran (Member):

With Markdown formatting, you don't need the 4 spaces of indentation. E.g., you can just do

#### Notes

+ The gradient estimator ...
+ ...

jbregli (Author):

Done

= \frac{1}{\alpha-1} \log \int q(z)^{\alpha} p(z \mid x)^{1-\alpha} dz.$

The optimization is performed using the gradient estimator as defined in
[@li2016renyi].
dustinvtran (Member):

The citekey is being used as a direct object so it should be [@li2016renyi] -> @li2016renyi.

jbregli (Author):

Done

+ See Renyi Divergence Variational Inference [@li2016renyi] for
more details.
"""
if self.is_reparameterizable:
dustinvtran (Member):

is_reparameterizable should be checked (raising an error if necessary) during __init__, and since it's checked there it doesn't need to be stored in the class. This also helps to remove one layer of indentation in this function.

jbregli (Author):

Done

if self.backward_pass == 'max':
  log_ratios = tf.stack(log_ratios)
  log_ratios = tf.reduce_max(log_ratios, 0)
  loss = tf.reduce_mean(log_ratios)
dustinvtran (Member):

If I understood the code correctly, log_ratios when first created is a list of n_samples elements, where each element is a log ratio calculation per sample from q. For the min / max modes, we take the min / max of these log ratios, which is a scalar.

Is tf.reduce_mean for the loss needed? You can also remove the tf.stack line in the min and max cases in the same way you didn't use it for the self.alpha \approx 1 case.
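
Concretely, the simplification might look like this inside build_loss_and_gradients (a sketch, assuming log_ratios is a Python list of scalar per-sample log ratios as described above):

if self.backward_pass == 'max':
  # tf.reduce_max converts the list to a tensor itself, so neither tf.stack
  # nor tf.reduce_mean is needed; the result is already a scalar.
  loss = tf.reduce_max(log_ratios)
elif self.backward_pass == 'min':
  loss = tf.reduce_min(log_ratios)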

jbregli (Author):

You're right. Thanks for spotting this.

[@li2016renyi]

#### Notes
This example is almost exactly similar to example/vae.py.
dustinvtran (Member):

Sorry for the miscommunication. What I meant was that you can edit vae.py, comment out the 1-2 lines of code to use ed.RenyiDivergence, and add these notes there. This helps to compress the content in the examples, c.f., https://github.com/blei-lab/edward/blob/master/examples/bayesian_logistic_regression.py#L51.

jbregli (Author):

Removed vae_renyi.py and modified vae.py instead.
The version of vae.py I had wasn't running, though, so I've modified it quite a bit.

dustinvtran (Member):

Are you using the latest version of Edward? We updated a few details in vae.py so it actually runs better. For example, you should be using the observations library and a generator, which is far more transparent than the mnist_data class from that TensorFlow tutorial.

In addition, since vae.py is also our canonical VAE example, I prefer keeping it as ed.KLqp as the default, and with the renyi divergence option commented out; similarly, the top-level comments should be written in-line near the renyi divergence option instead.

If you have thoughts otherwise, happy to take alternative suggestions.
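
For reference, the data loading being described with the observations library and a generator looks roughly like this; it's modeled loosely on the current examples/vae.py, so the exact details (pixel scaling, batch size) should be treated as assumptions:

import numpy as np
from observations import mnist


def generator(array, batch_size):
  """Yield successive batches over the array's first axis, binarized."""
  start = 0  # pointer into the array
  while True:
    stop = start + batch_size
    diff = stop - array.shape[0]
    if diff <= 0:
      batch = array[start:stop]
      start += batch_size
    else:  # wrap around to the beginning of the array
      batch = np.concatenate((array[start:], array[:diff]))
      start = diff
    batch = batch.astype(np.float32) / 255.0  # rescale pixel intensities to [0, 1]
    batch = np.random.binomial(1, batch)  # binarize images
    yield batch


(x_train, _), (x_test, _) = mnist("~/data")
x_train_generator = generator(x_train, 128)
x_batch = next(x_train_generator)  # fed to inference.update via feed_dict in the training loop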


class test_renyi_divergence_class(tf.test.TestCase):

  def _test_normal_normal(self, Inference, *args, **kwargs):
dustinvtran (Member) on Sep 28, 2017:

Since RenyiDivergence is used across all tests, you don't need Inference as an arg to the test functions.

jbregli (Author):

I used the same template as test_klpq, where only KLpq is used during the tests and Inference is still an argument to the test functions.
But I've now modified it to be closer to what you had in mind.

@jbregli (Author) commented Sep 29, 2017

The Travis CI check for Python 2.7 keeps failing before it even gets to properly testing my code (it fails to install matplotlib and seaborn).
Is there anything I've done wrong on my side?

@dustinvtran (Member) commented
Looks like this is happening in Travis on any build. I'll look into it.
