-
Notifications
You must be signed in to change notification settings - Fork 757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rejection sampling variational inference #819
base: master
Are you sure you want to change the base?
Rejection sampling variational inference #819
Conversation
Thanks for working on this! Ping me whenever you'd like some feedback. |
Will do! It's very WIP for now. I want to get a thing working, then clean it up considerably. |
…what is going on. tensorflow: you exhaust me
Hi. A few updates/questions:
After this, everything should be on the right track. Thus far, I've worked on
as well? I haven't taken the time to think this through. How does this all sound? Thanks! |
edward/optimizers/sgd.py
Outdated
self.s_n = s_n | ||
self.n = n | ||
|
||
def apply_gradients(self, grads_and_vars, global_step=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd quite appreciate if you could glance at this method as well, as my integration test passes on some days and fails on others — with 0 changes to my code. Promise 🤞.
Hey @dustinvtran, @naesseth. I've:
In the latter two, I pinned results to those computed in this notebook and vetted them thoroughly. My gradients are still exploding. Might you have a chance to give some 👀 this week? |
Hey @dustinvtran. Following up :) |
Apologies for the delay! Busy for ICML stuff due this Friday. Maybe ping me this weekend? :) |
Ah, no worries! Will do. The PR is in a good state, I think. Just need to close the gap between "unit-testing 1 iteration of RSVI gradient calculations passes" and "it just works". I'm guessing there's some slight "magic" with learning rates, supports, or the like that I'm missing. This is my hunch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work! I think the testing as is is fine.
I like your API for rejection sampling for Gamma. In future work after merging this, it would be nice to move upstream to TensorFlow distributions. There, we can think a bit harder about how to incorporate various forms of reparameterized samplers in the tf.contrib.distributions.Gamma
.
Edward 2.0's ed.klqp
will incorporate the loss function you wrote down here.
@@ -144,6 +143,7 @@ def run(self, variables=None, use_coordinator=True, *args, **kwargs): | |||
|
|||
for _ in range(self.n_iter): | |||
info_dict = self.update() | |||
print(info_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm?
@@ -123,7 +123,6 @@ def run(self, variables=None, use_coordinator=True, *args, **kwargs): | |||
Passed into `initialize`. | |||
""" | |||
self.initialize(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add back newline? unrelated to PR
@@ -32,7 +32,7 @@ class KLpq(VariationalInference): | |||
|
|||
with respect to $\\theta$. | |||
|
|||
In conditional inference, we infer $z` in $p(z, \\beta | |||
In conditional inference, we infer $z$ in $p(z, \\beta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unrelated to this PR. Can you make a new PR to fix this?
tf.summary.scalar("loss/reg_penalty", reg_penalty, | ||
collections=[inference._summary_key]) | ||
|
||
g_rep = tf.gradients(rep, var_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why you need the multiple gradient calls and not just one? This seems inefficient.
Hey @dustinvtran. Thanks for the feedback. I left this in "debug" state (print statements, etc.), as it doesn't yet work. Specifically, the main integration test, Any insight as to why this might be? I'm guessing I'm lacking some expertise in getting this to work. NB: the "single pass gradient computation" integration test, Also, I think there are other code-organizational considerations to be made before readying this for merge. For instance, what happens when we have > 1 latent variables, where some require a rejection sampler and others don't? In short: I think I need some help :) |
Ping, @dustinvtran :) What's the best way forward? Have a moment to review in the coming days? Anyone else I should reach out to? Cheers :) |
Not sure if I know enough about the algorithm to help unfortunately. What happens if you try 1000 samples per iteration? Maybe @naesseth can reply? |
Cool! Will ping them here. @naesseth @slinderman have a moment to help get this merged? If you're still in NYC, happy to come meet in person as well, so as to get this one done! |
Hi @williamabrwolf, I’m traveling now but I’ll be back in NYC mid March. Happy to meet then if you’d like. I’ll have limited cycles between now and then but I’ll try to take a look at the code here. Perhaps Christian can also help out in the meantime. Glad to see you’re integrating this into Edward! |
@slinderman mid-March works great. Grateful for the help, and happy to meet then if we haven't resolved. [email protected] is me. Cheers :). |
I'll try and take a look early next week, currently traveling. |
@slinderman @naesseth hey all. back in NYC? would love to meet for 2 hours and get this to a place where we can soon merge. |
Hey @slinderman, @naesseth. Following up :). Back in the city? Happy to travel to you to make this happen. Cheers! |
Hey Will,
I'm around this week. Send me an email directly and we'll set up a time to
meet.
Best,
Scott
…On Wed, Mar 21, 2018 at 2:18 PM, Will Wolf ***@***.***> wrote:
Hey @slinderman <https://github.com/slinderman>, @naesseth
<https://github.com/naesseth>. Following up :). Back in the city? Happy
to travel to you to make this happen.
Cheers!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#819 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFXwKME1Zfo3UM94FQLAExQWnZo03PSLks5tgplggaJpZM4RVfTl>
.
|
g_cor = tf.gradients(cor, var_list) | ||
g_entropy = tf.gradients(q_entropy, var_list) | ||
|
||
grad_summands = zip(*[g_rep, g_cor, g_entropy]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we try dropping g_cor
from this summand and see if tests still pass?
Expected behavior: pass at a higher tolerance, but not blow up.
This is a possible culprit re: why gradients are exploding in running _test_poisson_gamma
.
With a reasonably small step size, maybe 100 epochs.
Worth keeping an eye on g_entropy
:
- First, try
g_rep
andg_entropy
- Next, try just
g_rep
Print all the gradient terms from notebook as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"With a reasonably small step size, maybe 100 epochs." --> i.e. it should pass "with a reasonably small step size, and run for maybe 100 epochs."
WIP. Addresses #379.