add minibatch subsampling (doubly stochastic) objective #84
base: master
Conversation
Codecov Report — Attention: patch coverage is incomplete (roughly 34 of the 41 added lines are missed; see the diff below).

```
@@           Coverage Diff            @@
##           master      #84      +/-  ##
===========================================
- Coverage   96.09%   82.92%   -13.18%
===========================================
  Files          11       12        +1
  Lines         205      246       +41
===========================================
+ Hits          197      204        +7
- Misses          8       42       +34
===========================================
```

View full report in Codecov by Sentry.
related: TuringLang/DynamicPPL.jl#633

I suggest delaying this to AdvancedVI v0.4 so that the syntax in DynamicPPL is implemented.

@yebai Sounds good to me.
I don't understand the theory or the wider context of AdvancedVI, so my review is quite shallow, but I don't see any significant problems here. I left a few small local proposals.
Suggested change:

```diff
- Notice that, when computing the log-density, we multiple by a constant `likeadj`.
+ Notice that, when computing the log-density, we multiply by a constant `likeadj`.
```
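For context, the constant in question is the likelihood adjustment used in doubly stochastic subsampling. A minimal numerical sketch (all values below are made up for illustration):

```julia
# Sketch: with n datapoints and minibatches of size b, scaling the minibatch
# log-likelihood by likeadj = n/b makes the subsampled log-density an
# unbiased estimate of the full-data log-density.
n, b = 100, 10            # full dataset size, minibatch size (assumed values)
likeadj = n / b           # = 10.0
logprior = -3.0           # hypothetical log-prior term
loglik_batch = -42.0      # hypothetical log-likelihood summed over the batch
logdensity = logprior + likeadj * loglik_batch
```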
> Let's first compare the convergence of full-batch `RepGradELBO` versus subsampled `RepGradELBO` with respect to the number of iterations:
>
> ![](subsampling_iteration.svg)
Are these .svg files missing from the repo? I haven't looked at the built docs, just don't see them in the PR.
```julia
# Returns
- `sub`: Subsampled model.
"""
subsample(model::Any, ::Any) = model
```
Do I understand correctly that subsampling is a more general operation than just a VI thing? If that's the case, could this be moved to DynamicPPL, or even AbstractPPL?
Also, I wonder if an empty function without methods would make more sense. Is returning the unmodified original model a reasonable fallback? I could imagine it confusing users, who would call it and get a return value, not realising it's actually just the original model.
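For concreteness, the two designs being weighed here look like this (a sketch of the alternatives, not the PR's final decision; only one of the two would actually be defined):

```julia
# (a) Identity fallback, as in the PR: unsupported models pass through
#     unchanged, which is silent but may surprise users.
subsample(model::Any, ::Any) = model

# (b) Method-less function: calling `subsample` on a model without a
#     dedicated method throws a MethodError instead of quietly returning
#     the full model.
function subsample end
```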
@yebai Any comments on the current direction on the PPL side?
""" | ||
estimate_gradient!(rng, obj, adtype, out, prob, λ, restructure, obj_state) | ||
estimate_gradient!(rng, obj, adtype, out, prob, λ, restructure, obj_state, objargs...; kwargs...) |
Could the varargs be explained in the docstring?
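One shape such an explanation could take (the wording below, and the claim that `objargs`/`kwargs` are simply forwarded to the objective, are assumptions rather than confirmed behavior):

```julia
"""
    estimate_gradient!(rng, obj, adtype, out, prob, λ, restructure, obj_state, objargs...; kwargs...)

Estimate the gradient of the objective `obj` at the variational parameters `λ`,
storing the result in `out`.

# Arguments
- `objargs...`: Additional positional arguments forwarded to the evaluation of
  the objective `obj` (assumed semantics).
- `kwargs...`: Additional keyword arguments forwarded likewise (assumed
  semantics).
"""
```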
```julia
    Subsampled(objective, batchsize, data)

Subsample `objective` over the dataset represented by `data` with minibatches of size `batchsize`.
```
Could you comment on what happens if `batchsize` does not divide `length(data)`, or whether that's significant at all?
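For illustration, here is what a typical random-reshuffling epoch does in that case (a standard-library sketch, not the PR's code): a common choice is to let the final minibatch of each epoch be smaller.

```julia
using Random

data = collect(1:10)
batchsize = 3
# Iterators.partition leaves a smaller final batch when batchsize
# does not divide the number of datapoints.
batches = collect(Iterators.partition(shuffle(data), batchsize))
length.(batches)  # [3, 3, 3, 1]
```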
This is a draft for the subsampling variational objective, which addresses #38. Any perspectives/concerns/comments are welcome! The current plan is to implement only random reshuffling, as I recently showed that there is no point in implementing independent subsampling. Although importance sampling of datapoints could be an interesting addition, it would require custom `DynamicPPL.Context`s.

The key design decision is the following function:
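Presumably this refers to the generic `subsample` interface visible elsewhere in this PR; reconstructing it here for readability (the summary line of the docstring is a paraphrase, not the PR's exact text):

```julia
"""
    subsample(model, batch)

Subsample `model` to the datapoints (or data indices) in `batch`.

# Returns
- `sub`: Subsampled model.
"""
subsample(model::Any, ::Any) = model
```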
Given a previous Slack DM thread with @yebai, this interface could be straightforwardly implemented by Turing models, where `data_or_indices` could be made a reserved keyword argument for Turing models. Then, I think it should generally work?
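A hedged sketch of how this might look for a Turing model, assuming a reserved `data_or_indices` keyword (the model `demo` and all details below are illustrative assumptions, and the `likeadj` rescaling is omitted for brevity):

```julia
using Turing, DynamicPPL

# Hypothetical model: the active minibatch is selected through the
# reserved keyword argument `data_or_indices`.
@model function demo(data; data_or_indices=eachindex(data))
    θ ~ Normal()
    for i in data_or_indices
        data[i] ~ Normal(θ, 1.0)
    end
end

# `subsample` then rebuilds the model on the current minibatch.
subsample(model::DynamicPPL.Model, batch) =
    demo(model.args.data; data_or_indices=batch)
```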
My current guess would be that `subsample(m::DynamicPPL.Model, batch)` would have to end up in the main Turing repository.