What is the difference between Masksembles and BatchEnsemble? Is there any particular reason you do not discuss it in the paper?
As far as I understand, the general idea behind Masksembles and BatchEnsemble is pretty similar. BatchEnsemble doesn't have the scale property that lets you move between dropout and a naive ensemble, but its masks are learnable, which raises the question of whether the two ideas could be combined.
I've been developing a somewhat similar approach: I fed the same data to all models (in contrast to your approach) and forced diversity between the models' predictions by maximizing the L1 difference between them. Have you tried enforcing diversity in any way?
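Roughly, the diversity term I have in mind looks like this (a simplified PyTorch sketch; the function and variable names are just illustrative, not from any existing codebase):

```python
import torch

def diversity_penalty(predictions):
    """Encourage submodels to disagree by averaging the pairwise
    L1 distance between their predictions.

    predictions: tensor of shape [n_models, batch, num_classes]
    Returns a scalar meant to be *subtracted* from the task loss.
    """
    n = predictions.shape[0]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total = total + (predictions[i] - predictions[j]).abs().mean()
            pairs += 1
    return total / pairs

# usage sketch: loss = task_loss - lambda_div * diversity_penalty(preds)
```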
Best,
Daniel
Basically, we used their implementation idea (since it's very convenient), but our major contribution is a method that allows changing the correlation between submodels (and provides the Ensembles / MC-Dropout transition). BatchEnsemble has no such mechanism.
I think the idea of combining the approaches is an interesting one. I've tried to make our masks learnable too (so every value in the masks lies in [0, 1]), but this way we lose control over the correlation of the generated submodels (for example, the model could decide to make all of the masks identical or very similar). I have an idea of how to combine the two, though, and would be happy to share :)
In general, our correlation parameter does exactly this: by reducing it you increase the diversity of the predictions. We haven't tried to enforce diversity through an extra term in the loss, though.
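To make that knob a bit more concrete, here is a toy NumPy sketch (this is not our actual mask-generation code, just an illustration of how a scale-like pool size controls how many ones two masks share):

```python
import numpy as np

def random_masks(n_masks, n_ones, scale, seed=0):
    """Toy illustration (not the real Masksembles generation procedure):
    each mask activates `n_ones` channels out of a pool of
    `int(n_ones * scale)` channels. A larger scale -> a larger pool ->
    fewer shared ones between masks -> less correlated submodels."""
    rng = np.random.default_rng(seed)
    width = int(n_ones * scale)
    masks = np.zeros((n_masks, width), dtype=np.float32)
    for i in range(n_masks):
        idx = rng.choice(width, size=n_ones, replace=False)
        masks[i, idx] = 1.0
    return masks

def mean_overlap(masks):
    """Average number of ones shared by a pair of masks."""
    n = len(masks)
    shared = [np.minimum(masks[i], masks[j]).sum()
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(shared))

for scale in [1.0, 2.0, 4.0]:
    m = random_masks(n_masks=4, n_ones=16, scale=scale)
    print(f"scale={scale}: mean shared ones = {mean_overlap(m):.1f}")
    # scale = 1.0 -> all masks identical (fully correlated, single-model limit)
    # larger scale -> less overlap (closer to independent ensemble members)
```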
@nikitadurasov Hi, could you please tell me how Masksembles changes the correlation between submodels? I don't understand this from the paper and code. Does dropping the features that are not used in any mask help with this?
Hey @ToBeNormal, one of the central properties of the Masksembles approach is the "correlation" of its submodels. Each submodel is represented by its own binary mask in the Masksembles layers. The fewer ones two binary masks share, the less correlated their predictions are. You can check the last section of the supplementary material for more information on that.
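As a rough picture of how the masks enter the forward pass (a simplified sketch, not the exact layer from our repository): the batch is split into one group per mask, and each group is multiplied channel-wise by its own fixed binary mask, so each group effectively passes through a different subnetwork.

```python
import torch

class ToyMasksembles1D(torch.nn.Module):
    """Simplified sketch of a Masksembles-style layer (illustrative only).

    Expects input of shape [batch, channels] with batch divisible by the
    number of masks; each sub-batch is multiplied by its own fixed binary
    mask, so different sub-batches see different subnetworks."""

    def __init__(self, masks):
        super().__init__()
        # masks: [n_masks, channels], fixed (non-trainable) binary masks
        self.register_buffer("masks", torch.as_tensor(masks, dtype=torch.float32))

    def forward(self, x):
        n = self.masks.shape[0]
        b, c = x.shape
        x = x.view(n, b // n, c)           # split batch into one group per mask
        x = x * self.masks[:, None, :]     # channel-wise masking per group
        return x.reshape(b, c)

# usage sketch (random placeholder masks: 4 masks over 32 channels, batch of 8):
# layer = ToyMasksembles1D((torch.rand(4, 32) < 0.5).float())
# out = layer(torch.randn(8, 32))
```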