
Masksembles - a couple of questions #6

Open
dkoguciuk opened this issue Jun 18, 2021 · 3 comments
Labels: question (Further information is requested)

@dkoguciuk (Contributor)

Hi @nikitadurasov ,

I have a couple of questions about Masksembles:

  1. What is the difference between Masksembles and BatchEnsemble? Is there any particular reason you do not discuss it in the paper?
  2. As far as I understand, the general ideas behind Masksembles and BatchEnsemble are quite similar. BatchEnsemble doesn't have the scale property that lets you move between dropout and a naive ensemble, but its masks are learnable, which raises the question of whether the two ideas could be combined.
  3. I've been developing a somewhat similar approach: I was feeding the same data to all the submodels (in contrast to your approach) and encouraging diversity between their predictions by maximizing the L1 difference between them (roughly the kind of penalty sketched below). Have you tried enforcing diversity in any way?
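
For concreteness, here is a minimal sketch of the kind of diversity term I mean (PyTorch; all names are placeholders, this is not code from either project):

```python
import torch

def l1_diversity(preds: torch.Tensor) -> torch.Tensor:
    """Mean pairwise L1 distance between ensemble members' predictions.

    preds: tensor of shape [n_members, batch, num_classes], where every
    member has seen the *same* inputs. Larger value = more diverse members.
    """
    n = preds.shape[0]
    total = preds.new_zeros(())
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total = total + (preds[i] - preds[j]).abs().mean()
            pairs += 1
    return total / pairs

# Subtract the (weighted) term from the task loss, so that minimizing the
# total loss maximizes disagreement between members:
#   loss = task_loss - lambda_div * l1_diversity(member_preds)
```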

Best,
Daniel

@nikitadurasov (Owner)

Hey @dkoguciuk,

So,

  1. Basically, we've used their implementation idea (since it's very convenient), but our major contribution was to develop a method that allows changing the correlation between submodels (and provides a transition between Ensembles and MC-Dropout). BatchEnsemble has no such mechanism.

  2. I think the idea of combining the approaches is an interesting one. I've tried making our masks learnable too (so that every value in the masks lies in [0, 1]), but this way we lose control over the correlation of the generated submodels (for example, the model could decide to make all of the masks the same or very similar; a toy illustration of this failure mode is sketched after this list). I have an idea of how to combine the two though, and would be happy to share :)

  3. In general, our correlation parameter does exactly this: by reducing the correlation parameter you increase the diversity of the predictions. I don't think we've tried enforcing diversity by adding extra terms to the loss, though.
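
To illustrate the failure mode in (2): if the binary masks are relaxed into learnable values in [0, 1] (a toy sketch, not part of the Masksembles implementation), nothing constrains how much the masks overlap, so the optimizer is free to make them all identical:

```python
import torch
import torch.nn as nn

class LearnableMasks(nn.Module):
    """Toy relaxation of Masksembles-style masks: each mask entry is the
    sigmoid of a free parameter, so it lies in [0, 1] and is trainable.
    Nothing here controls the overlap (correlation) between the n masks."""

    def __init__(self, n_masks: int, channels: int):
        super().__init__()
        self.logits = nn.Parameter(torch.randn(n_masks, channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [n_masks * batch, channels] -- the batch is split into n_masks
        # groups and each group is multiplied by its own soft mask.
        masks = torch.sigmoid(self.logits)               # [n_masks, channels]
        x = x.view(masks.shape[0], -1, masks.shape[1])   # [n_masks, batch, channels]
        return (x * masks.unsqueeze(1)).view(-1, masks.shape[1])
```

With pre-generated binary masks the overlap between submodels is fixed in advance by the scale parameter; with a relaxation like this it is left entirely to the optimizer.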

Best,
Nikita

@nikitadurasov nikitadurasov self-assigned this Jun 23, 2021
@nikitadurasov nikitadurasov added the question Further information is requested label Jun 23, 2021
@ZhouCX117

@nikitadurasov Hi, could you please tell me how Masksembles changes the correlation between the submodels? I couldn't work this out from the paper or the code. Does dropping the features that are not used in any mask help with this?

@nikitadurasov (Owner)

Hey @ToBeNormal, one of the central properties of the Masksembles approach is the "correlation" between its submodels. Each submodel corresponds to a binary mask in the Masksembles layers: the fewer ones two masks share, the less correlated the predictions of the corresponding submodels are. You can check the last section of the supplementary material for more details on this.
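
As a toy numerical illustration of that point (not the exact mask-generation procedure from this repository): if each of n masks has m ones drawn from a pool of roughly m * scale positions, the expected number of ones two masks share drops as the scale grows:

```python
import numpy as np

def random_masks(n: int, m: int, scale: float, rng: np.random.Generator) -> np.ndarray:
    """n binary masks, each with m ones, over round(m * scale) positions.
    A toy stand-in for the real generation procedure."""
    size = max(m, int(round(m * scale)))
    masks = np.zeros((n, size), dtype=int)
    for i in range(n):
        masks[i, rng.choice(size, m, replace=False)] = 1
    return masks

def mean_pairwise_overlap(masks: np.ndarray) -> float:
    """Average fraction of ones that a pair of masks has in common."""
    n, m = masks.shape[0], masks[0].sum()
    pairs = [(masks[i] & masks[j]).sum() / m
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(pairs))

rng = np.random.default_rng(0)
for scale in (1.0, 2.0, 4.0):
    print(scale, mean_pairwise_overlap(random_masks(n=4, m=16, scale=scale, rng=rng)))
# scale = 1.0 -> overlap 1.0 (all masks identical, fully correlated submodels);
# larger scale -> smaller overlap -> less correlated, more ensemble-like submodels.
```

The actual generation procedure in the paper controls this overlap more carefully, but the trend is the same: the fewer ones two masks share, the less correlated the corresponding submodels.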
