What is the difference between Masksembles and BatchEnsemble? Is there any particular reason you do not discuss it in the paper?
As far as I understand, the general idea behind Masksembles and BatchEnsemble is pretty similar. BatchEnsemble doesn't have the scale property that lets you move between dropout and a naive ensemble, but its masks are learnable, which raises the question of whether the two ideas could be combined.
I've been developing a somewhat similar approach: I fed the same data to all models (in contrast to your approach) and forced diversity between the models' predictions by maximizing the L1 difference between them. Have you tried enforcing diversity in any way?
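Roughly, the diversity term I have in mind looks like this (a simplified PyTorch sketch; the function and variable names are just illustrative, not from any existing codebase):

```python
import torch

def diversity_penalty(predictions):
    """Encourage submodels to disagree by averaging the pairwise
    L1 distance between their predictions.

    predictions: tensor of shape [n_models, batch, num_classes]
    Returns a scalar meant to be *subtracted* from the task loss.
    """
    n = predictions.shape[0]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total = total + (predictions[i] - predictions[j]).abs().mean()
            pairs += 1
    return total / pairs

# usage sketch: loss = task_loss - lambda_div * diversity_penalty(preds)
```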
Best,
Daniel
Basically, we used their implementation idea (since it's very convenient), but our major contribution is a method that allows changing the correlation between submodels (and provides the Ensembles / MC-Dropout transition). BatchEnsemble has no such mechanism.
I think the idea of combining the approaches is an interesting one. I've tried to make our masks learnable too (so every value in the masks lies in [0, 1]), but this way we lose control over the correlation of the generated submodels (for example, the model could decide to make all of the masks identical or very similar). I have an idea of how to combine the two, though, and would be happy to share :)
In general, our correlation parameter does exactly this: by reducing it you increase the diversity of the predictions. We haven't tried to enforce diversity through an extra term in the loss, though.
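To make that knob a bit more concrete, here is a toy NumPy sketch (this is not our actual mask-generation code, just an illustration of how a scale-like pool size controls how many ones two masks share):

```python
import numpy as np

def random_masks(n_masks, n_ones, scale, seed=0):
    """Toy illustration (not the real Masksembles generation procedure):
    each mask activates `n_ones` channels out of a pool of
    `int(n_ones * scale)` channels. A larger scale -> a larger pool ->
    fewer shared ones between masks -> less correlated submodels."""
    rng = np.random.default_rng(seed)
    width = int(n_ones * scale)
    masks = np.zeros((n_masks, width), dtype=np.float32)
    for i in range(n_masks):
        idx = rng.choice(width, size=n_ones, replace=False)
        masks[i, idx] = 1.0
    return masks

def mean_overlap(masks):
    """Average number of ones shared by a pair of masks."""
    n = len(masks)
    shared = [np.minimum(masks[i], masks[j]).sum()
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(shared))

for scale in [1.0, 2.0, 4.0]:
    m = random_masks(n_masks=4, n_ones=16, scale=scale)
    print(f"scale={scale}: mean shared ones = {mean_overlap(m):.1f}")
    # scale = 1.0 -> all masks identical (fully correlated, single-model limit)
    # larger scale -> less overlap (closer to independent ensemble members)
```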
@nikitadurasov Hi, could you please tell me how Masksembles changes the correlation between submodels? I don't understand this from the paper and code. Does dropping the features that are not used in any mask help with this?
Hey @ToBeNormal, one of the central properties of the Masksembles approach is the "correlation" of its submodels. Each submodel is represented by its own binary mask in the Masksembles layers. The fewer ones two binary masks share, the less correlated their predictions are. You can check the last section of the supplementary material for more information on that.
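As a rough picture of how the masks enter the forward pass (a simplified sketch, not the exact layer from our repository): the batch is split into one group per mask, and each group is multiplied channel-wise by its own fixed binary mask, so each group effectively passes through a different subnetwork.

```python
import torch

class ToyMasksembles1D(torch.nn.Module):
    """Simplified sketch of a Masksembles-style layer (illustrative only).

    Expects input of shape [batch, channels] with batch divisible by the
    number of masks; each sub-batch is multiplied by its own fixed binary
    mask, so different sub-batches see different subnetworks."""

    def __init__(self, masks):
        super().__init__()
        # masks: [n_masks, channels], fixed (non-trainable) binary masks
        self.register_buffer("masks", torch.as_tensor(masks, dtype=torch.float32))

    def forward(self, x):
        n = self.masks.shape[0]
        b, c = x.shape
        x = x.view(n, b // n, c)           # split batch into one group per mask
        x = x * self.masks[:, None, :]     # channel-wise masking per group
        return x.reshape(b, c)

# usage sketch (random placeholder masks: 4 masks over 32 channels, batch of 8):
# layer = ToyMasksembles1D((torch.rand(4, 32) < 0.5).float())
# out = layer(torch.randn(8, 32))
```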