Understanding the Joint Autoregressive Model #165
Hi, I'm having some difficulty wrapping my head around why we can parallelize the entropy estimation during a forward pass for training, but need to sequentially compute the scale and mean at each spatial location (h, w) when compressing the latent variable. I think I understand the general idea of autoregressive models, where we mask unseen variables so that we can model the conditional probabilities. However, isn't the masked convolution in the context prediction (together with the 1x1 convolutions in the entropy parameters network) already ensuring that each index of the computed Gaussian parameters is context dependent? That is, by my understanding, the way the weights are applied in the convolutions should ensure that the output scales and means are computed autoregressively. So why do we need to apply the masked convolution and entropy parameters network patch-wise during compression, rather than parallelizing the process as in the pure latent variable models, i.e. FP, SHP, MSHP? BR
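
For reference, this is roughly what I mean by the masked convolution enforcing causality (a minimal sketch, loosely modeled on CompressAI's `MaskedConv2d`; the channel sizes and latent shape below are placeholders, not the library's actual configuration):

```python
import torch
import torch.nn as nn


class MaskedConv2d(nn.Conv2d):
    """Type-'A' masked convolution: the weight at the centre position and
    everything to its right / below is zeroed, so the output at (h, w) only
    depends on positions above or to the left of (h, w)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones_like(self.weight)
        mask[:, :, kh // 2, kw // 2:] = 0  # centre pixel and everything to its right
        mask[:, :, kh // 2 + 1:, :] = 0    # all rows below the centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Re-apply the mask so gradient updates never leak "future" context
        self.weight.data *= self.mask
        return super().forward(x)


# Placeholder sizes, just for illustration
context_prediction = MaskedConv2d(192, 384, kernel_size=5, padding=2)
y_hat = torch.round(torch.randn(1, 192, 16, 16))  # quantized latent
ctx = context_prediction(y_hat)  # each (h, w) only sees earlier positions
```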
Replies: 1 comment
For `decompress()`, sequential decoding of each pixel is necessary due to causality. For `compress()`, as you note, it is not necessary to run a sequential pixel-by-pixel computation, since one may simply use the masked convolution to generate the exact `y_hat` that is available to the decoder in a single GPU call, à la `forward()`. This can probably be optimized in the CompressAI implementation. The only issue I see is if the `y_hat` that is generated is slightly different due to e.g. floating-point errors or other optimizations. The safest method is simply to repeat the exact sequence of computations for encoding as for decoding, which always works on the same hardware, if we assume that the hardware…
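
To make the asymmetry concrete, here is a rough sketch of the two directions. This is not the actual CompressAI code; `context_prediction`, `entropy_parameters`, `hyper_params`, and `decode_symbol` are placeholders standing in for the context model, the entropy parameters network, the hyperprior output, and the range decoder.

```python
import torch


def compress_parallel(y_hat, context_prediction, entropy_parameters, hyper_params):
    """Encoding: y_hat is fully known, so the masked convolution can run over
    the whole latent in a single call, exactly as in forward()."""
    ctx = context_prediction(y_hat)                                   # one GPU call
    gaussian_params = entropy_parameters(torch.cat((hyper_params, ctx), dim=1))
    scales, means = gaussian_params.chunk(2, dim=1)
    return scales, means                                              # feed to the entropy coder


def decompress_sequential(bitstream, context_prediction, entropy_parameters,
                          hyper_params, shape, decode_symbol):
    """Decoding: y_hat is only partially known, so each location must be
    decoded before it can serve as context for the next one."""
    B, C, H, W = shape
    y_hat = torch.zeros(shape)
    for h in range(H):
        for w in range(W):
            ctx = context_prediction(y_hat)                           # sees only decoded pixels
            params = entropy_parameters(torch.cat((hyper_params, ctx), dim=1))
            scales, means = params.chunk(2, dim=1)
            # decode_symbol is a stand-in for the range decoder
            y_hat[:, :, h, w] = decode_symbol(
                bitstream, scales[:, :, h, w], means[:, :, h, w]
            )
    return y_hat
```

In practice an implementation can avoid re-running the masked convolution over the full latent at every step (e.g. by operating on a small patch around the current position), but the pixel-by-pixel ordering of the decode loop cannot be removed.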