Understanding probability distribution for entropy coding #139
-
Hi,
Thanks in advance for your precious answers.
-
Hi,

There are no stupid questions.

So first of all, we are doing two things when training: we train a model to optimally compress images in a differentiable manner, but we also want to generate fixed-point probability mass functions (or their corresponding CDFs) that model well the distributions of the latent tensors produced from training samples. For the final (real-life) codec, these tables will be shared by the encoder and the decoder for bit-accurate decoding of the bitstream at inference.

The tail_mass corresponds to the cumulative probability that we exclude from the "range" of our main arithmetic coder, split into a lower tail and an upper tail of the distribution. You can check the range coder interface, which shows that values outside the range are bypass coded. This costs more bits, so you want to find a good tradeoff between a relevant range, depending on the precision of your coder, and the probability of having to encode these less frequent values in a costlier manner. You can select the tail mass, which corresponds to the fraction of samples that will be bypass (exp-Golomb) coded.

Then, the system derives the corresponding "quantiles", i.e. the 3 values (of latent/hyper-latent samples) per distribution: the lower limit of the range (defining the left tail), the median value, and the upper limit (right tail). These values are used to compute the discrete CDF that will be used for actual compression after training.

The actual range coding at inference, using discrete CDFs and encoding of the less probable 'tails' with exp-Golomb (bypass) coding, is not considered in the main training loss, which relies on a differentiable approximation of the entropy by a small fully connected network (see annex 6.1/6.2 of the original paper).

Hope this helps, but again, you can refer to the original paper(s) mentioned here and in the code for more explanations, as well as TensorFlow Compression.
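To make the tail_mass / quantile mechanics a bit more concrete, here is a minimal, self-contained Python sketch. It assumes a Gaussian stand-in for the learned density (the real codec uses the distribution fitted during training), and the helper names (`gaussian_cdf`, `quantile`) are purely illustrative, not part of any library:

```python
# Hypothetical illustration: derive the tail quantiles and a fixed-point CDF
# table from a stand-in Gaussian density. In the real codec the density is the
# one learned during training, not a Gaussian.
import math
import numpy as np

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantile(p, mu=0.0, sigma=1.0, lo=-1e4, hi=1e4):
    # Simple bisection to invert the CDF (placeholder for the trained quantiles).
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gaussian_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

tail_mass = 1e-3       # total probability mass pushed to bypass coding
mu, sigma = 0.0, 3.0   # made-up parameters of one latent channel

# The three "quantiles": left tail, median, right tail.
q_low = quantile(tail_mass / 2, mu, sigma)
q_med = quantile(0.5, mu, sigma)
q_high = quantile(1 - tail_mass / 2, mu, sigma)

# Integer symbol range handled by the range coder; everything outside is bypassed.
low, high = math.floor(q_low), math.ceil(q_high)
symbols = np.arange(low, high + 1)

# PMF of each integer symbol = CDF(i + 0.5) - CDF(i - 0.5).
pmf = np.array([gaussian_cdf(i + 0.5, mu, sigma) - gaussian_cdf(i - 0.5, mu, sigma)
                for i in symbols])
bypass_mass = 1.0 - pmf.sum()  # mass left for the two tails (bypass-coded values)

# Quantize to a fixed-point cumulative table (16-bit precision, one extra
# "escape" slot for bypass-coded values), roughly as a range coder consumes it.
# Real implementations renormalize so the last entry is exactly 1 << precision.
precision = 16
freqs = np.maximum(1, np.round(np.append(pmf, bypass_mass) * (1 << precision)).astype(np.int64))
cdf = np.concatenate(([0], np.cumsum(freqs)))
print(low, high, q_med, cdf[-1])
```

A smaller tail_mass widens [low, high], so fewer samples fall into the costly bypass path but the CDF table gets larger; that is the tradeoff described above.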
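The last point, that the training loss relies on a differentiable entropy estimate rather than the real range coder's bit count, can be sketched in a few lines, assuming CompressAI's EntropyBottleneck interface; the channel count, tensor shapes, and image size below are made-up placeholders:

```python
# A minimal sketch of the rate term used during training, assuming CompressAI's
# EntropyBottleneck interface. Shapes and sizes are illustrative only.
import torch
from compressai.entropy_models import EntropyBottleneck

entropy_bottleneck = EntropyBottleneck(channels=128, tail_mass=1e-9)

y = torch.randn(1, 128, 16, 16)               # stand-in latent tensor from an encoder
y_hat, y_likelihoods = entropy_bottleneck(y)  # noisy/rounded latents + their likelihoods

# Differentiable estimate of the bitrate: -sum(log2 p(y_hat)), normalized per pixel.
num_pixels = 1 * 256 * 256                    # assumed input image size
bpp = -torch.log2(y_likelihoods).sum() / num_pixels

# After training, one would build the fixed-point CDF tables and run the actual
# range coder (values outside the range being bypass coded), e.g.:
# entropy_bottleneck.update(force=True)
# strings = entropy_bottleneck.compress(y)
# y_hat_dec = entropy_bottleneck.decompress(strings, y.size()[2:])
```

The bpp term above is what enters the rate-distortion loss during training; the byte strings produced by the range coder at inference are only approximated by it.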