Understanding probability distribution for entropy coding #139
-
Hi,
Thanks in advance for your precious answers.
-
Hi,

There are no stupid questions.

So first of all, we are doing two things when training: we train a model to optimally compress images in a differentiable manner, but we also want to generate fixed-point probability mass functions (or their corresponding CDFs) that model well the distributions of the latent tensors produced from training samples. For the final (real-life) codec, these tables will be shared by the encoder and the decoder for bit-accurate decoding of the bitstream at inference.

The tail_mass corresponds to the cumulative probability that we exclude from the "range" of our main arithmetic coder, split into a lower tail and an upper tail of the distribution. You can check the range coder interface, which shows that values outside the range are bypass coded. This costs more bits, so you want to find a good tradeoff between a relevant range, depending on the precision of your coder, and the probability of having to encode these less frequent values in a costlier manner. You can select the tail mass, which corresponds to the fraction of samples that will be bypass (exp-Golomb) coded.

Then, the system derives the corresponding "quantiles", i.e. the 3 values (of latent/hyper-latent samples) per distribution: the lower limit of the range (defining the left tail), the median value, and the upper limit (right tail). These values are used to compute the discrete CDF that will be used for actual compression after training.

The actual range coding at inference, using discrete CDFs and encoding of the less probable 'tails' with exp-Golomb (bypass) coding, is not considered in the main training loss, which relies on a differentiable approximation of the entropy by a small fully connected network (see annex 6.1/6.2 of the original paper).

Hope this helps, but again, you can refer to the original paper(s) mentioned here and in the code for more explanations, as well as TensorFlow Compression.
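To make the tail_mass / quantile mechanics a bit more concrete, here is a minimal, self-contained Python sketch. It assumes a Gaussian stand-in for the learned density (the real codec uses the distribution fitted during training), and the helper names (`gaussian_cdf`, `quantile`) are purely illustrative, not part of any library:

```python
# Hypothetical illustration: derive the tail quantiles and a fixed-point CDF
# table from a stand-in Gaussian density. In the real codec the density is the
# one learned during training, not a Gaussian.
import math
import numpy as np

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantile(p, mu=0.0, sigma=1.0, lo=-1e4, hi=1e4):
    # Simple bisection to invert the CDF (placeholder for the trained quantiles).
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gaussian_cdf(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

tail_mass = 1e-3       # total probability mass pushed to bypass coding
mu, sigma = 0.0, 3.0   # made-up parameters of one latent channel

# The three "quantiles": left tail, median, right tail.
q_low = quantile(tail_mass / 2, mu, sigma)
q_med = quantile(0.5, mu, sigma)
q_high = quantile(1 - tail_mass / 2, mu, sigma)

# Integer symbol range handled by the range coder; everything outside is bypassed.
low, high = math.floor(q_low), math.ceil(q_high)
symbols = np.arange(low, high + 1)

# PMF of each integer symbol = CDF(i + 0.5) - CDF(i - 0.5).
pmf = np.array([gaussian_cdf(i + 0.5, mu, sigma) - gaussian_cdf(i - 0.5, mu, sigma)
                for i in symbols])
bypass_mass = 1.0 - pmf.sum()  # mass left for the two tails (bypass-coded values)

# Quantize to a fixed-point cumulative table (16-bit precision, one extra
# "escape" slot for bypass-coded values), roughly as a range coder consumes it.
# Real implementations renormalize so the last entry is exactly 1 << precision.
precision = 16
freqs = np.maximum(1, np.round(np.append(pmf, bypass_mass) * (1 << precision)).astype(np.int64))
cdf = np.concatenate(([0], np.cumsum(freqs)))
print(low, high, q_med, cdf[-1])
```

A smaller tail_mass widens [low, high], so fewer samples fall into the costly bypass path but the CDF table gets larger; that is the tradeoff described above.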
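The last point, that the training loss relies on a differentiable entropy estimate rather than the real range coder's bit count, can be sketched in a few lines, assuming CompressAI's EntropyBottleneck interface; the channel count, tensor shapes, and image size below are made-up placeholders:

```python
# A minimal sketch of the rate term used during training, assuming CompressAI's
# EntropyBottleneck interface. Shapes and sizes are illustrative only.
import torch
from compressai.entropy_models import EntropyBottleneck

entropy_bottleneck = EntropyBottleneck(channels=128, tail_mass=1e-9)

y = torch.randn(1, 128, 16, 16)               # stand-in latent tensor from an encoder
y_hat, y_likelihoods = entropy_bottleneck(y)  # noisy/rounded latents + their likelihoods

# Differentiable estimate of the bitrate: -sum(log2 p(y_hat)), normalized per pixel.
num_pixels = 1 * 256 * 256                    # assumed input image size
bpp = -torch.log2(y_likelihoods).sum() / num_pixels

# After training, one would build the fixed-point CDF tables and run the actual
# range coder (values outside the range being bypass coded), e.g.:
# entropy_bottleneck.update(force=True)
# strings = entropy_bottleneck.compress(y)
# y_hat_dec = entropy_bottleneck.decompress(strings, y.size()[2:])
```

The bpp term above is what enters the rate-distortion loss during training; the byte strings produced by the range coder at inference are only approximated by it.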