
About the y_likelihoods in Entropy models #314

Closed
chunbaobao opened this issue Oct 25, 2024 · 4 comments
@chunbaobao

Hi, this is excellent work. I have recently been working in the area of learned compression. I have already checked issue #306, but I am still confused about the actual meaning of y_likelihoods:

  1. The shape of y_likelihoods is equal to the shape of y, and I thought each element of y_likelihoods represents the probability $p(y_i)$, where $p(y_i)$ is the probability that the symbol $y_i$ appears. However, when I run the code below from the demo,
torch.sum(out_net['likelihoods']['y']).item()/torch.prod(torch.tensor(out_net['likelihoods']['y'].shape)).item()
# result = 0.9271195729573568

it produces an output where the elements of y_likelihoods are close to 1 everywhere. Isn't this contradictory? Shouldn't the sum of y_likelihoods be 1? What is the actual meaning of y_likelihoods?

  2. The calculation for bpp is $\sum_i -\log_2(p(y_i))/N$, but the information entropy is actually $\sum_i -p(y_i)\log_2(p(y_i))$. I'm just wondering where the $p(y_i)$ went. Is it because $\sum_i -\log_2(p(y_i))/N$ is an upper bound of $\sum_i -p(y_i)\log_2(p(y_i))$, from the inequality in the attached image?
    [attached image: inequality]
@YodaEmbedding (Contributor) commented Oct 28, 2024

For each $i$, there is a discrete probability distribution $p_i$, i.e.,

$$\sum_{t \in \mathbb{Z}} p_i(t) = 1.$$

There are many such probability distributions -- one for each element in $\hat{y}$.

Each element $\hat{y}_i$ is encoded using its corresponding $p_i$. We can measure the "likelihood" $l_i$ for each element:

$$l_i = p_i(\hat{y}_i)$$

...and since the rate is the negative log-likelihood, the bit cost of the $i$-th element is:

$$R_i = -\log_2 l_i$$

The total rate cost is then:

$$R = \sum_i R_i$$

...which can be averaged over $N$ pixels to obtain the bpp.
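As a minimal sketch (not the exact CompressAI training code, and assuming the `out_net` and input tensor from the demo above), this is how the bpp follows from the likelihoods:

```python
import torch

def bpp_from_likelihoods(likelihoods: torch.Tensor, num_pixels: int) -> torch.Tensor:
    """Per-element bit costs R_i = -log2(l_i), summed and averaged over pixels."""
    total_bits = -torch.log2(likelihoods).sum()  # R = sum_i R_i
    return total_bits / num_pixels               # bits per pixel

# out_net = net(x)                                # forward pass, as in the demo
# num_pixels = x.size(0) * x.size(2) * x.size(3)  # batch * H * W
# bpp = sum(bpp_from_likelihoods(l, num_pixels) for l in out_net["likelihoods"].values())
```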


For a single fixed encoding distribution $p$, the average rate cost for encoding a single symbol that is drawn from the same distribution $p$ is:

$$R = \sum_t - p(t) \, \log p(t)$$

But this is not what we're doing. What we're actually interested in is the cross-entropy. That is the average rate cost for encoding a single symbol drawn from the true distribution $\hat{p}$:

$$R = \sum_t - \hat{p}(t) \, \log p(t)$$

To be consistent with our notation above, we should also sprinkle in some $i$'s:

$$R_i = \sum_t - \hat{p}_i(t) \, \log p_i(t)$$

In our case, we know exactly what $\hat{p}$ is...

$$\hat{p}_i(t) = \delta[t - \hat{y}_i] = \begin{cases} 1 & \text{if } t = \hat{y}_i \\ 0 & \text{otherwise} \end{cases} $$

If we plug this into the earlier equation, the rate cost for encoding the $i$-th element becomes:

$$R_i = -\log p_i(\hat{y}_i)$$
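A quick numeric check of this collapse, using a made-up discrete $p_i$ (hypothetical values, purely for illustration):

```python
import torch

# Hypothetical discrete distribution p_i over symbols t = -2, ..., 2 (made-up values).
symbols = torch.arange(-2, 3)
p_i = torch.tensor([0.05, 0.20, 0.50, 0.20, 0.05])
y_hat_i = 1  # the symbol that was actually produced

# The "true" distribution p_hat_i is a delta located at y_hat_i.
p_hat_i = (symbols == y_hat_i).float()

cross_entropy = -(p_hat_i * torch.log2(p_i)).sum()       # sum_t -p_hat_i(t) log2 p_i(t)
direct = -torch.log2(p_i[symbols == y_hat_i]).squeeze()  # -log2 p_i(y_hat_i)

assert torch.isclose(cross_entropy, direct)  # both ≈ 2.32 bits
```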

@chunbaobao (Author)

Thank you for your reply; it has been incredibly helpful! However, I still have several stupid questions:

  1. In the update() function, we need to add pmf_start to the samples, and the symbols need to subtract offset in the C++ interface, so that we get $p(y_i - \text{offset} = t + \text{pmf\_start})$, where $t \in \{0, 1, 2, \ldots, N\}$. I'm just wondering why we can't simply set sample = torch.arange(max_length) and omit the pmf_start and offset steps to get $p(y_i = t)$, where $t \in \{0, 1, 2, \ldots, N\}$? What's the meaning of pmf_start and offset here? Is it because the minimum of the latent y is not zero?

samples = torch.arange(max_length, device=device)
samples = samples[None, :] + pmf_start[:, None, None]

int32_t value = symbols[i] - offsets[cdf_idx];

  2. Similar to Q1: if it is indeed $p(y_i - \text{offset} = t + \text{pmf\_start})$, wouldn't it be more meaningful to express it as below?
    prob = torch.cat((tail_mass[i] / 2, p[: pmf_length[i]], tail_mass[i] / 2), dim=0)

def _pmf_to_cdf(self, pmf, tail_mass, pmf_length, max_length):
    cdf = torch.zeros(
        (len(pmf_length), max_length + 2), dtype=torch.int32, device=pmf.device
    )
    for i, p in enumerate(pmf):
        prob = torch.cat((p[: pmf_length[i]], tail_mass[i]), dim=0)
        _cdf = pmf_to_quantized_cdf(prob, self.entropy_coder_precision)
        cdf[i, : _cdf.size(0)] = _cdf
    return cdf

@YodaEmbedding (Contributor) commented Nov 18, 2024

  1. Most values of y are expected to occur between quantiles[0] and quantiles[2] with a total probability of 1 - tail_mass. That is, P(quantiles[0] ≤ y ≤ quantiles[2]) = 1 - tail_mass.

When converting these to a discrete distribution over the range [0, pmf_length], we:

  • Round quantiles[0] and quantiles[2] outwards (i.e. floor and ceil).
  • Determine the lengths of the distributions (pmf_length) and the value that the 0th symbol represents (pmf_start).
  • Calculate pmf. We need to evaluate it at all the samples/points it represents. samples = torch.arange(max_length) + pmf_start[:, None]. (The code also includes an empty dimension in the middle for compatibility with _likelihood.)
  • Reevaluate the tail_mass (should be equal to the sum of our new pmf).
  • Finally, create the quantized_cdf that we will feed into the C++ entropy coder.


P.S. In the EntropyBottleneck, we use a single distribution per channel, as visualized in Figure 4a.
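The steps listed above can be condensed into a schematic sketch. This is not the library's exact update() code: `likelihood_fn` is a hypothetical stand-in for the entropy model's internal likelihood evaluation, and the tail mass is simplified to whatever probability falls outside the support.

```python
import torch

def build_discrete_pmfs(quantiles: torch.Tensor, likelihood_fn):
    """Schematic version of the steps above.

    quantiles: (C, 3) tensor holding [quantiles[0], median, quantiles[2]] per channel.
    likelihood_fn: hypothetical callable returning p_c(t) at the given sample points.
    """
    medians = quantiles[:, 1]

    # Round the support outwards so it covers [quantiles[0], quantiles[2]].
    minima = torch.ceil(medians - quantiles[:, 0]).int().clamp(min=0)
    maxima = torch.ceil(quantiles[:, 2] - medians).int().clamp(min=0)

    # Value represented by the 0th symbol (pmf_start) and the per-channel support length.
    offset = -minima                     # subtracted from symbols on the C++ side
    pmf_start = medians - minima
    pmf_length = maxima + minima + 1
    max_length = int(pmf_length.max())

    # Evaluate the pmf at every point the (padded) support represents.
    samples = torch.arange(max_length, device=quantiles.device)
    samples = samples[None, :] + pmf_start[:, None]
    pmf = likelihood_fn(samples)         # shape (C, max_length)

    # Simplification: treat everything outside the support as tail mass.
    tail_mass = (1.0 - pmf.sum(dim=1, keepdim=True)).clamp(min=0)

    return pmf, tail_mass, pmf_length, max_length, offset
```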


  2. Values that are out of range (i.e. on the left of quantiles[0] or on the right of quantiles[2]) are encoded using the "bypass mode". This is done by encoding a max value symbol, then encoding the out-of-range value with the bypass method.
    if (value == max_value) {
      /* Bypass decoding mode */
      int32_t val = Rans64DecGetBits(&rans, &ptr, bypass_precision);
      int32_t n_bypass = val;

      while (val == max_bypass_val) {
        val = Rans64DecGetBits(&rans, &ptr, bypass_precision);
        n_bypass += val;
      }

      int32_t raw_val = 0;
      for (int j = 0; j < n_bypass; ++j) {
        val = Rans64DecGetBits(&rans, &ptr, bypass_precision);
        assert(val <= max_bypass_val);
        raw_val |= val << (j * bypass_precision);
      }

      value = raw_val >> 1;
      if (raw_val & 1) {
        value = -value - 1;
      } else {
        value += max_value;
      }
    }

Since max value symbols (i.e. out-of-range values to the left of quantiles[0] or to the right of quantiles[2]) are expected to occur with probability of tail_mass, we use:

prob = torch.cat((p[: pmf_length[i]], tail_mass[i]), dim=0) 

Notice that the extra out-of-range symbol means the quantized pmf we actually use has size pmf_length + 1, and its quantized cdf has size pmf_length + 2.
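To make the size bookkeeping concrete, here is a tiny example with made-up numbers (illustration only, not taken from a real model):

```python
import torch

# Made-up per-channel pmf over pmf_length = 5 support points.
p = torch.tensor([0.05, 0.15, 0.55, 0.15, 0.05])
tail_mass = torch.tensor([0.05])  # mass outside [quantiles[0], quantiles[2]]

# The extra "max value" symbol absorbs the tail mass, so the pmf fed to the
# quantizer has pmf_length + 1 entries ...
prob = torch.cat((p, tail_mass), dim=0)
assert prob.numel() == 5 + 1

# ... and the quantized CDF, which prepends a leading zero, ends up with
# pmf_length + 2 entries.
cdf_length = prob.numel() + 1
assert cdf_length == 5 + 2
```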

@chunbaobao (Author)

Got it! Thank you so much! Your answers really help a lot. Appreciate your sharing!
