Error using CustomDist with observed values and multiple shapes #6916
-
Hi there, and thanks for all your work. We are trying to build a poor man's version of a generalized/Poisson binomial distribution using `pm.CustomDist`:

```python
import numpy as np
import pymc as pm
import pytensor.tensor
from pytensor.tensor import TensorVariable

data_dims = (10, 16, 16)
random_image = lambda: np.random.choice([0, 1], size=data_dims)
observed_data = ((random_image() - random_image()) ** 2).sum(axis=(1, 2))


def custom_dist(
    p: TensorVariable,
    size: TensorVariable,
):
    # Sum the independent Bernoulli pixels over the image axes.
    return pytensor.tensor.sum(pm.Bernoulli.dist(p=p), axis=[1, 2])


with pm.Model() as model:
    px = pm.Beta("px", alpha=1, beta=1, shape=data_dims)
    py = pm.Beta("py", alpha=1, beta=1, shape=data_dims)
    p = pm.Deterministic("p", px * (1 - py) + (1 - px) * py)
    r = pm.CustomDist("c", p, dist=custom_dist, observed=observed_data)
    pm.sample()
```

I'm sure there are better ways to model this, but in any case we are trying to learn how the basics of PyMC work for now. When sampling this model we get an error:
I'm confused as to why it needs to compute the logp of an observed quantity. Or is something else going on here? We're on version 5.8.1. Any help is appreciated!
-
PyMC needs to know the logp of an observed quantity as well (it's the likelihood)! In some cases it can be inferred automatically from the `dist` function you provide, but not always.
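A quick sketch of that point, using a plain Normal likelihood purely for illustration (the variable names here are made up, not from your model):

```python
import numpy as np
import pymc as pm

with pm.Model() as m:
    mu = pm.Normal("mu", 0, 1)
    # The observed variable still gets a logp term: that term is the likelihood.
    y = pm.Normal("y", mu=mu, sigma=1, observed=np.array([0.1, -0.3, 0.5]))

# The compiled model log-density includes the contribution of the observed "y".
logp_fn = m.compile_logp()
print(logp_fn({"mu": 0.0}))
```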
-
Hey there, we are trying to use PyMC to sample (and later fit) a "complicated" distribution. In our case we are looking at the sum of independently distributed Bernoulli RVs. We know (from Wikipedia) that this corresponds to a "Poisson binomial distribution". We were hoping that PyMC would help us sample/fit that distribution/likelihood without needing to implement the details of the PDF ourselves (this is probably where we are wrong). Here is another (more minimal) example:
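Something along these lines (a minimal sketch of what we mean; the equal success probability and names like `n_flips` are just for illustration):

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Illustrative data: 100 observations, each the number of successes in 5 Bernoulli trials.
n_obs, n_flips = 100, 5
observed = np.random.binomial(1, 0.5, size=(n_obs, n_flips)).sum(axis=-1)


def dist_fn(p, size):
    # Sum of independent Bernoulli RVs: a Poisson binomial (here with a shared p).
    return pt.sum(pm.Bernoulli.dist(p=p, shape=(n_obs, n_flips)), axis=-1)


with pm.Model():
    p = pm.Beta("p", alpha=1, beta=1)
    pm.CustomDist("y", p, dist=dist_fn, observed=observed)
    pm.sample()
```

As far as we can tell this runs into the same question as above: PyMC cannot automatically derive the log-pmf of the summed Bernoulli variable from the `dist` graph alone.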
Thanks for your patience and your help.
To be clear, PyMC relies on random methods (either given by the dist or random kwargs) for prior and posterior predictive sampling. For posterior sampling, which uses MCMC, PyMC needs to know the log density (or log pmf) of the variables (given by the logp kwarg, or sometimes inferred from the dist kwarg).
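For instance, a minimal sketch of supplying both kwargs explicitly (assuming a single shared success probability, so the sum of Bernoullis reduces to a Binomial; `n_trials`, `logp_fn`, and `random_fn` are illustrative names, not part of your model):

```python
import numpy as np
import pymc as pm

# Illustrative setup: each observation is the number of successes in n_trials
# independent events that all share the same probability p.
n_trials = 256
observed_counts = np.random.default_rng(0).binomial(n_trials, 0.3, size=10)


def random_fn(p, rng=None, size=None):
    # Forward sampler: draw the sum of n_trials Bernoulli(p) trials.
    return rng.binomial(n_trials, p, size=size)


def logp_fn(value, p):
    # Explicit log-pmf: with one shared p the sum is Binomial(n_trials, p).
    return pm.logp(pm.Binomial.dist(n=n_trials, p=p), value)


with pm.Model():
    p = pm.Beta("p", alpha=1, beta=1)
    pm.CustomDist(
        "counts",
        p,
        logp=logp_fn,      # used by MCMC (posterior sampling)
        random=random_fn,  # used by prior/posterior predictive sampling
        observed=observed_counts,
    )
    idata = pm.sample()
```

For the genuinely unequal-p Poisson binomial case there is no built-in pmf to reuse, so the `logp` kwarg would need a custom implementation.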