ENH: Nested rhat MCMC diagnostic #752

gil2rok · 2024-10-30T19:54:14Z

Implement the nested R-hat diagnostic for Markov chain Monte Carlo (MCMC).

The potential scale reduction factor, also known as R-hat, is a popular MCMC diagnostic introduced by Gelman and Rubin.
R-hat assesses the convergence of MCMC chains by comparing the within-chain variance to the between-chain variance.
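For reference, here is the classic (non-split) R-hat, assuming M chains of N draws each of a scalar quantity θ. Many modern implementations use split or rank-normalized variants, which this sketch omits. Values close to 1 are consistent with convergence; values well above 1 indicate that the chains disagree.

```latex
\begin{aligned}
\bar\theta_{\cdot m} &= \frac{1}{N}\sum_{n=1}^{N}\theta_{nm}, \qquad
\bar\theta_{\cdot\cdot} = \frac{1}{M}\sum_{m=1}^{M}\bar\theta_{\cdot m},\\
B &= \frac{N}{M-1}\sum_{m=1}^{M}\left(\bar\theta_{\cdot m}-\bar\theta_{\cdot\cdot}\right)^2,\\
W &= \frac{1}{M}\sum_{m=1}^{M}\frac{1}{N-1}\sum_{n=1}^{N}\left(\theta_{nm}-\bar\theta_{\cdot m}\right)^2,\\
\widehat{\operatorname{var}}^{+} &= \frac{N-1}{N}\,W+\frac{1}{N}\,B,\qquad
\hat R = \sqrt{\widehat{\operatorname{var}}^{+}/W}.
\end{aligned}
```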

Nested R-hat, from Margossian et al., better predicts convergence when running thousands of short chains on modern hardware. It groups chains into superchains (collections of MCMC chains) and compares the between-superchain variance to the within-superchain variance, where the latter combines the within-chain and between-chain variance inside each superchain.
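For reference, here is the nested R-hat statistic as we read it from Margossian et al., assuming K superchains, each with M chains of N draws of a scalar quantity θ. The notation is ours and loosely follows the paper, so it is worth checking against the paper before relying on it.

```latex
\begin{aligned}
\bar\theta_{\cdot mk} &= \frac{1}{N}\sum_{n=1}^{N}\theta_{nmk}, \qquad
\bar\theta_{\cdot\cdot k} = \frac{1}{M}\sum_{m=1}^{M}\bar\theta_{\cdot mk}, \qquad
\bar\theta_{\cdot\cdot\cdot} = \frac{1}{K}\sum_{k=1}^{K}\bar\theta_{\cdot\cdot k},\\
\hat B_\nu &= \frac{1}{K-1}\sum_{k=1}^{K}\left(\bar\theta_{\cdot\cdot k}-\bar\theta_{\cdot\cdot\cdot}\right)^2,\\
\tilde W_k &= \frac{1}{M}\sum_{m=1}^{M}\frac{1}{N-1}\sum_{n=1}^{N}\left(\theta_{nmk}-\bar\theta_{\cdot mk}\right)^2 \quad (\tilde W_k = 0 \text{ if } N = 1),\\
\tilde B_k &= \frac{1}{M-1}\sum_{m=1}^{M}\left(\bar\theta_{\cdot mk}-\bar\theta_{\cdot\cdot k}\right)^2 \quad (\tilde B_k = 0 \text{ if } M = 1),\\
\hat W_\nu &= \frac{1}{K}\sum_{k=1}^{K}\left(\tilde W_k+\tilde B_k\right),\qquad
\mathfrak{n}\hat R = \sqrt{1+\hat B_\nu/\hat W_\nu}.
\end{aligned}
```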

I am seeking feedback on the code style and API design. The code is somewhat complicated because input_array must have 4 dimensions (num_superchains, num_chains, num_samples, and num_params), whereas most users may expect only 3 (or 2). Tests are still needed, as is a brief explanation of the math in the docs.
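One way to bridge the 3-dimensional layout users may expect and the 4-dimensional one this PR requires is a reshape. The sketch below uses a hypothetical helper, to_superchains (not part of BlackJAX or this PR). It assumes the chains were launched contiguously per superchain, so that chains [0, M) form superchain 0, chains [M, 2M) form superchain 1, and so on. The grouping is only meaningful if it matches how the chains were actually initialized.

```python
import numpy as np


def to_superchains(samples, num_superchains):
    """Hypothetical helper: reshape (num_chains_total, num_samples, *event_shape)
    into (num_superchains, num_chains, num_samples, *event_shape)."""
    samples = np.asarray(samples)
    num_chains_total = samples.shape[0]
    if num_chains_total % num_superchains != 0:
        raise ValueError("num_chains_total must be divisible by num_superchains")
    chains_per_superchain = num_chains_total // num_superchains
    # C-order reshape: consecutive chains end up in the same superchain.
    return samples.reshape((num_superchains, chains_per_superchain) + samples.shape[1:])
```

For example, 6 chains of 5 draws of a 2-dimensional parameter split into 3 superchains gives an array of shape (3, 2, 5, 2).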

Quick nit: why does the existing R-hat function return the potential scale reduction factor after flattening along the sample and chain dimensions? I followed this convention in my implementation.

Addresses issue #278.

gil2rok · 2024-10-30T19:56:43Z

@charlesm93: you may be interested in this.

(For everyone else: Charles is the author of the nested R-hat paper.)

gil2rok · 2024-10-30T20:24:07Z

Code style tests are failing because Flake8 reports extra whitespace around an operator on line 43 of smc/resampling.py.

However, I do not see the problem. Here is the line:

blackjax/blackjax/smc/resampling.py, line 43 in 65ae00e:

"""

If anyone sees how to fix this issue, please let me know.

junpenglao

Thanks! Great start. Let me know when you add some tests.

junpenglao · 2024-10-31T05:15:03Z

blackjax/diagnostics.py

+    NDArray of the resulting statistics (r-hat), with the chain and sample dimensions squeezed.
+
+    """
+    assert input_array.ndim == 4, "The input array must have 4 dimensions."


You should relax the ndim check, as our input could have multiple event-shape dimensions (i.e., the random variable is non-scalar).

Use keepdims=True and it should work.
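Following that suggestion, here is a minimal sketch of nested R-hat with a relaxed dimension check. It accepts input of shape (num_superchains, num_chains, num_samples, *event_shape) and reduces with keepdims=True so that any event shape broadcasts. It follows the nested R-hat definition as we read it from Margossian et al. and assumes at least 2 superchains. The name nested_rhat is hypothetical and does not refer to the PR's or BlackJAX's actual API. It uses NumPy, but jax.numpy exposes the same mean, var (with ddof and keepdims), sqrt, and squeeze calls.

```python
import numpy as np


def nested_rhat(x):
    """Sketch: x has shape (num_superchains, num_chains, num_samples, *event_shape)."""
    x = np.asarray(x, dtype=float)
    # Relaxed check: any number of trailing event dimensions is allowed.
    assert x.ndim >= 3, "Expected (num_superchains, num_chains, num_samples, *event_shape)."
    _, num_chains, num_samples = x.shape[:3]

    chain_means = x.mean(axis=2, keepdims=True)                 # (K, M, 1, *event)
    superchain_means = chain_means.mean(axis=1, keepdims=True)  # (K, 1, 1, *event)

    # Within-chain variance W~_k, averaged over the chains of each superchain.
    if num_samples > 1:
        within_chain = x.var(axis=2, ddof=1, keepdims=True).mean(axis=1, keepdims=True)
    else:
        within_chain = np.zeros_like(superchain_means)  # a single draw has no spread

    # Between-chain variance B~_k inside each superchain.
    if num_chains > 1:
        between_chain = chain_means.var(axis=1, ddof=1, keepdims=True)
    else:
        between_chain = np.zeros_like(superchain_means)

    # Within-superchain variance W_nu and between-superchain variance B_nu.
    within_superchain = (within_chain + between_chain).mean(axis=0, keepdims=True)
    between_superchain = superchain_means.var(axis=0, ddof=1, keepdims=True)

    rhat = np.sqrt(1.0 + between_superchain / within_superchain)  # (1, 1, 1, *event)
    # Squeeze only the three reduced axes, so the result has shape event_shape.
    return np.squeeze(rhat, axis=(0, 1, 2))
```

With identical superchains the between-superchain variance is zero and the sketch returns values at (or numerically very near) 1. With a single superchain, the ddof=1 variance over axis 0 is undefined and NumPy returns nan.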

junpenglao · 2024-10-31T05:21:18Z

Quoting gil2rok: "Quick nit: why does the existing R-hat function return the potential scale reduction factor after flattening along the sample and chain dimensions? I followed this convention in my implementation."

The current R-hat function does not flatten the sample dimension; it squeezes it. So if you have a random variable with shape=(2, 5), the unsqueezed result could have shape=(1, 1, 2, 5) or (1, 2, 5, 1), depending on where the chain and sample axes are.
Squeezing makes the function return an R-hat with the same shape as the random variable.
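To illustrate the keepdims-then-squeeze pattern described above, here is a minimal sketch of the classic (non-split) R-hat with configurable chain and sample axes. The name potential_scale_reduction_sketch is hypothetical. It is written in the spirit of the existing function and is not a copy of BlackJAX's implementation.

```python
import numpy as np


def potential_scale_reduction_sketch(x, chain_axis=0, sample_axis=1):
    """Classic R-hat over the given axes; the result has the random variable's shape."""
    x = np.asarray(x, dtype=float)
    num_samples = x.shape[sample_axis]
    # keepdims=True keeps every axis, so the statistics broadcast for any
    # event shape and any placement of the chain/sample axes.
    chain_means = x.mean(axis=sample_axis, keepdims=True)
    chain_vars = x.var(axis=sample_axis, ddof=1, keepdims=True)
    within = chain_vars.mean(axis=chain_axis, keepdims=True)  # W
    between = num_samples * chain_means.var(axis=chain_axis, ddof=1, keepdims=True)  # B
    var_plus = (num_samples - 1) / num_samples * within + between / num_samples
    rhat = np.sqrt(var_plus / within)  # e.g. shape (1, 1, 2, 5)
    # Squeeze only the two reduced axes, e.g. (1, 1, 2, 5) -> (2, 5).
    return np.squeeze(rhat, axis=(chain_axis, sample_axis))
```

A design note: passing explicit axes to squeeze, rather than squeezing every size-1 axis, keeps an event dimension of size 1 (e.g. shape=(1,)) from being silently dropped.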

gil2rok · 2024-10-31T17:34:14Z

Thanks so much for the quick feedback! I'll continue working on this next week.

gil2rok added 2 commits October 30, 2024 15:38

Nested rhat (a103796)
Update __init__.py (6f13a3a)
Formatting (15b2b88)

junpenglao reviewed Oct 31, 2024
