
Randomization inference, "ri" sampling_method in rwolf, gives too tight a sample of null t-statistics #717

Open · marcandre259 opened this issue Nov 15, 2024 · 3 comments

@marcandre259 (Contributor)

Possible issue I noticed while working on #698.

The behavior was initially noticed when comparing the "wild-bootstrap" and "ri" sampling_method p-values in a case where the parameter of interest has no association with the outcome.

Because the null t-distribution is too tight, the resulting p-value is too small.
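For context on why a too-tight null matters: the RI p-value is just the share of resampled t-statistics at least as extreme as the observed one, so a too-narrow null distribution mechanically shrinks it. A minimal sketch of that computation (the helper name and the +1 finite-sample correction are mine, not necessarily pyfixest's internals):

```python
import numpy as np

def empirical_pvalue(t_obs: float, t_null: np.ndarray) -> float:
    """Two-sided p-value from an empirical null distribution of t-statistics."""
    # Count null draws at least as extreme as the observed statistic; the
    # +1 correction keeps the p-value strictly above zero.
    return (1 + np.sum(np.abs(t_null) >= np.abs(t_obs))) / (1 + t_null.size)

# If the null draws cluster too tightly around zero, even a modest |t_obs|
# clears nearly all of them and the p-value collapses toward 1 / (reps + 1).
```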

To reproduce:

```python
import pyfixest as pf
import numpy as np

import matplotlib.pyplot as plt

# Get data and break any association between X1 and Y by shuffling X1.
data = pf.get_data()

# Note: the rng must actually be used; calling np.random.default_rng(232)
# without assigning it does not seed anything.
rng = np.random.default_rng(232)
data["X1"] = rng.choice(data["X1"], size=data.shape[0], replace=False)

fit = pf.feols("Y ~ X1", data=data)

fit.summary()
```

```
Estimation: OLS
Dep. var.: Y, Fixed effects: 0
Inference: iid
Observations: 998

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| Intercept     |     -0.160 |        0.119 |    -1.344 |      0.179 | -0.394 |   0.074 |
| X1            |      0.033 |        0.090 |     0.367 |      0.714 | -0.144 |   0.211 |

RMSE: 2.304   R2: 0.0
```

```python
seed = 111
df_wild, df_t_wild = fit.wildboottest(
    param="X1", reps=9999, return_bootstrapped_t_stats=True, seed=seed
)

rng = np.random.default_rng(232)
fit.ritest(
    resampvar="X1", reps=9999, type="randomization-t",
    store_ritest_statistics=True, rng=rng
)

# Compare the two empirical null distributions of the t-statistic.
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
ax[0].hist(fit._ritest_statistics, label="RI t stats", alpha=0.4)
ax[0].axvline(x=fit._ritest_sample_stat, linestyle="--", label="Observed RI t stat", color="black")
ax[0].legend()
ax[1].hist(df_t_wild, label="Wild t stats", alpha=0.4, color="orange")
ax[1].axvline(df_wild["t value"], label="Observed Wild t stat", color="black", linestyle="--")
ax[1].legend()
```

[Figure: comparing_null_t_empirical_distributions — side-by-side histograms of the RI and wild-bootstrap null t-statistics, with the observed t-statistic marked; the RI distribution is much narrower.]

@s3alfisc (Member)

Yes, this looks wrong! I'll take a look later. Thanks for reporting!

@s3alfisc (Member)

On second thought, this might not necessarily be a bug, for two reasons:

- Slightly different nulls: the randomization inference estimator tests a "sharp" null hypothesis of no effect for any individual, i.e. we test $H_0: Y_i(1) = Y_i(0)$ for all $i$, which is slightly different from testing that the average treatment effect is zero (which is what we do when we run inference via the bootstrap). A hand-rolled sketch of the randomization-t under the sharp null follows after this list.
- Different properties of the tests: it might be that the bootstrap is more conservative (or the ritest less conservative), leading to different distributions.
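To make the first point concrete, here is what the randomization-t does under the sharp null, hand-rolled with plain numpy on synthetic data (iid standard errors; a sketch of the idea, not pyfixest's implementation):

```python
import numpy as np

def ols_tstat(y: np.ndarray, x: np.ndarray) -> float:
    """Slope t-statistic for y ~ 1 + x with iid standard errors."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(232)
y = rng.normal(size=500)
x = rng.normal(size=500)

t_obs = ols_tstat(y, x)

# Under the sharp null Y_i(1) = Y_i(0), outcomes are fixed regardless of
# treatment, so every permutation of x yields a valid draw from the null
# distribution of the t-statistic.
t_null = np.array([ols_tstat(y, rng.permutation(x)) for _ in range(999)])

p = (1 + np.sum(np.abs(t_null) >= np.abs(t_obs))) / (1 + t_null.size)
# With data generated under the null, t_null should be roughly standard
# normal in spread, not much tighter than the wild-bootstrap draws.
```

If pyfixest's `type="randomization-t"` draws come out much tighter than a sketch like this, that would point at the resampling step rather than at the sharp-vs-average distinction.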

Will have to think about this more. I took a look at the code and it looked mostly fine, though I'll have to check again. The difference in the widths of the sampling distributions does indeed look suspicious.

@marcandre259 (Contributor, Author) commented Nov 16, 2024

Hi @s3alfisc,

Given that randomization inference tests the sharp hypothesis, I would then expect the bootstrap approach to be the less conservative one.

I took a quick look at this paper, which confirms this with simulations in Table 1 (Fisher -> sharp null, Neyman -> average null, afaik).

Namely, sharp null rejection implies average null rejection.
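For reference, the relationship between the two nulls (the sharp null is the stronger hypothesis):

$$H_0^{\text{sharp}}: Y_i(1) = Y_i(0) \;\; \forall i \quad \Longrightarrow \quad H_0^{\text{avg}}: \mathbb{E}\big[Y_i(1) - Y_i(0)\big] = 0$$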

As far as progress on #698 goes, I'll get back to including RI for Westfall-Young now that this issue is open.
