[DOC] Add example usage to README.md (#26)
Fixes #26 

---------

Signed-off-by: Patrick Bloebaum <[email protected]>
Co-authored-by: Adam Li <[email protected]>
bloebp and adam2392 authored Oct 3, 2023
1 parent 026d71b commit db82c81
Showing 1 changed file with 133 additions and 2 deletions.
135 changes: 133 additions & 2 deletions README.md
@@ -8,7 +8,11 @@

# PyWhy-Stats

PyWhy-Stats serves as a Python library for implementations of various statistical methods, such as (un)conditional independence tests, which can be utilized in tasks like causal discovery. In the current version, PyWhy-Stats supports:
- Kernel-based independence and conditional k-sample tests
- FisherZ-based independence tests
- Power-divergence independence tests
- Bregman-divergence conditional k-sample tests

# Documentation

@@ -42,6 +46,133 @@ To install the package from github, clone the repository and then `cd` into the
# if you would like an editable install of pywhy-stats for dev purposes
pip install -e .

# Quick Start

In the following sections, we will use artificial example data to demonstrate the API's functionality. More
information about the methods and hyperparameters can be found in the [documentation](https://py-why.github.io/pywhy-stats/stable/index.html).

Note that most methods in PyWhy-Stats support multivariate inputs. For this, simply pass in a
2D NumPy array where rows represent samples and columns represent the different dimensions.
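
For example, a minimal sketch of a test with a two-dimensional $X$ (the data here is purely illustrative) could look as follows:

```Python
import numpy as np
from pywhy_stats import independence_test

rng = np.random.default_rng(0)
# Each row is a sample; each column is one dimension of X.
X = rng.standard_normal((200, 2))
# Y depends non-linearly on both dimensions of X.
Y = np.exp(X[:, [0]] + X[:, [1]] + rng.standard_normal(size=(200, 1)))

result = independence_test(X, Y)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```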

### Unconditional Independence Tests

Consider the following example data:

```Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
Y = np.exp(X + rng.standard_normal(size=(200, 1)))
```

Here, $Y$ depends on $X$ in a non-linear way. We can use the simplified API of PyWhy-Stats to test the null hypothesis
that the variables are independent:

```Python
from pywhy_stats import independence_test

result = independence_test(X, Y)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

The `independence_test` method returns an object containing a p-value, a test statistic, and possibly additional
information about the test. By default, this method employs a heuristic to select the most appropriate test for the
data. Currently, it defaults to a kernel-based independence test.

As we can see, the p-value is very small. Using, for example, a significance level of 0.05, we would reject
the null hypothesis of independence and infer that these variables are dependent. However, a p-value exceeding the
significance level doesn't conclusively indicate that the variables are independent; it only indicates insufficient
evidence of dependence.
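
As a small illustration, the decision at a given significance level (the 5% threshold below is just a common convention, not something prescribed by the library) could be written as:

```Python
alpha = 0.05  # illustrative significance level

if result.pvalue < alpha:
    print("Reject independence: evidence that X and Y are dependent.")
else:
    print("No evidence against independence (which does not prove independence).")
```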

We can also be more specific about the type of independence test we want to use. For instance, to use
a FisherZ test, we can indicate this by:

```Python
from pywhy_stats import Methods

result = independence_test(X, Y, method=Methods.FISHERZ)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

Or for the kernel-based independence test:

```Python
from pywhy_stats import Methods

result = independence_test(X, Y, method=Methods.KCI)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

For more information about the available methods, hyperparameters and other details, see the
[documentation](https://py-why.github.io/pywhy-stats/stable/index.html).

### Conditional Independence Tests

Similar to the unconditional independence test, we can use the same API to condition on another variable or set of
variables. First, let's generate a third variable $Z$ to condition on:

```Python
import numpy as np
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 1))
X = Z + rng.standard_normal(size=(200, 1))
Y = np.exp(Z + rng.standard_normal(size=(200, 1)))
```

Here, $X$ and $Y$ are dependent due to $Z$. Running an unconditional independence test yields:

```Python
from pywhy_stats import independence_test

result = independence_test(X, Y)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

Again, the p-value is very small, providing strong evidence that $X$ and $Y$ are dependent. Now,
let's condition on $Z$, which should render the variables independent:

```Python
result = independence_test(X, Y, condition_on=Z)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

We observe that the p-value isn't small anymore. Indeed, if the variables were independent, we would expect the p-value
to be uniformly distributed on $[0, 1]$.
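
As a rough, purely illustrative sanity check (it repeats the kernel test and is therefore a bit slow), one could redraw the data a number of times and inspect the resulting p-values; under conditional independence, small p-values should occur only about as often as the chosen significance level:

```Python
import numpy as np
from pywhy_stats import independence_test

pvalues = []
for seed in range(20):
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((200, 1))
    X = Z + rng.standard_normal(size=(200, 1))
    Y = np.exp(Z + rng.standard_normal(size=(200, 1)))
    # X and Y are independent given Z, so large p-values should dominate.
    pvalues.append(independence_test(X, Y, condition_on=Z).pvalue)

print("Fraction of p-values below 0.05:", np.mean(np.array(pvalues) < 0.05))
```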

### (Conditional) k-sample test

In certain settings, you may be interested in testing the invariance of k (conditional) distributions. For example, say you have data collected over the same set of variables $(X, Y)$ from humans ($P^1(X, Y)$) and bonobos ($P^2(X, Y)$). You can test whether the conditional distributions are equal, $P^1(Y | X) = P^2(Y | X)$, using a conditional two-sample test.

First, we create some simulated data that arise from two distinct distributions. However, the mechanism generating $Y$ is invariant across the two settings once we condition on $X$.

```Python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 1))
X2 = rng.uniform(low=0.0, high=1.0, size=(200, 1))

Y1 = np.exp(X1 + rng.standard_normal(size=(200, 1)))
Y2 = np.exp(X2 + rng.standard_normal(size=(200, 1)))

groups = np.concatenate((np.zeros((200, 1)), np.ones((200, 1))))
X = np.concatenate((X1, X2))
Y = np.concatenate((Y1, Y2))
```

We can now test the hypothesis that $P^1(Y | X) = P^2(Y | X)$ with the following code.

```Python
from pywhy_stats import conditional_ksample

# test that P^1(Y | X) = P^2(Y | X)
result = conditional_ksample.kcd.condind(X, Y, groups)

print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```
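
Conversely, if the mechanism generating $Y$ differed between the two groups, we would expect a small p-value (how small depends on the sample size and the strength of the difference). A hypothetical variation of the simulation above, with an arbitrarily chosen change in the second group's mechanism, illustrates this:

```Python
# Same setup as before, but the second group now uses a different mechanism for Y.
Y2_shifted = np.exp(3 * X2 + rng.standard_normal(size=(200, 1)))
Y_shifted = np.concatenate((Y1, Y2_shifted))

result = conditional_ksample.kcd.condind(X, Y_shifted, groups)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```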

# Contributing

We welcome contributions from the community. Please refer to our [contributing document](./CONTRIBUTING.md) and [developer document](./DEVELOPING.md) for information on developer workflows.
