[DOC] Add example usage to README.md (#26)
Fixes #26 

---------

Signed-off-by: Patrick Bloebaum <[email protected]>
Co-authored-by: Adam Li <[email protected]>
bloebp and adam2392 authored Oct 3, 2023
1 parent 026d71b commit db82c81
Showing 1 changed file with 133 additions and 2 deletions.
135 changes: 133 additions & 2 deletions README.md
@@ -8,7 +8,11 @@

# PyWhy-Stats

PyWhy-Stats serves as a Python library for implementations of various statistical methods, such as (un)conditional independence tests, which can be utilized in tasks like causal discovery. In the current version, PyWhy-Stats supports:
- Kernel-based independence and conditional k-sample tests
- FisherZ-based independence tests
- Power-divergence independence tests
- Bregman-divergence conditional k-sample tests

# Documentation

@@ -42,6 +46,133 @@ To install the package from github, clone the repository and then `cd` into the
# if you would like an editable install of pywhy-stats for dev purposes
pip install -e .

# Quick Start

In the following sections, we will use artificial example data to demonstrate the API's functionality. More
information about the methods and hyperparameters can be found in the [documentation](https://py-why.github.io/pywhy-stats/stable/index.html).

Note that most methods in PyWhy-Stats support multivariate inputs. For this, simply pass in a
2D NumPy array where rows represent samples and columns represent the different dimensions.
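
For example, a minimal sketch of a test with a two-dimensional $X$ (the data here is purely illustrative) could look as follows:

```Python
import numpy as np
from pywhy_stats import independence_test

rng = np.random.default_rng(0)
# Each row is a sample; each column is one dimension of X.
X = rng.standard_normal((200, 2))
# Y depends non-linearly on both dimensions of X.
Y = np.exp(X[:, [0]] + X[:, [1]] + rng.standard_normal(size=(200, 1)))

result = independence_test(X, Y)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```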

### Unconditional Independence Tests

Consider the following example data:

```Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
Y = np.exp(X + rng.standard_normal(size=(200, 1)))
```

Here, $Y$ depends on $X$ in a non-linear way. We can use the simplified API of PyWhy-Stats to test the null hypothesis
that the variables are independent:

```Python
from pywhy_stats import independence_test

result = independence_test(X, Y)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

The `independence_test` method returns an object containing a p-value, a test statistic, and possibly additional
information about the test. By default, this method employs a heuristic to select the most appropriate test for the
data. Currently, it defaults to a kernel-based independence test.

As we can see, the p-value is very small. Using, for example, a significance level of 0.05, we would reject
the null hypothesis of independence and infer that these variables are dependent. However, a p-value exceeding the
significance level doesn't conclusively indicate that the variables are independent; it only indicates insufficient
evidence of dependence.
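
As a small illustration, the decision at a given significance level (the 5% threshold below is just a common convention, not something prescribed by the library) could be written as:

```Python
alpha = 0.05  # illustrative significance level

if result.pvalue < alpha:
    print("Reject independence: evidence that X and Y are dependent.")
else:
    print("No evidence against independence (which does not prove independence).")
```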

We can also be more specific about the type of independence test we want to use. For instance, to use
a FisherZ test, we can indicate this by:

```Python
from pywhy_stats import Methods

result = independence_test(X, Y, method=Methods.FISHERZ)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

Or for the kernel-based independence test:

```Python
from pywhy_stats import Methods

result = independence_test(X, Y, method=Methods.KCI)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

For more information about the available methods, hyperparameters and other details, see the
[documentation](https://py-why.github.io/pywhy-stats/stable/index.html).

### Conditional Independence Tests

Similar to the unconditional independence test, we can use the same API to condition on another variable or set of
variables. First, let's generate a third variable $Z$ to condition on:

```Python
import numpy as np
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 1))
X = Z + rng.standard_normal(size=(200, 1))
Y = np.exp(Z + rng.standard_normal(size=(200, 1)))
```

Here, $X$ and $Y$ are dependent due to $Z$. Running an unconditional independence test yields:

```Python
from pywhy_stats import independence_test

result = independence_test(X, Y)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

Again, the p-value is very small, providing strong evidence that $X$ and $Y$ are dependent. Now,
let's condition on $Z$, which should render the variables independent:

```Python
result = independence_test(X, Y, condition_on=Z)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```

We observe that the p-value isn't small anymore. Indeed, if the variables were independent, we would expect the p-value
to be uniformly distributed on $[0, 1]$.
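
As a rough, purely illustrative sanity check (it repeats the kernel test and is therefore a bit slow), one could redraw the data a number of times and inspect the resulting p-values; under conditional independence, small p-values should occur only about as often as the chosen significance level:

```Python
import numpy as np
from pywhy_stats import independence_test

pvalues = []
for seed in range(20):
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((200, 1))
    X = Z + rng.standard_normal(size=(200, 1))
    Y = np.exp(Z + rng.standard_normal(size=(200, 1)))
    # X and Y are independent given Z, so large p-values should dominate.
    pvalues.append(independence_test(X, Y, condition_on=Z).pvalue)

print("Fraction of p-values below 0.05:", np.mean(np.array(pvalues) < 0.05))
```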

### (Conditional) k-sample test

In certain settings, you may be interested in testing the invariance of k (conditional) distributions. For example, say you have data collected over the same set of variables $(X, Y)$ from humans ($P^1(X, Y)$) and bonobos ($P^2(X, Y)$). You can test whether the conditional distributions are equal, $P^1(Y | X) = P^2(Y | X)$, using a conditional two-sample test.

First, we create some simulated data that arise from two distinct distributions. However, the mechanism generating $Y$ is invariant across the two settings once we condition on $X$.

```Python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((200, 1))
X2 = rng.uniform(low=0.0, high=1.0, size=(200, 1))

Y1 = np.exp(X1 + rng.standard_normal(size=(200, 1)))
Y2 = np.exp(X2 + rng.standard_normal(size=(200, 1)))

groups = np.concatenate((np.zeros((200, 1)), np.ones((200, 1))))
X = np.concatenate((X1, X2))
Y = np.concatenate((Y1, Y2))
```

We can now test the hypothesis that $P^1(Y | X) = P^2(Y | X)$ with the following code.

```Python
from pywhy_stats import conditional_ksample

# test that P^1(Y | X) = P^2(Y | X)
result = conditional_ksample.kcd.condind(X, Y, groups)

print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```
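
Conversely, if the mechanism generating $Y$ differed between the two groups, we would expect a small p-value (how small depends on the sample size and the strength of the difference). A hypothetical variation of the simulation above, with an arbitrarily chosen change in the second group's mechanism, illustrates this:

```Python
# Same setup as before, but the second group now uses a different mechanism for Y.
Y2_shifted = np.exp(3 * X2 + rng.standard_normal(size=(200, 1)))
Y_shifted = np.concatenate((Y1, Y2_shifted))

result = conditional_ksample.kcd.condind(X, Y_shifted, groups)
print("p-value:", result.pvalue, "Test statistic:", result.statistic)
```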

# Contributing

We welcome contributions from the community. Please refer to our [contributing document](./CONTRIBUTING.md) and [developer document](./DEVELOPING.md) for information on developer workflows.
