
# Experiments {#sec-experiments}

Experiments are the backbone of causal inference, and text analysis is no exception. Whether run in a laboratory or online on [Amazon's Mechanical Turk](https://www.mturk.com), experiments let researchers carefully control the conditions under which language is produced, which mitigates the effects of confounding variables. Though many people associate advanced natural language processing with "big data," the methods discussed in this book can be used effectively even in small-scale laboratory experiments.

**An example of using experiments in quantitative language research:** @sap_etal_2020 had online participants write either true stories about events that had recently happened to them or fictional stories about the same topic. They then used a large language model, GPT, to compute two likelihoods for each sentence in a story: the likelihood of the sentence given the previous sentence, and the likelihood of the sentence given a rough summary of the story. The ratio of these two likelihoods measures how predictably the story flows from one point to the next. @sap_etal_2020 found that fictional stories flow much more predictably than true ones. They also found that true stories flow more predictably when they are retold two to three months later. @sap_etal_2022 reproduced these findings using a more advanced language model, GPT-3. We will discuss these and other methods of measuring linguistic complexity in @sec-linguistic-complexity.
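The flow measure described above is easiest to handle in log space, where the ratio of two likelihoods becomes a difference of log-likelihoods. The sketch below is illustrative only and makes several assumptions. First, `log_prob(sentence, context)` is a hypothetical interface. Second, `cache_log_prob` is a toy "cache" language model that stands in for GPT so that the example runs without downloading a model; it is *not* what @sap_etal_2020 used. Third, the first likelihood conditions on the summary *plus* the previous sentence. The exact conditioning context and any length normalization in the original papers may differ. In a real analysis you would replace `cache_log_prob` with a function that sums token log-probabilities from a causal language model.

```python
import math


def cache_log_prob(sentence, context, vocab_size=10_000, lam=0.5):
    """Toy stand-in for a language model (hypothetical, for illustration only).

    Scores each word of `sentence` with a "cache" model: a mixture of a
    uniform distribution over `vocab_size` words and the word's relative
    frequency in everything seen so far (the context plus earlier words of
    the sentence). Returns the total log-probability of the sentence.
    """
    history = context.lower().split()
    total = 0.0
    for word in sentence.lower().split():
        cache = history.count(word) / len(history) if history else 0.0
        total += math.log(lam * cache + (1 - lam) / vocab_size)
        history.append(word)  # later words can condition on earlier ones
    return total


def flow_score(sentence, previous, summary, log_prob=cache_log_prob):
    """Log of the likelihood ratio
        p(sentence | summary + previous) / p(sentence | summary).

    Positive values mean the previous sentence made this one more predictable
    than the summary alone did. (Assumed setup, not the papers' exact one.)
    """
    with_previous = log_prob(sentence, summary + " " + previous)
    summary_only = log_prob(sentence, summary)
    return with_previous - summary_only


summary = "a trip to the beach"
previous = "we packed sandwiches and towels"
print(flow_score("the sandwiches and towels got sandy", previous, summary))
print(flow_score("my phone battery died", previous, summary))
```

Under this toy model, the first sentence reuses words from the previous sentence and receives a positive score. The second shares no words with either context, so both log-likelihoods are identical and its score is 0.0. Working with log-probabilities rather than raw probabilities avoids numerical underflow when multiplying many small per-word probabilities.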

::: {.callout-tip icon="false"}
## Advantages of Experimental Data Collection

-   **Control:** Experiments mitigate the effects of confounding variables.
-   **Customization:** Experimenters can tailor the experiment to fit their particular research questions.
:::

::: {.callout-important icon="false"}
## Disadvantages of Experimental Data Collection

-   **Expensive**
-   **Time-Consuming** 
-   **Small Sample Size:** Because they are costly and time-consuming, experiments generally result in small datasets.
:::

------------------------------------------------------------------------