# Topic Modeling {#sec-lda}
```{r setup}
#| echo: false
#| include: false
source("_common.R")
library(quanteda)
# Read the Hippocorpus stories and build a quanteda corpus
hippocorpus_corp <- read_csv("data/hippocorpus-u20220112/hcV3-stories.csv") |>
  select(AssignmentId, story, memType, summary, WorkerId,
         annotatorGender, openness, timeSinceEvent) |>
  corpus(docid_field = "AssignmentId",
         text_field = "story")

# Tokenize and convert to a document-feature matrix, dropping stray "~" tokens
hippocorpus_dfm <- hippocorpus_corp |>
  tokens(remove_punct = TRUE) |>
  dfm() |>
  dfm_remove("~")
```
::: callout-important
## This page is still under construction. Come back soon!
:::
**Topic Modeling**
Latent Dirichlet Allocation (LDA) [@blei_etal_2003]
Dirichlet is [generally pronounced either "Deereekleh" or "Deerishleh"](https://german.stackexchange.com/questions/48498/pronunciation-of-dirichlet).
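LDA treats each document as a mixture of topics and each topic as a distribution over words, with Dirichlet priors on these distributions. As a rough illustration of that generative story, the following base R sketch simulates a few tiny documents. The vocabulary, the hyperparameter values, and the helper `rdirichlet_sketch()` are all invented for this example and are not part of any package.

```{r}
#| eval: false
# Illustrative simulation of the LDA generative process (base R only).
# All names and values here are made up for the example.
set.seed(123)

# Draw n samples from a Dirichlet(alpha) by normalizing gamma draws
rdirichlet_sketch <- function(n, alpha) {
  draws <- matrix(rgamma(n * length(alpha), shape = alpha),
                  nrow = n, byrow = TRUE)
  draws / rowSums(draws)
}

vocab   <- c("dog", "cat", "vet", "goal", "team", "score")
k       <- 2   # number of topics
n_docs  <- 3   # number of documents
doc_len <- 8   # words per document

# phi: one word distribution per topic (k x vocabulary size)
phi <- rdirichlet_sketch(k, rep(0.5, length(vocab)))
# theta: one topic distribution per document (n_docs x k)
theta <- rdirichlet_sketch(n_docs, rep(0.5, k))

# For each word slot: pick a topic from theta, then a word from that topic
sim_docs <- lapply(seq_len(n_docs), function(d) {
  z <- sample(k, doc_len, replace = TRUE, prob = theta[d, ])
  vapply(z, function(topic) sample(vocab, 1, prob = phi[topic, ]),
         character(1))
})

round(theta, 2)
sim_docs
```

Fitting an LDA model runs this story in reverse: given only the observed words, it estimates plausible values for `phi` and `theta`.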
@poldrack_etal_2012
```{r}
#| eval: false
library(seededlda)
lda <- textmodel_lda(hippocorpus_dfm, k = 10, verbose = TRUE)
```
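For larger corpora, set `batch_size` below its default of 1 (the value is a proportion of the corpus) so that the documents are split into smaller batches for distributed computing.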
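Once a model is fitted, the usual next step is to look at the top terms per topic and the topic assigned to each document. The sketch below is a self-contained example of that workflow. It assumes quanteda's built-in `data_corpus_inaugural` in place of the Hippocorpus so that it runs without the data file, and the preprocessing choices, `k = 5`, and `max_iter = 500` are arbitrary illustrative values rather than recommendations.

```{r}
#| eval: false
library(quanteda)
library(seededlda)

# Built-in corpus used only so the sketch runs without the Hippocorpus file
inaug_dfm <- data_corpus_inaugural |>
  tokens(remove_punct = TRUE, remove_numbers = TRUE) |>
  tokens_remove(stopwords("en")) |>
  dfm() |>
  dfm_trim(min_termfreq = 5)

set.seed(123)
inaug_lda <- textmodel_lda(inaug_dfm, k = 5, max_iter = 500)

terms(inaug_lda, 10)      # top 10 terms for each topic
head(topics(inaug_lda))   # most likely topic for the first few documents
dim(inaug_lda$theta)      # document-topic proportions: documents x topics
```

The same calls to `terms()` and `topics()` should apply to a model fitted on any other document-feature matrix.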
<https://psycnet.apa.org/record/2021-27454-001>
## Supervised LDA {#sec-slda}
@blei_mcauliffe_2010
[sLDA in R](https://books.psychstat.org/textmining/topic-models.html#supervised-topic-modeling)
## Semi-Supervised LDA {#sec-seededlda}
[seededLDA in R](https://koheiw.github.io/seededlda/)
**An Example of Semi-Supervised LDA in Research:** @curini_valerio_2021
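In seeded LDA, the researcher supplies a small dictionary of seed words that nudges each topic toward a predefined theme. The following is a minimal sketch using `textmodel_seededlda()` from the seededlda package. The seed words, the topic names, and the use of quanteda's built-in `data_corpus_inaugural` are assumptions made purely for illustration; in practice, seed words should come from theory or prior knowledge of the corpus.

```{r}
#| eval: false
library(quanteda)
library(seededlda)

# Built-in corpus used only so the sketch is self-contained
inaug_dfm <- data_corpus_inaugural |>
  tokens(remove_punct = TRUE, remove_numbers = TRUE) |>
  tokens_remove(stopwords("en")) |>
  dfm() |>
  dfm_trim(min_termfreq = 5)

# Hypothetical seed words: one dictionary key per topic
seed_dict <- dictionary(list(
  economy = c("econom*", "tax*", "trade", "market*"),
  war     = c("war", "army", "military", "enem*")
))

set.seed(123)
inaug_seeded <- textmodel_seededlda(inaug_dfm, dictionary = seed_dict,
                                    max_iter = 500)

terms(inaug_seeded, 10)       # top terms, with topics named after dictionary keys
table(topics(inaug_seeded))   # number of documents assigned to each topic
```

Because the topics take their names from the dictionary keys, the output can be read directly in terms of the predefined themes.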
## BERTopic: Neural Topic Modeling
@grootendorst_2022
```{r}
#| eval: false
devtools::install_github("abresler/bertopic")
```
::: {.callout-tip icon="false"}
## Advantages of Topic Modeling
- **:**
:::
::: {.callout-important icon="false"}
## Disadvantages of Topic Modeling
- **:**
:::