Scalable Distributed LDA implementation for Spark & Glint
This implementation is based on LightLDA (Yuan et al., 2015), which uses efficient Metropolis-Hastings sampling to scale LDA to large vocabularies and topic counts.
Make sure you have Glint running. For a simple localhost test with two parameter servers, you can run Glint locally as follows (each command in its own terminal, from the Glint project directory):
sbt "run master"
sbt "run server"
sbt "run server"
Next, load a dataset into Spark as an RDD:
// Preprocessing of data ...
// End result should be an RDD of breeze sparse vectors that represent bag-of-words term frequency vectors
import breeze.linalg.SparseVector

val rdd = sc.textFile(...).map(x => SparseVector[Int](...))
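For example, a preprocessing step could look like the sketch below. This is illustrative only: the corpus path, the whitespace tokenization, and the driver-side vocabulary map are assumptions, not part of this library's API.

import breeze.linalg.SparseVector

// Illustrative: tokenize a plain-text corpus (one document per line)
val tokenized = sc.textFile("hdfs:///path/to/corpus.txt") // hypothetical path
  .map(_.toLowerCase.split("\\s+").filter(_.nonEmpty))
  .cache()

// Illustrative: assign each distinct term an index to build the vocabulary
val vocab = tokenized.flatMap(x => x).distinct().zipWithIndex().mapValues(_.toInt).collectAsMap()

// Map each document to a bag-of-words term frequency vector
val rdd = tokenized.map { tokens =>
  val vec = SparseVector.zeros[Int](vocab.size)
  tokens.foreach(t => vec(vocab(t)) += 1)
  vec
}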
Construct the Glint client that acts as an interface to the running parameter servers:
// Open glint client with a path to a specific configuration file
import com.typesafe.config.ConfigFactory
import glint.Client

val gc = Client(ConfigFactory.parseFile(new java.io.File(configFile)))
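If you do not have a configuration file yet, the config can also be built inline. The sketch below is assumption-heavy: the key names glint.master.host and glint.master.port and the default port 13370 are taken from Glint's reference configuration, and it assumes Glint falls back to its reference defaults for keys you do not set; verify both against the reference.conf of the Glint version you are running.

import com.typesafe.config.ConfigFactory
import glint.Client

// Hypothetical inline configuration pointing at a localhost master
// (key names and port assumed from Glint's reference configuration)
val gc = Client(ConfigFactory.parseString(
  """
  glint.master.host = "127.0.0.1"
  glint.master.port = 13370
  """))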
Set the LDA parameters and call the fitMetropolisHastings function to run the LDA algorithm:
import glintlda.{LDAConfig, Solver}

// LDA topic model with 100,000 terms and 100 topics
val ldaConfig = new LDAConfig()
ldaConfig.setα(0.5)
ldaConfig.setβ(0.01)
ldaConfig.setTopics(100)
ldaConfig.setVocabularyTerms(100000)
val model = Solver.fitMetropolisHastings(sc, gc, rdd, ldaConfig, 100)
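After training completes, it is good practice to shut down the Glint client so your Spark application can exit cleanly. A minimal sketch, assuming Glint's standard Client.stop() method:

// Release the connection to the parameter servers when done
gc.stop()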