v0.11.0

@x-tabdeveloping x-tabdeveloping released this 08 Jan 15:39
· 1 commit to main since this release
6a02107

New in version 0.11.0: Vectorizers Module

You can now use a set of custom vectorizers for topic modeling over phrases, as well as lemmata and stems.

from turftopic import KeyNMF
from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

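model = KeyNMF(
    n_components=10,
    # The spaCy pipeline en_core_web_sm has to be installed for this to load
    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
)
# corpus is assumed to be a list of document strings defined elsewhere
model.fit(corpus)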
model.print_topics()
Topic ID | Highest Ranking
...
3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism
4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index
...
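
To make the idea of vectorizing over phrases more concrete, here is a minimal standard-library sketch of a document-term matrix whose columns are multi-word phrases rather than single words. It is not how Turftopic or spaCy extract noun phrases: the phrase list is supplied by hand, and `ngrams` and `phrase_document_term_matrix` are hypothetical helper names made up for this illustration.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_document_term_matrix(docs, phrases):
    """Count how often each phrase occurs in each document.

    Assumption: the phrases are given up front. A real noun phrase
    vectorizer would discover them with a syntactic parser instead.
    """
    vocab = sorted(set(phrases))
    max_len = max(len(p.split()) for p in vocab)
    rows = []
    for doc in docs:
        tokens = re.findall(r"\w+", doc.lower())
        counts = Counter()
        # Collect every n-gram up to the longest phrase length
        for n in range(1, max_len + 1):
            counts.update(ngrams(tokens, n))
        rows.append([counts[p] for p in vocab])
    return vocab, rows


docs = [
    "Strong atheism and negative atheism are different positions.",
    "The darwin fish is a parody of the fish symbol.",
]
phrases = ["strong atheism", "negative atheism", "darwin fish", "fish symbol"]
vocab, matrix = phrase_document_term_matrix(docs, phrases)

for phrase, column in zip(vocab, zip(*matrix)):
    print(phrase, list(column))
```

Each row of `matrix` is one document and each column one phrase, which is the same shape of input a topic model receives from a word-level vectorizer, only with phrases as the vocabulary.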

Turftopic now also ships a default Chinese vectorizer, as well as a general-purpose multilingual vectorizer.

from turftopic.vectorizers.chinese import default_chinese_vectorizer
from turftopic.vectorizers.spacy import TokenCountVectorizer

chinese_vectorizer = default_chinese_vectorizer()
arabic_vectorizer = TokenCountVectorizer("ar", remove_stopwords=True)
danish_vectorizer = TokenCountVectorizer("da", remove_stopwords=True)
...
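
As a rough illustration of why a language-specific vectorizer helps, the sketch below counts tokens for a whitespace-delimited language with stop word removal, and falls back to character bigrams for Chinese, where words are not separated by spaces. Everything here is an assumption made for the example: the stop word lists are tiny made-up samples, character bigrams are a crude stand-in for a proper word segmenter, and none of the function names come from Turftopic.

```python
import re
from collections import Counter

# Tiny illustrative stop word lists, NOT the lists any library ships with
STOPWORDS = {
    "en": {"the", "a", "of", "is"},
    "da": {"og", "er", "en", "det"},
}


def whitespace_tokens(text, lang):
    """Lowercase, split on word characters, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS.get(lang, set())]


def chinese_bigrams(text):
    """Crude stand-in for word segmentation: overlapping character bigrams.

    Non-Chinese characters are dropped first, so bigrams may span
    punctuation; a real segmenter would handle this properly.
    """
    chars = re.findall(r"[\u4e00-\u9fff]", text)
    return [a + b for a, b in zip(chars, chars[1:])]


def count_tokens(text, lang):
    """Pick a tokenization strategy by language code and count the tokens."""
    if lang == "zh":
        tokens = chinese_bigrams(text)
    else:
        tokens = whitespace_tokens(text, lang)
    return Counter(tokens)


danish = count_tokens("Det er en god bog og en god film", "da")
chinese = count_tokens("主题模型很有用", "zh")

print(danish.most_common())
print(chinese.most_common())
```

Splitting the Chinese sentence on whitespace would yield a single token for the whole sentence, which is why a dedicated tokenization step is needed before counting.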