
Releases: x-tabdeveloping/turftopic

v0.11.0

08 Jan 15:39
6a02107

New in version 0.11.0: Vectorizers Module

You can now use a set of custom vectorizers for topic modeling over phrases, as well as lemmata and stems.

from turftopic import KeyNMF
from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

model = KeyNMF(
    n_components=10,
    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
)
model.fit(corpus)
model.print_topics()

| Topic ID | Highest Ranking |
| - | - |
| ... | ... |
| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |
| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |
| ... | ... |

Turftopic now also comes with a Chinese vectorizer for easier use, as well as a generalist multilingual vectorizer.

from turftopic.vectorizers.chinese import default_chinese_vectorizer
from turftopic.vectorizers.spacy import TokenCountVectorizer

chinese_vectorizer = default_chinese_vectorizer()
arabic_vectorizer = TokenCountVectorizer("ar", remove_stopwords=True)
danish_vectorizer = TokenCountVectorizer("da", remove_stopwords=True)
...

v0.8.0

05 Nov 07:34

Automated Topic Naming

Turftopic can now automatically assign human-readable names to topics using LLMs or n-gram retrieval.

from turftopic import KeyNMF
from turftopic.namers import OpenAITopicNamer

model = KeyNMF(10).fit(corpus)

namer = OpenAITopicNamer("gpt-4o-mini")
model.rename_topics(namer)
model.print_topics()

| Topic ID | Topic Name | Highest Ranking |
| - | - | - |
| 0 | Operating Systems and Software | windows, dos, os, ms, microsoft, unix, nt, memory, program, apps |
| 1 | Atheism and Belief Systems | atheism, atheist, atheists, belief, religion, religious, theists, beliefs, believe, faith |
| 2 | Computer Architecture and Performance | motherboard, ram, memory, cpu, bios, isa, speed, 486, bus, performance |
| 3 | Storage Technologies | disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot |
| ... | ... | ... |

v0.7.0

23 Oct 13:16

New in version 0.7.0

Component re-estimation, refitting and topic merging

Some models can now be modified efficiently after training,
without recomputing all attributes from scratch.
This is especially useful for clustering models and $S^3$.

from turftopic import SemanticSignalSeparation, ClusteringTopicModel

s3_model = SemanticSignalSeparation(5, feature_importance="combined").fit(corpus)
# Re-estimating term importances
s3_model.estimate_components(feature_importance="angular")
# Refitting S^3 with a different number of topics (very fast)
s3_model.refit(n_components=10, random_seed=42)

clustering_model = ClusteringTopicModel().fit(corpus)
# Reduces number of topics automatically with a given method
clustering_model.reduce_topics(n_reduce_to=20, reduction_method="smallest")
# Merge topics manually
clustering_model.join_topics([0,3,4,5])
# Resets original topics
clustering_model.reset_topics()
# Re-estimates term importances based on a different method
clustering_model.estimate_components(feature_importance="centroid")

Manual topic naming

You can now manually label topics in all models in Turftopic.

# you can specify a dict mapping IDs to names
model.rename_topics({0: "New name for topic 0", 5: "New name for topic 5"})
# or a list of topic names
model.rename_topics([f"Topic {i}" for i in range(10)])

Saving, loading and publishing to HF Hub

You can now load, save and publish models with dedicated functionality.

from turftopic import load_model

model.to_disk("out_folder/")
model = load_model("out_folder/")

model.push_to_hub("your_user/model_name")
model = load_model("your_user/model_name")

v0.4.0

25 Jun 11:04
1dbf359

Release Highlights:

1. Online KeyNMF

KeyNMF can now be fitted in an online fashion in batches:

from itertools import batched
from turftopic import KeyNMF

model = KeyNMF(10, top_n=5)

corpus = ["some string", "etc", ...]
for batch in batched(corpus, 200):
    batch = list(batch)
    model.partial_fit(batch)
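
Note that `itertools.batched` was only added in Python 3.12. On older interpreters, a small stand-in can produce the same batches. The sketch below is one possible way to write it; the name `batched_compat` is hypothetical and not part of Turftopic.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched_compat(iterable: Iterable[T], n: int) -> Iterator[tuple[T, ...]]:
    """Yield successive tuples of at most n items, like itertools.batched."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice takes up to n items per call; an empty tuple means the input is exhausted.
    while batch := tuple(islice(iterator, n)):
        yield batch


chunks = list(batched_compat(["a", "b", "c", "d", "e"], 2))
print(chunks)
```

Each yielded batch is a tuple, so converting it with `list(batch)` before passing it on works the same way as in the loop above.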

2. Precompute keyword matrices in KeyNMF

You can precompute the keyword matrix of a KeyNMF model and then use it in training.

model.extract_keywords(["Cars are perhaps the most important invention of the last couple of centuries. They have revolutionized transportation in many ways."])
[{'transportation': 0.44713873,
  'invention': 0.560524,
  'cars': 0.5046208,
  'revolutionized': 0.3339205,
  'important': 0.21803442}]
keyword_matrix = model.extract_keywords(corpus)
model.fit(keywords=keyword_matrix)
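
Assuming the extracted keywords have the list-of-dicts shape shown in the output above (one dict of keyword scores per document), they can be inspected with plain Python before fitting. This is only an illustrative sketch; `top_keywords` is a hypothetical helper, and the scores are copied from the example output.

```python
# Scores copied from the example output; one dict per document.
keywords = [
    {
        "transportation": 0.44713873,
        "invention": 0.560524,
        "cars": 0.5046208,
        "revolutionized": 0.3339205,
        "important": 0.21803442,
    }
]


def top_keywords(doc_keywords: dict[str, float], k: int = 3) -> list[str]:
    """Return the k highest-scoring keywords of one document."""
    ranked = sorted(doc_keywords.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


top = [top_keywords(doc) for doc in keywords]
print(top)
```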

3. Concept Compass in $S^3$

You can now produce a concept compass figure with $S^3$ similar to that in the paper:

from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(10).fit(corpus)

# You will need to `pip install plotly` before this.
fig = model.concept_compass(topic_x=1, topic_y=4)
fig.show()

4. Bugfixes in Dynamic Modeling

Binning in dynamic modeling is now fixed: it creates the requested number of time slices, and the first time slice is no longer left out.
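
To make the two properties concrete, here is a minimal sketch of equal-width time binning. It is not Turftopic's implementation; equal-width slices and the helper name `bin_timestamps` are assumptions made purely for illustration.

```python
from datetime import datetime


def bin_timestamps(timestamps: list[datetime], bins: int) -> list[int]:
    """Assign each timestamp to one of `bins` equal-width time slices."""
    start, end = min(timestamps), max(timestamps)
    span = (end - start).total_seconds()
    if span == 0:
        # All timestamps are identical, so everything falls in the first slice.
        return [0] * len(timestamps)
    labels = []
    for ts in timestamps:
        position = (ts - start).total_seconds() / span  # value in [0, 1]
        # Clamp so the latest timestamp lands in the last slice, not in slice `bins`.
        labels.append(min(int(position * bins), bins - 1))
    return labels


stamps = [datetime(2024, 1, day) for day in (1, 2, 3, 4, 5)]
labels = bin_timestamps(stamps, bins=2)
print(labels)
```

In this sketch the earliest timestamp always maps to slice 0, and the clamp keeps the slice count at exactly `bins`.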

v0.3.0

10 Jun 08:09
7117f23

Highlight: Dynamic KeyNMF

From version 0.3.0 you can use KeyNMF for dynamic topic modeling:

from datetime import datetime
from turftopic import KeyNMF

corpus: list[str] = [...]
timestamps: list[datetime] = [...]

model = KeyNMF(10)
doc_topic_matrix = model.fit_transform_dynamic(corpus, timestamps=timestamps, bins=10)

model.print_topics_over_time()

# This needs Plotly: pip install plotly
model.plot_topics_over_time()
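
The example above leaves `corpus` and `timestamps` elided. If your dates are stored as ISO 8601 strings, one way to prepare the timestamps is sketched below. The documents and dates here are made up for illustration, and the length check reflects the assumption that each document has exactly one timestamp.

```python
from datetime import datetime

# Made-up example data: one date string per document.
raw_dates = ["2023-01-15", "2023-06-30", "2024-02-01"]
corpus = ["first document", "second document", "third document"]

# datetime.fromisoformat parses ISO 8601 date strings from the standard library.
timestamps = [datetime.fromisoformat(value) for value in raw_dates]

if len(timestamps) != len(corpus):
    raise ValueError("corpus and timestamps must have the same length")

print(timestamps[0])
```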