POS Tagger for Bangla language based on Conditional Random Fields
-
Install the module
python setup.py install
-
Code
import bangla_pos_tagger
btagger = bangla_pos_tagger.BanglaTagger()
# query is an array of Bangla words
btagger.pos_tag(query)
# term is a single Bengali term
btagger.get_tag(term)
Here, query is the list of tokenized words of a given Bangla sentence.
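Since pos_tag expects a list of words rather than a raw sentence, a small tokenizer is needed in front of it. The following is a minimal sketch, not part of this module: tokenize and tag_sentence are hypothetical helper names, and splitting punctuation such as the danda (।) into its own token is an assumption about what the tagger expects.

```python
import re

# Assumption: punctuation (danda, double danda, common ASCII marks) becomes
# its own token; everything else is split on whitespace.
_TOKEN_RE = re.compile(r"[^\s।॥,;:!?]+|[।॥,;:!?]")


def tokenize(sentence):
    """Split a Bangla sentence into a list of word and punctuation tokens."""
    return _TOKEN_RE.findall(sentence)


def tag_sentence(tagger, sentence):
    """Tokenize `sentence` and pass the tokens to `tagger.pos_tag`.

    `tagger` is any object exposing pos_tag(list_of_words), for example
    bangla_pos_tagger.BanglaTagger().
    """
    return tagger.pos_tag(tokenize(sentence))
```

With the module installed, this could be called as tag_sentence(bangla_pos_tagger.BanglaTagger(), sentence).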
- A unigram-based tagger gives approximately 60-65% accuracy.
- Adding bigram- and trigram-based taggers in the same way increases the accuracy to some extent.
- Adding an affix-based tagger improves the accuracy a bit further.
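The stacking of taggers described in the list above can be pictured as a backoff chain: the most specific tagger answers when it has seen the context, and otherwise defers to a simpler one. The sketch below is a self-contained illustration in plain Python and is not the code used by this module. The class BackoffTagger, the context functions, the toy training sentences, and the tag names PRP, NN and VB are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class BackoffTagger:
    """Tags a word from a context key; defers to `backoff` for unseen keys."""

    def __init__(self, train_sents, context_fn, backoff=None):
        self.context_fn = context_fn
        self.backoff = backoff
        counts = defaultdict(Counter)
        for sent in train_sents:
            history = []  # gold tags of the words seen so far in this sentence
            for word, tag in sent:
                key = context_fn(word, history)
                if key is not None:
                    counts[key][tag] += 1
                history.append(tag)
        # Keep only the most frequent tag for each context key.
        self.table = {k: c.most_common(1)[0][0] for k, c in counts.items()}

    def tag_word(self, word, history):
        key = self.context_fn(word, history)
        if key in self.table:
            return self.table[key]
        if self.backoff is not None:
            return self.backoff.tag_word(word, history)
        return None  # no tagger in the chain knows this context

    def tag(self, words):
        history = []
        for word in words:
            history.append(self.tag_word(word, history))
        return list(zip(words, history))


def unigram_context(word, history):
    return word


def bigram_context(word, history):
    prev_tag = history[-1] if history else "<s>"
    return (prev_tag, word)


def suffix_context(word, history, length=2):
    # Crude affix feature: the last `length` code points of longer words.
    return word[-length:] if len(word) > length else None


# Toy training data with hypothetical tags.
train = [
    [("আমি", "PRP"), ("ভাত", "NN"), ("খাই", "VB")],
    [("আমরা", "PRP"), ("গান", "NN"), ("গাই", "VB")],
]

affix = BackoffTagger(train, suffix_context)
unigram = BackoffTagger(train, unigram_context, backoff=affix)
bigram = BackoffTagger(train, bigram_context, backoff=unigram)

tagged = bigram.tag(["আমি", "গান", "যাই"])
```

Here the last word, যাই, does not occur in the training data, so the bigram and unigram taggers both miss and the affix tagger decides from the word ending. A trigram tagger would be added the same way, with a context function that looks at the two previous tags.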
Note: The "accuracy.txt" file in the analyzed_data directory contains only the relevant results, that is, those configurations that gave good accuracy. The analysis is similar to that of the blog.