BERT Token Tagger for Persian Kasr-e Ezafeh

Each persion word in a sentance either has a Kasr-e Eafeh or it does not. Kas-e Ezafeh comes in 3 different forms as shown bellow:

@e
e
ve
y
ye

Using a BERT token classification model from pretrained HooshvareLab/distilbert-fa-zwnj-base enables us to train the model on this specific task. The dataset could not be published due to copyright but this is a sample of what it looks like:

سلسله	ye
ایرانی	O
در	O
سوادکوه	e
مازندران	O
حکومت	O
می‌کردند	O
.	O
نفوذ	e
برامکه	O
)	O
وزرای	ye
ایرانی	O
(	O
در	O
دربار	e
عباسی	O

Each token of the data has a label indicating its Kasr-e Ezafeh. Tokens without a tag are labeled as O .

Training

The model is trained using the Huggingface library with a pretrained bert model from HooshvareLab/distilbert-fa-zwnj-base .
After preprocessing the data and tokenizing, it is fed into the model. The model is trained in 3 epochs with a batch size of 16.

Results

The results during the training phase after each epoch are shown bellow:

Epoch	F1	Accuracy
1	0.930487	0.985994
2	0.939853	0.987362
3	0.940642	0.987561

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
BERT_Kasr_e_Ezafeh_Tagger.ipynb		BERT_Kasr_e_Ezafeh_Tagger.ipynb
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BERT Token Tagger for Persian Kasr-e Ezafeh

Training

Results

About

Releases

Packages

Languages

License

mahvash-siavashpour/BERT-Token-Classification-for-Persian-Kasr-e-Ezafeh

Folders and files

Latest commit

History

Repository files navigation

BERT Token Tagger for Persian Kasr-e Ezafeh

Training

Results

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages