peluz / VICTOR-dataset Public

Code used for the VICTOR dataset paper

VICTOR: a Dataset for Brazilian Legal Documents Classification

This repo holds source code described in the paper below:

Pedro H. Luz de Araujo, Teófilo E. de Campos, Fabricio Ataides Braz, Nilton Correia da Silva. VICTOR: a Dataset for Brazilian Legal Documents Classification.
Language Resources and Evaluation Conference (LREC), May, Marseille, France, 2020.
Download: [ paper | bib ]

We kindly request that users cite our paper in any publication that is generated as a result of the use of our code or our dataset.

Requirements

Files

shallow_clf_docType.ipynb: notebook to train the shallow classifiers for document type prediction
baseline_clf_themes.ipynb: notebook to train baseline classifiers for theme prediction
dataset_statistics.ipynb: notebook to compute dataset statistics
get_preds.py: script to compute and save model predictions (to use in the CRF experiments)
crf_experiments.ipynb: notebook for CRF post-processing for document type classification
train_cnn.py: script to train CNN for document type classification
train_lstm.py: script to train LSTM for document type classification
train_xgboost_themes.py: script to train XGBoost for theme classification
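
The file list pairs get_preds.py (saved model predictions) with crf_experiments.ipynb (CRF post-processing). As a rough illustration of why sequence-level post-processing can help, the sketch below smooths per-page class probabilities with Viterbi decoding. It is not the repository's code: a trained CRF learns its weights from data, whereas here the transition probabilities are set by hand, and the function name `viterbi_smooth`, the labels, and all numbers are hypothetical.

```python
import math


def viterbi_smooth(page_probs, transition, labels):
    """Return the most likely label sequence for consecutive pages.

    page_probs: non-empty list of dicts, label -> classifier probability.
    transition: dict, (previous_label, label) -> transition probability.
    All inputs are hypothetical stand-ins, not outputs of the repo's models.
    """
    eps = 1e-12  # avoids log(0)
    # best[i][y] is the best log-score of any path ending in label y at page i.
    best = [{y: math.log(page_probs[0][y] + eps) for y in labels}]
    back = []  # back[i][y] is the best previous label for y at page i + 1
    for probs in page_probs[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev, score = max(
                ((p, best[-1][p] + math.log(transition[(p, y)] + eps))
                 for p in labels),
                key=lambda item: item[1],
            )
            scores[y] = score + math.log(probs[y] + eps)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["petition", "other"]  # hypothetical document types
transition = {(a, b): 0.9 if a == b else 0.1 for a in labels for b in labels}
page_probs = [
    {"petition": 0.9, "other": 0.1},
    {"petition": 0.4, "other": 0.6},  # page-level argmax would pick "other"
    {"petition": 0.8, "other": 0.2},
]
smoothed = viterbi_smooth(page_probs, transition, labels)
print(smoothed)
```

In this toy input the middle page would be labeled "other" if each page were classified independently. The transition term favors consecutive pages sharing a label, so the decoded sequence keeps it with its neighbors.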
