This repo holds source code described in the paper below:
- Pedro H. Luz de Araujo, Teófilo E. de Campos, Fabricio Ataides Braz, Nilton Correia da Silva
VICTOR: a Dataset for Brazilian Legal Documents Classification
Language Resources and Evaluation Conference (LREC), May, Marseille, France, 2020.
We kindly request that users cite our paper in any publication that results from the use of our code or our dataset.
- shallow_clf_docType.ipynb: notebook to train the shallow classifiers for document type prediction
- baseline_clf_themes.ipynb: notebook to train baseline classifiers for theme prediction
- dataset_statistics.ipynb: notebook to compute dataset statistics
- get_preds.py: script to compute and save model predictions (to use in the CRF experiments)
- crf_experiments.ipynb: notebook for CRF post-processing for document type classification
- train_cnn.py: script to train a CNN for document type classification
- train_lstm.py: script to train an LSTM for document type classification
- train_xgboost_themes.py: script to train XGBoost for theme classification