peluz/VICTOR-dataset

VICTOR: a Dataset for Brazilian Legal Documents Classification

This repository holds the source code described in the paper below:

We kindly request that users cite our paper in any publication that results from the use of our code or dataset.

Requirements

Files

  • shallow_clf_docType.ipynb: notebook to train the shallow classifiers for document type prediction
  • baseline_clf_themes.ipynb: notebook to train baseline classifiers for theme prediction
  • dataset_statistics.ipynb: notebook to compute dataset statistics
  • get_preds.py: script to compute and save model predictions (to use in the CRF experiments)
  • crf_experiments.ipynb: notebook for CRF post-processing for document type classification
  • train_cnn.py: script to train a CNN for document type classification
  • train_lstm.py: script to train an LSTM for document type classification
  • train_xgboost_themes.py: script to train XGBoost for theme classification
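
To give a rough idea of what sequence post-processing of saved predictions can look like, here is a minimal sketch. It is not the code in crf_experiments.ipynb: it swaps the trained CRF for plain Viterbi decoding with a hand-set transition probability. The function name `viterbi_smooth`, the `stay_prob` parameter, and the assumption that predictions arrive as an ordered list of per-item class probabilities (for example, consecutive pages) are all hypothetical and chosen for illustration only.

```python
import math


def viterbi_smooth(page_probs, stay_prob=0.8):
    """Return the most likely label sequence for an ordered list of predictions.

    page_probs: list of per-item probability lists, one entry per class
                (hypothetical format, not the output format of get_preds.py).
    stay_prob:  assumed probability that two consecutive items share a label;
                must be strictly between 0 and 1. A trained CRF would learn
                its transition scores instead of using a fixed value.
    """
    n_classes = len(page_probs[0])
    if n_classes == 1:
        return [0] * len(page_probs)
    switch_prob = (1.0 - stay_prob) / (n_classes - 1)
    eps = 1e-12  # avoids log(0) for zero-probability classes

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # scores[k] = best log-score of any path that ends in class k
    scores = [math.log(p + eps) for p in page_probs[0]]
    backpointers = []
    for probs in page_probs[1:]:
        new_scores, pointers = [], []
        for j in range(n_classes):
            best_i = max(range(n_classes),
                         key=lambda i: scores[i] + log_trans(i, j))
            new_scores.append(scores[best_i] + log_trans(best_i, j)
                              + math.log(probs[j] + eps))
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)

    # Backtrack from the best final class to recover the full path.
    label = max(range(n_classes), key=lambda k: scores[k])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


if __name__ == "__main__":
    # The middle item is weakly predicted as class 1; smoothing pulls it
    # back to class 0 because both neighbours are confidently class 0.
    print(viterbi_smooth([[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]))
```

The design point is that a per-item classifier decides each label in isolation, while a sequence model also scores label transitions, so an isolated low-confidence prediction can be overruled by its neighbours.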
