GitHub - SaadTazroute/JurisprudencesClassification: Classification des jurisprudences

Jurisprudences Classification

This project classifies decisions of the French Court of Cassation. It was carried out as part of a research internship, and the repository also includes the paper I wrote during that period of lockdown and internship.

Abstract

In this article, we examine the application of text classification approaches to assist legal professionals. We present several experiments applying classification techniques to predict the French Court of Cassation ruling and the area of law to which a case belongs.

We work on the CASS data set, which is composed of judgments of the French Court of Cassation and contains the body of each case, the ruling, and other information. We use two classification methods. The first extracts features from the legal text using TF-IDF scores and feeds these features to several machine learning algorithms. The second uses a pre-trained model to extract features: BERT, which is based on the Transformer architecture, and specifically its French version, CamemBERT. In this paper we mainly stack this architecture with a classification layer to perform the classification task.

Due to resource limitations, we trained the CamemBERT model on only 10,000 cases. We report an F1 score of 84% for predicting a case ruling using CamemBERTforSequenceClassification, and an F1 score of 89% using TF-IDF scores and a linear Support Vector Machine (SVM) classifier trained on lexical features.
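For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score is computed for one class. It is not the evaluation code from the notebooks; the function name `f1_score` and the toy labels are assumptions made for illustration only, and the paper may average the score over classes differently.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, not taken from the CASS data set.
y_true = ["rejet", "cassation", "rejet", "rejet", "cassation", "rejet"]
y_pred = ["rejet", "rejet", "rejet", "cassation", "cassation", "rejet"]

# 3 true positives, 1 false positive, 1 false negative for "rejet".
score = f1_score(y_true, y_pred, positive="rejet")
print(score)
```

With these toy labels, precision and recall for the "rejet" class are both 3/4, so the F1 score is also 0.75.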

The data used is available at this link: LegiFrance

The Colab link is at the top of each notebook. This repository contains 4 notebooks, which you can run on your local machine or on Colab:

00 : Creation of the file which gathers all the cases.
01 : Deep analysis of the French CASS data set
02 : Classifier : TF-IDF + ML algorithms
03 : Classifier : CamemBERTforSequenceClassification

How to use it :

Download the data from this link.
Run notebook 00 locally with Jupyter Notebook. It takes the downloaded folder as input and creates a .csv file containing all the case law.
Upload the .csv file (named freemium02 in the notebooks) to Google Drive so that it can be mounted easily in Colab.
Run notebooks 01, 02 and 03 in any order.
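The idea behind step 00 (turning a downloaded folder into a single .csv file) can be sketched as follows. This is a simplified illustration, not the code of notebook 00: the function name `build_case_csv`, the `file` and `text` columns, and the assumption that each file can be read as plain UTF-8 text are all hypothetical. The real notebook has to parse the downloaded files in whatever format they are provided.

```python
import csv
import tempfile
from pathlib import Path


def build_case_csv(source_dir, csv_path):
    """Write one CSV row per file found under source_dir; return the row count."""
    # List the files first so the output file is never picked up as input.
    files = sorted(p for p in Path(source_dir).rglob("*") if p.is_file())
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "text"])  # hypothetical column names
        for path in files:
            writer.writerow([path.name, path.read_text(encoding="utf-8")])
    return len(files)


# Small self-contained demo on a temporary folder with two fake cases.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    (raw / "case_a.txt").write_text("First decision.", encoding="utf-8")
    (raw / "case_b.txt").write_text("Second decision.", encoding="utf-8")
    out = Path(tmp) / "freemium02.csv"
    n_rows = build_case_csv(raw, out)
    with open(out, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

print(n_rows, rows[0])
```

Writing through the `csv` module rather than joining strings by hand keeps commas, quotes and line breaks inside the decisions from corrupting the file.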

Methodology :

Gather the data set from the French Court of Cassation's database.

Use 2 different approaches to process this data.

The first is the bag-of-words paradigm with TF-IDF scoring, followed by a machine learning algorithm.

The second is CamemBERT for text classification: a neural network pre-trained on a very large French text corpus, stacked with a classification layer.
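To make the first approach more concrete, here is a minimal pure-Python sketch of bag-of-words TF-IDF scoring. It is an illustration under stated assumptions, not the code of notebook 02: the function name `tfidf`, the whitespace tokenizer, the unsmoothed formula tf × log(N / df), and the three toy sentences are all chosen for this example. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed and normalized variant, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: tf-idf weight} dict per document.

    tf is the term's relative frequency in the document;
    idf is log(N / df), where df counts the documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Count each term once per document to get document frequencies.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sentences, not taken from the CASS data set.
docs = [
    "la cour rejette le pourvoi",
    "la cour casse et annule le jugement",
    "le pourvoi est sans objet",
]
vectors = tfidf(docs)
print(vectors[0])
```

In this toy corpus, "le" appears in every document and therefore gets a weight of 0, while "rejette" appears in a single document and gets the highest weight in the first sentence. The resulting weights are the kind of lexical features that are then passed to a classifier such as a linear SVM.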

Files in this repository :

00 Creating Jurisprudence Dataset.ipynb
01_Deep_Analysis_for_the_CASS_Data_set.ipynb
02 TF-IDF + ML ALGORITHMS.ipynb
03_CamemBERT_Classification.ipynb
Paper_Internship_ Saad_TAZROUTE.pdf
README.md
