Benchmarking LLM


About The Project

Our project introduces a metric designed to evaluate the quality of textual summaries. This metric is pivotal in fields like finance, where precise information synthesis is critical.

  • Quality Discrimination: Separates strong summaries from weak ones, so that differences in factual accuracy show up as clear differences in score.
  • Factual Accuracy Measurement: Detects and quantifies any factual deviations, assigning lower scores to less accurate summaries.
  • Detail-Oriented Assessment: Provides comprehensive evaluations, focusing on how well the summary captures the essence and details of the original text.

This metric is not merely a tool for evaluation; it's a step towards enhancing the integrity of information processing in sectors where factual accuracy is non-negotiable.

Framework

Named Entity Comparison: Extracts finance-related named entities from the summary and the original text and compares them. Analyzes and visualizes how accurately the summary's entities match the original and which of the original's entities the summary includes.
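
As a rough illustration of the entity comparison idea, the sketch below scores a summary by how many of its entities also appear in the source. It is not the project's implementation: the dependency list includes spaCy and Stanza, which presumably handle extraction, whereas this example substitutes a small regular expression so it runs with the standard library alone. The names `extract_entities` and `compare_entities` are hypothetical.

```python
import re

# Stand-in extractor (an assumption, not the project's NER model): matches
# dollar amounts, percentages, and four-digit years.
ENTITY_PATTERN = re.compile(
    r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:million|billion))?"  # e.g. $4.2 billion
    r"|\d+(?:\.\d+)?%"                                        # e.g. 12.5%
    r"|\b(?:19|20)\d{2}\b"                                    # e.g. 2023
)


def extract_entities(text: str) -> set[str]:
    """Return the set of entity strings found in the text."""
    return set(ENTITY_PATTERN.findall(text))


def compare_entities(source: str, summary: str) -> dict:
    """Compare the entities in a summary against those in its source."""
    src = extract_entities(source)
    summ = extract_entities(summary)
    supported = summ & src      # summary entities that the source confirms
    unsupported = summ - src    # summary entities absent from the source
    missing = src - summ        # source entities the summary left out
    return {
        "supported": supported,
        "unsupported": unsupported,
        "missing": missing,
        # Share of summary entities backed by the source
        "precision": len(supported) / len(summ) if summ else 1.0,
        # Share of source entities carried over into the summary
        "recall": len(supported) / len(src) if src else 1.0,
    }


if __name__ == "__main__":
    source = "In 2023 the fund returned 12.5% and held $4.2 billion in assets."
    summary = "The fund returned 15% in 2023 and held $4.2 billion."
    print(compare_entities(source, summary))
```

Entities in the `unsupported` set are the ones most likely to signal a factual deviation, since the summary states them but the source does not.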

Sentence-Level Summary Checking: Applies LLMs to check, sentence by sentence, whether the summary is consistent with the original text. Highlights the inconsistent sentences for in-depth analysis.
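
The sentence-by-sentence check can be sketched as a loop that hands each summary sentence to a judge and collects the ones it rejects. In the project the judge is an LLM; the example below uses a simple token-overlap function as an offline stand-in so that it runs without an API key. All names here (`Judge`, `overlap_judge`, `check_summary`) and the 0.9 threshold are assumptions made for illustration.

```python
import re
from typing import Callable

# A judge takes (source_text, summary_sentence) and returns True when the
# sentence is supported by the source. Hypothetical signature: an LLM-backed
# function would be plugged in here.
Judge = Callable[[str, str], bool]


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text: str) -> list[str]:
    """Lowercase words and numbers (keeping decimals and a trailing %)."""
    return re.findall(r"\d+(?:\.\d+)?%?|[a-z]+", text.lower())


def overlap_judge(source: str, sentence: str, threshold: float = 0.9) -> bool:
    """Offline stand-in for an LLM judge, based on token overlap."""
    source_tokens = set(tokenize(source))
    tokens = tokenize(sentence)
    if not tokens:
        return True
    found = sum(1 for t in tokens if t in source_tokens)
    return found / len(tokens) >= threshold


def check_summary(source: str, summary: str, judge: Judge = overlap_judge) -> dict:
    """Judge every summary sentence and report the inconsistent ones."""
    sentences = split_sentences(summary)
    inconsistent = [s for s in sentences if not judge(source, s)]
    score = 1.0 - len(inconsistent) / len(sentences) if sentences else 1.0
    return {"score": score, "inconsistent": inconsistent}


if __name__ == "__main__":
    source = "In 2023 the fund returned 12.5% and held $4.2 billion in assets."
    summary = "The fund returned 15% in 2023. The fund held $4.2 billion in assets."
    print(check_summary(source, summary))
```

Keeping the judge as a parameter means the same loop works whether the verdict comes from a heuristic, as here, or from a prompt sent to a language model.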

Directory Tree

│   .gitignore
│   LICENSE.txt
│   README.md
│
├───config
│       config.sh
│       requirements.txt
│
├───data
│       10summary_with_result.csv
│       falsified_summary.csv
│       falsified_summary_level.csv
│       final_version_cropped_first1000.csv
│       final_version_withouttext.csv
│
├───doc
│   ├───About_Us
│   │       Team's Bio.pdf
│   │
│   ├───Academic Paper
│   │       5054_factuality_enhanced_language_m.pdf
│   │       Evaluating Factuality.pdf
│   │       Evaluating the Factual Consistency.pdf
│   │
│   ├───Project Description
│   │       Benchmarking LLM .pdf
│   │       CAPSTONE PROJECT PROPOSAL Fidelity Summarization Metrics.pdf
│   │
│   └───Report
│           Capstone Project Initial Due Diligence Report.pdf
│           F23_Fidelity_Benchmarking LLM_1st_report.pdf
│           F23_Fidelity_Benchmarking LLM_final_report.pdf
│           F23_Fidelity_BenchmarkLLM_poster.pdf
│           Project Proposal.pdf
│
├───res
│   │   10levels.svg
│   │   good_to_bad.svg
│   │   LLM_Assisted_Framework.jpg
│   │   NER_Framework.jpg
│   │
│   └───Baseline
│           Boxplot_for_Scores.png
│
├───samples
│       documents_extraction.ipynb
│       presentation.ipynb
│       summary_level_with_result.csv
│
├───src
│   │   Bart.py
│   │   PaLM.py
│   │   pipeline.py
│   │   summary_generation.py
│
└───test
    ├───Data_Pipeline
    └───Summary_Generation
            bart.ipynb
            llama2.ipynb
            PaLM2.ipynb
            test.py

Getting Started

Python, PyTorch, scikit-learn, NumPy, Pandas

Dependencies

python==3.10.0
ipython==8.15.0
nltk==3.8.1
numpy==1.24.3
openai==1.3.7
pandas==1.5.3
python-dotenv==1.0.0
rouge_score==0.1.2
scikit_learn==1.2.2
sentence_transformers==2.2.2
spacy==3.7.2
stanza==1.6.1

Configuration

Shell Script

1. Environment setup

Set up with a Python virtual environment

bash ./config/config.sh

Set up with conda

conda install --file ./config/requirements.txt

2. OpenAI API setup

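import os
import sys
# Make the modules in src/ importable
sys.path.append('../src/')
import pipeline
# Replace the placeholder with your own OpenAI API key
os.environ['OPENAI_API_KEY'] = 'Your OpenAi API Key'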
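
To avoid hard-coding the key in a notebook, one option is to read it from the environment and fail early when it is missing. The helper below is a minimal sketch using only the standard library; `require_openai_key` is a hypothetical name, not part of `pipeline`. The dependency list also includes python-dotenv, so the variable can presumably be populated from a `.env` file before this check runs.

```python
import os


def require_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or define it "
            "in a .env file before running the pipeline."
        )
    return key
```

Calling `require_openai_key()` at the top of a notebook surfaces a missing key immediately instead of partway through a run.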

Usage

Jupyter Notebook

The data extraction process is in samples/documents_extraction.ipynb.

The demo, along with a comparison of the results against baseline metrics, is in samples/presentation.ipynb.

Report

LaTeX

License

See LICENSE.txt in the repository root.

Group Members

Cong Chen (cc4887)


Longxiang Zhang (lz2869)


Ruolan Lin (rl3312)


Taichen Zhou (tz2555)


Yichen Huang (yh3550) - Team Captain


Fidelity Mentors

Lilli Ann Rowan, Indraneel Biswas, Michael Threlfall, and Diana Kulmizev

Instructor/CA

Adam Kelleher