
Demo of TLM: The Reliability Solution for RAG, LLMs, and Data Enrichment

The main file to look at in this repo is tlm_demo_new.ipynb.

News! I added a new data enrichment and LLM reliability demo. Details:

  • Demo showing how the Trustworthy Language Model adds reliability scores to LLM outputs, solving 4 use cases across 4 verticals.
  • Expect typos and imperfections. For better results and more details, visit https://help.cleanlab.ai

I hacked this together in a couple of hours. It shows how Cleanlab TLM can be used to improve fine-tuning of LLMs, the accuracy of LLM outputs, and smart routing for RAG and agents.
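
In code, the core idea is that the TLM is called like an ordinary LLM but also returns a trustworthiness score for each response. Below is a minimal sketch, assuming the cleanlab_studio Python client and a placeholder API key (check https://help.cleanlab.ai for the current interface):

```python
from cleanlab_studio import Studio

# Authenticate with Cleanlab Studio (placeholder API key).
studio = Studio("<YOUR_CLEANLAB_API_KEY>")

# quality_preset="low" is the fastest setting, used in this demo;
# quality_preset="best" gives higher-quality results (see the note below).
tlm = studio.TLM(quality_preset="low")

# Each call returns both the LLM response and a trustworthiness score in [0, 1].
output = tlm.prompt("Classify the sentiment of this review: 'The product broke after one day.'")
print(output["response"])               # the LLM's answer
print(output["trustworthiness_score"])  # low scores flag unreliable answers

```

Those scores are what drive the smart routing in the RAG/agent demo: high-trust answers are returned directly, while low-trust answers can be escalated to a stronger model or a human reviewer.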

Dataset used for this example: here.

Base OpenAI LLM versus Cleanlab TLM Performance on the public test set

Note: these results were run with the fastest version of the TLM (quality_preset="low") for speed reasons (it's a hackathon demo). For improved results, use quality_preset="best". A sketch of how the confidence-thresholded accuracies below can be computed follows the list.

  • Base Acc (OpenAI GPT-3.5): ~65%

  • TLM Acc: 65.5%

  • TLM Acc (TLM Confidence > 0.3): 66.2%

  • TLM Acc (TLM Confidence > 0.5): 69.9%

  • TLM Acc (TLM Confidence > 0.8): 74.0%

  • Base (OpenAI GPT-3.5) Acc (TLM Confidence < 0.5): 55.1%
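
The thresholded numbers above come from filtering the test set by the TLM's trustworthiness score and measuring accuracy on what remains. A minimal sketch, assuming a hypothetical CSV of the notebook's predictions with columns tlm_answer, base_answer, true_label, and trust_score:

```python
import pandas as pd

# Hypothetical table of test-set results: the TLM's answer, the base LLM's answer,
# the ground-truth label, and the TLM trustworthiness score for each example.
df = pd.read_csv("tlm_test_predictions.csv")  # assumed filename

for threshold in [0.0, 0.3, 0.5, 0.8]:
    kept = df[df["trust_score"] > threshold]
    acc = (kept["tlm_answer"] == kept["true_label"]).mean()
    print(f"TLM confidence > {threshold}: accuracy {acc:.1%} on {len(kept)} examples")

# The flip side of smart routing: base-LLM accuracy on the low-confidence slice.
low = df[df["trust_score"] < 0.5]
print((low["base_answer"] == low["true_label"]).mean())
```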

If an expert reviews/corrects the 100 samples with the lowest TLM confidence scores (a sketch of this routing follows the list):

  • the resulting accuracy is 79%
  • compared to the original base accuracy of 65%
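
This expert-in-the-loop estimate can be reproduced from the same hypothetical predictions table: send the 100 lowest-trust predictions to a reviewer and assume each one gets corrected, keeping every other prediction unchanged.

```python
import pandas as pd

# Continues from the previous sketch: same hypothetical results table.
df = pd.read_csv("tlm_test_predictions.csv")

# Indices of the 100 predictions the TLM is least confident about.
to_review = df.nsmallest(100, "trust_score").index

# Assume the expert replaces each flagged prediction with the correct label;
# every other prediction is kept exactly as the TLM produced it.
corrected = df["tlm_answer"].copy()
corrected.loc[to_review] = df.loc[to_review, "true_label"]

acc_after_review = (corrected == df["true_label"]).mean()
print(f"Accuracy after expert review of 100 samples: {acc_after_review:.1%}")
```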

The TLM (Trustworthy Language Model) is available in Cleanlab Studio.

There's also a reduced-functionality demo version, running on free servers, available at https://cleanlab.ai/tlm
