v1.0.0

AstraBert · May 5, 2024 · 5b69aa5
1 parent 5bfbe1e
commit 5b69aa5
Showing 31 changed files with 1,021 additions and 329 deletions.
diff --git a/.env b/.env
@@ -0,0 +1,2 @@
+VOLUME="/source/local-machine/dir:/target/multi-container/app/dir"
+# VOLUME="c:/Users/User/:/User" e.g.
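The `VOLUME` value above follows the `source:target` shape of a bind mount. As a rough illustration (not part of this commit), the Python sketch below splits such a value into its host and container parts. The helper name `split_volume_spec` is hypothetical, and the sketch assumes the container side is a Linux path without colons, so splitting on the last colon keeps a Windows drive letter such as `c:` on the host side.

```python
def split_volume_spec(spec: str) -> tuple[str, str]:
    """Split a 'host:container' bind-mount string into its two parts.

    Hypothetical helper for illustration only. It assumes the container
    path contains no colon, so the LAST colon is the separator; this keeps
    a Windows drive letter (e.g. 'c:') attached to the host path.
    """
    host, sep, container = spec.rpartition(":")
    if not sep or not host or not container:
        raise ValueError(f"expected 'host:container', got {spec!r}")
    return host, container


# Example with the Windows-style value from the comment in .env
host_dir, container_dir = split_volume_spec("c:/Users/User/:/User")
print(host_dir, "->", container_dir)
```

Splitting on the last colon rather than the first is the design choice that matters here: a naive `spec.split(":")` would cut the Windows path at the drive letter.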
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,4 @@
 flagged/
 scripts/__pycache__
-docker/build_command.sh
+docker/__pycache__
+docker/flagged
diff --git a/.v0_1_1/README.md b/.v0_1_1/README.md
@@ -0,0 +1,149 @@
+# everything-rag
+
+>_How was this README generated? Leveraging the power of AI with **reAIdme**, a HuggingChat assistant based on meta-llama/Llama-2-70b-chat-hf._
+_Go and give it a try [here](https://hf.co/chat/assistant/660d9a4f590a7924eed02a32)!_ 🤖
+
+<div align="center">
+    <img src="https://img.shields.io/github/languages/top/AstraBert/everything-rag" alt="GitHub top language">
+   <img src="https://img.shields.io/github/commit-activity/t/AstraBert/everything-rag" alt="GitHub commit activity">
+   <img src="https://img.shields.io/badge/everything_rag-stable-green" alt="Static Badge">
+   <img src="https://img.shields.io/badge/Release-v0.1.1-purple" alt="Static Badge">
+   <img src="https://img.shields.io/badge/Docker_image_size-6.6GB-red" alt="Static Badge">
+   <img src="https://img.shields.io/badge/Supported_platforms-linux/amd64-brown" alt="Static Badge">
+   <div>
+        <a href="https://huggingface.co/spaces/as-cle-bert/everything-rag"><img src="./data/example_chat.png" alt="Example chat" align="center"></a>
+        <p><i>Example chat with everything-rag, mediated by google/flan-t5-base</i></p>
+   </div>
+</div>
+
+
+### Table of Contents
+
+0. [TL;DR](#tldr)
+1. [Introduction](#introduction)
+2. [Inspiration](#inspiration)
+3. [Getting Started](#getting-started)
+4. [Using the Chatbot](#using-the-chatbot)
+5. [Troubleshooting](#troubleshooting)
+6. [Contributing](#contributing)
+7. [Upcoming features](#upcoming-features)
+8. [References](#reference)
+
+## TL;DR
+
+* This documentation is soooooo long, I want to get my hands dirty!!!
+    >You can try out everything-rag in the [dedicated HuggingFace space](https://huggingface.co/spaces/as-cle-bert/everything-rag), based on google/flan-t5-large.
+
+<div align="center">
+    <iframe
+        src="https://as-cle-bert-everything-rag.hf.space"
+        frameborder="0"
+        width="850"
+        height="450"
+    ></iframe>
+</div>
+
+## Introduction
+
+Introducing **everything-rag**, your fully customizable and local chatbot assistant! 🤖
+
+With everything-rag, you can:
+
+1. Use virtually any LLM you want: Switch between different LLMs like _gemma-7b_ or _llama-7b_ to suit your needs.
+2. Use your own data: everything-rag can work with any data you provide, whether it's a PDF about data science or a document about Pallas's cats!🐈
+3. Enjoy 100% local and 100% free functionality: No need for hosted APIs or pay-as-you-go services. everything-rag is completely free to use and runs on your desktop. Plus, with the chat_history functionality in ConversationalRetrievalChain, you can easily retrieve and review previous conversations with your chatbot, making it even more convenient to use.
+
+While everything-rag offers many benefits, there are a couple of limitations to keep in mind:
+
+1. Performance-critical tasks: Loading large models (larger than 1-2 GB) and generating text can be resource-intensive, so it's recommended to have at least 16GB RAM and 4 CPU cores for optimal performance.
+2. Small LLMs can still hallucinate: While large LLMs like _gemma-7b_ and _llama-7b_ tend to produce better results, smaller models like _openai-community/gpt2_ can still produce suboptimal responses in certain situations.
+
+In summary, everything-rag is a simple, customizable, and local chatbot assistant that offers a wide range of features and capabilities. By leveraging the power of RAG, everything-rag offers a unique and flexible chatbot experience that can be tailored to your specific needs and preferences. Whether you're looking for a simple chatbot to answer basic questions or a more advanced conversational AI to engage with your users, everything-rag has got you covered.😊
+
+## Inspiration
+
+This project is a humble and modest carbon-copy of its main and true inspirations, i.e. [Jan.ai](https://jan.ai/), [Cheshire Cat AI](https://cheshirecat.ai/), [privateGPT](https://privategpt.io/) and many other projects that focus on making LLMs (and AI in general) open-source and easily accessible to everyone. 
+
+## Getting Started
+
+You can do three things:
+
+- Play with generation on [Kaggle](https://www.kaggle.com/code/astrabertelli/gemma-for-datasciences)
+- Clone this repository, head over to [the python script](./scripts/gemma_for_datasciences.py) and modify everything to your needs!
+- Docker installation (🥳**FULLY IMPLEMENTED**): you can install everything-rag as a Docker image and run it with Docker by following these really simple commands:
+
+```bash
+docker pull ghcr.io/astrabert/everything-rag:latest
+docker run -p 7860:7860 ghcr.io/astrabert/everything-rag:latest -m microsoft/phi-2 -t text-generation
+```
+- **IMPORTANT NOTE**: running the script within `docker run` does not log the port on which the app is running until you press `Ctrl+C`, but at that moment it also interrupts the execution! The app will run on `0.0.0.0:7860` (or `localhost:7860` if you are on Windows), so just make sure to open your browser at that address and refresh it after 30 seconds to 1-2 minutes, when the model and the tokenizer should be loaded and the app should be ready to work!
+
+- As you can see, you just need to specify the LLM and its task (both are mandatory). Keep in mind that, as of v0.1.1, everything-rag supports only text-generation and text2text-generation. For these two tasks, you can use virtually *any* model from HuggingFace Hub: the sole recommendation is to watch out for your disk space, RAM and CPU power, since LLMs can be quite resource-consuming!
+
+## Using the Chatbot
+
+### GUI
+
+The chatbot has a brand-new Gradio-based interface that runs on a local server. You can interact with it by uploading your PDF files directly and/or sending messages, all by running:
+
+```bash
+python3 scripts/chat.py -m provider/modelname -t task
+```
+
+The suggested workflow is, nevertheless, the one that exploits Docker.
+
+### Code breakdown - notebook
+
+Everything is explained in [the dedicated notebook](./scripts/gemma-for-datasciences.ipynb), but here's a brief breakdown of the code:
+
+1. The first section imports the necessary libraries, including Hugging Face Transformers, langchain-community, and tkinter.
+2. The next section installs the necessary dependencies, including the gemma-2b model, and defines some useful functions for making the LLM-based data science assistant work.
+3. The create_a_persistent_db function creates a persistent database from a PDF file, using the PyPDFLoader to split the PDF into smaller chunks and the Hugging Face embeddings to transform the text into numerical vectors. The resulting database is stored in a LocalFileStore.
+4. The just_chatting function implements a chat system using the Hugging Face model and the persistent database. It takes a query, tokenizes it, and passes it to the model to generate a response. The response is then returned as a dictionary of strings.
+5. The chat_gui class defines a simple chat GUI that displays the chat history and allows the user to input queries. The send_message function is called when the user presses the "Send" button, and it sends the user's message to the just_chatting function to get a response.
+6. The script then creates a root Tk object and instantiates a ChatGUI object, which starts the main loop.
+
+Et voilà, your chatbot is up and running!🦿
+
+## Troubleshooting
+
+### Common Issues Q&A
+
+* Q: The chatbot is not responding😭
+    > A: Make sure that the PDF document is in the specified path and that the database has been created successfully. 
+* Q: The chatbot is taking soooo long🫠
+    > A: This is quite common in resource-limited environments running models that are too large or too small: large models require **at least** 32 GB RAM and a CPU with more than 8 cores, whereas small models can easily hallucinate and produce responses that are endless repetitions of the same thing! Check the *penalty_score* parameter to avoid this, and **try rephrasing the query and be as specific as possible**.
+* Q: My model is hallucinating and/or repeating the same sentence over and over again😵‍💫
+    > A: This is quite common with small or old models: check the *penalty_score* and *temperature* parameters to avoid this. 
+* Q: The chatbot is giving incorrect/non-meaningful answers🤥
+    >A: Check that the PDF document is relevant and up-to-date. Also, **try rephrasing the query and be as specific as possible**
+* Q: An error occurred while generating the answer💔
+    >A: This frequently occurs when your (small) LLM has a limited maximum input length (generally 512 or 1024 tokens) and the context that the retrieval-augmented chain produces goes beyond that maximum. You could, potentially, modify the configuration of the model, but this would mean dramatically increasing its resource consumption, and your small laptop is not prepared to take it, trust me!!! A solution, if you have enough RAM and CPU power, is to switch to larger LLMs: they do not have problems in this sense.
+
+## Upcoming features🚀
+
+- [ ] Multi-lingual support (expected for **version 0.2.0**)
+
+- [ ] More text-based tasks: question answering, summarisation (expected for **version 0.3.0**)
+
+- [ ] Computer vision: Image-to-text, image generation, image segmentation... (expected for **version 1.0.0**)
+
+## Contributing
+
+
+Contributions are welcome! If you would like to improve the chatbot's functionality or add new features, please fork the repository and submit a pull request.
+
+## Reference
+
+
+* [Hugging Face Transformers](https://github.com/huggingface/transformers)
+* [Langchain-community](https://github.com/langchain-community/langchain-community)
+* [Tkinter](https://docs.python.org/3/library/tkinter.html)
+* [PDF document about data science](https://www.kaggle.com/datasets/astrabertelli/what-is-datascience-docs)
+* [GradIO](https://www.gradio.app/)
+
+## License
+
+This project is licensed under the Apache 2.0 License.
+
+If you use this work for your projects, please consider citing the author [Astra Bertelli](http://astrabert.vercel.app).
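The troubleshooting entries in this README point to a repetition penalty as the remedy for looping output, and `utils.py` in this same commit passes `repetition_penalty=1.2` to the pipeline. As a rough, dependency-free sketch of the general idea (an assumption about the technique, not the code this project runs): scores of tokens that were already generated are made less attractive, by dividing positive scores by the penalty and multiplying negative ones by it. The function name `penalize_repeats` and the toy scores are hypothetical.

```python
def penalize_repeats(
    scores: list[float], generated_ids: list[int], penalty: float = 1.2
) -> list[float]:
    """Return a copy of `scores` with already-generated tokens penalized.

    Illustrative only: `scores[i]` is the raw score of token id `i`.
    Positive scores shrink (divided by `penalty`), negative scores grow
    more negative (multiplied by `penalty`), so repeats become less likely.
    """
    adjusted = list(scores)  # do not mutate the caller's list
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of three tokens; tokens 0 and 1 were already generated
toy_scores = [2.4, -1.0, 0.6]
print(penalize_repeats(toy_scores, generated_ids=[0, 1, 0]))
```

A penalty of 1.0 leaves the scores unchanged, and larger values discourage repetition more strongly, which is why raising it can help a small model that keeps looping.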
diff --git a/data/WhatisDataScienceFinalMay162018.pdf → .../data/WhatisDataScienceFinalMay162018.pdf b/data/WhatisDataScienceFinalMay162018.pdf → .../data/WhatisDataScienceFinalMay162018.pdf
diff --git a/data/example_chat.png → .v0_1_1/data/example_chat.png b/data/example_chat.png → .v0_1_1/data/example_chat.png
diff --git a/.v0_1_1/docker/Dockerfile b/.v0_1_1/docker/Dockerfile
@@ -0,0 +1,31 @@
+# Use an official Python runtime as a parent image
+FROM python:3.10-slim-bookworm
+
+# Set the working directory in the container to /app
+WORKDIR /app
+
+# Add the current directory contents into the container at /app
+ADD . /app
+
+# Update and install system dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    libpq-dev \
+    libffi-dev \
+    libssl-dev \
+    musl-dev \
+    libxml2-dev \
+    libxslt1-dev \
+    zlib1g-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Python dependencies
+RUN python3 -m pip cache purge
+RUN python3 -m pip install --no-cache-dir -r requirements.txt
+
+
+# Expose the port that the application will run on
+EXPOSE 7860
+
+# Set the entrypoint with a default command and allow the user to override it
+ENTRYPOINT ["python3", "chat.py"]
diff --git a/.v0_1_1/docker/__pycache__/utils.cpython-310.pyc b/.v0_1_1/docker/__pycache__/utils.cpython-310.pyc
diff --git a/.v0_1_1/docker/build_command.sh b/.v0_1_1/docker/build_command.sh
@@ -0,0 +1,11 @@
+docker buildx build \
+--label org.opencontainers.image.title=everything-rag \
+--label org.opencontainers.image.description='Introducing everything-rag, your fully customizable and local chatbot assistant!' \
+--label org.opencontainers.image.url=https://github.com/AstraBert/everything-rag \
+--label org.opencontainers.image.source=https://github.com/AstraBert/everything-rag --label org.opencontainers.image.version=0.1.7 \
+--label org.opencontainers.image.created=2024-04-07T12:39:11.393Z \
+--label org.opencontainers.image.licenses=Apache-2.0 \
+--platform linux/amd64 \
+--tag ghcr.io/astrabert/everything-rag:latest \
+--tag ghcr.io/astrabert/everything-rag:0.1.1 \
+--push .
diff --git a/docker/chat.py → .v0_1_1/docker/chat.py b/docker/chat.py → .v0_1_1/docker/chat.py
diff --git a/.v0_1_1/docker/requirements.txt b/.v0_1_1/docker/requirements.txt
@@ -0,0 +1,10 @@
+langchain-community==0.0.13 
+langchain==0.1.1 
+pypdf==3.17.4
+sentence_transformers==2.2.2
+chromadb==0.4.22
+cryptography>=3.1
+gradio
+transformers
+trl 
+peft
diff --git a/.v0_1_1/docker/utils.py b/.v0_1_1/docker/utils.py
@@ -0,0 +1,172 @@
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM, pipeline
+import time
+from langchain_community.llms import HuggingFacePipeline
+from langchain.storage import LocalFileStore
+from langchain.embeddings import CacheBackedEmbeddings
+from langchain_community.vectorstores import Chroma
+from langchain.text_splitter import CharacterTextSplitter
+from langchain_community.document_loaders import PyPDFLoader
+from langchain_community.embeddings import HuggingFaceEmbeddings
+from langchain.chains import ConversationalRetrievalChain
+import os
+from pypdf import PdfMerger
+from argparse import ArgumentParser
+
+
+argparse = ArgumentParser()
+argparse.add_argument(
+    "-m",
+    "--model",
+    help="HuggingFace Model identifier, such as 'google/flan-t5-base'",
+    required=True,
+)
+
+argparse.add_argument(
+    "-t",
+    "--task",
+    help="Task for the model: for now, supported tasks are ['text-generation', 'text2text-generation']",
+    required=True,
+)
+
+args = argparse.parse_args()
+
+
+mod = args.model
+tsk = args.task
+
+mod = mod.replace("\"", "").replace("'", "")
+tsk = tsk.replace("\"", "").replace("'", "")
+
+TASK_TO_MODEL = {"text-generation": AutoModelForCausalLM, "text2text-generation": AutoModelForSeq2SeqLM}
+
+if tsk not in TASK_TO_MODEL:
+    raise ValueError("Unsupported task! Supported tasks are ['text-generation', 'text2text-generation']")
+
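+def merge_pdfs(pdfs: list):
+    """Merge the given PDFs into one file named after the last input and return its path."""
+    merger = PdfMerger()
+    for pdf in pdfs:
+        merger.append(pdf)
+    # Strip only the extension, so dots in directory or file names are preserved
+    output_path = f"{os.path.splitext(pdfs[-1])[0]}_results.pdf"
+    merger.write(output_path)
+    merger.close()
+    return output_path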
+
+def create_a_persistent_db(pdfpath, dbpath, cachepath) -> Chroma:
+    """
+    Creates a persistent database from a PDF file.
+
+    Args:
+        pdfpath (str): The path to the PDF file.
+        dbpath (str): The path to the storage folder for the persistent LocalDB.
+        cachepath (str): The path to the storage folder for the embeddings cache.
+
+    Returns:
+        Chroma: The vector store built from the PDF chunks.
+    """
+    print("Started the operation...")
+    a = time.time()
+    loader = PyPDFLoader(pdfpath)
+    documents = loader.load()
+
+    ### Split the documents into smaller chunks for processing
+    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
+    texts = text_splitter.split_documents(documents)
+
+    ### Use HuggingFace embeddings for transforming text into numerical vectors
+    ### This operation can take a while the first time but, once you created your local database with
+    ### cached embeddings, it should be a matter of seconds to load them!
+    embeddings = HuggingFaceEmbeddings()
+    store = LocalFileStore(
+        os.path.join(
+            cachepath, os.path.basename(pdfpath).split(".")[0] + "_cache"
+        )
+    )
+    cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
+        underlying_embeddings=embeddings,
+        document_embedding_cache=store,
+        namespace=os.path.basename(pdfpath).split(".")[0],
+    )
+
+    b = time.time()
+    print(
+        f"Embeddings successfully created and stored at {os.path.join(cachepath, os.path.basename(pdfpath).split('.')[0]+'_cache')} under namespace: {os.path.basename(pdfpath).split('.')[0]}"
+    )
+    print(f"To load and embed, it took: {b - a}")
+
+    persist_directory = os.path.join(
+        dbpath, os.path.basename(pdfpath).split(".")[0] + "_localDB"
+    )
+    vectordb = Chroma.from_documents(
+        documents=texts,
+        embedding=cached_embeddings,
+        persist_directory=persist_directory,
+    )
+    c = time.time()
+    print(
+        f"Persistent database successfully created and stored at {os.path.join(dbpath, os.path.basename(pdfpath).split('.')[0] + '_localDB')}"
+    )
+    print(f"To create a persistent database, it took: {c - b}")
+    return vectordb
+
+def convert_none_to_str(l: list):
+    """Return a tuple copy of `l` where None and tuple items are replaced by empty strings."""
+    return tuple("" if item is None or type(item) == tuple else item for item in l)
+
+def just_chatting(
+    task,
+    model,
+    tokenizer,
+    query,
+    vectordb,
+    chat_history=None
+):
+    """
+    Implements a chat system using Hugging Face models and a persistent database.
+
+    Args:
+        task (str): Task for the pipeline; for now, supported tasks are ['text-generation', 'text2text-generation']
+        model (AutoModelForCausalLM or AutoModelForSeq2SeqLM): Hugging Face model, already loaded and prepared.
+        tokenizer (AutoTokenizer): Hugging Face tokenizer, already loaded and prepared.
+        query (str): Question by the user
+        vectordb (Chroma): vector store used for retrieval.
+        chat_history (list): A list with previous questions and answers, serves as context; by default it is empty (it may make the model hallucinate)
+
+    Returns:
+        dict: Output of the ConversationalRetrievalChain, including the generated answer.
+    """
+    # Avoid a shared mutable default argument: create a fresh list on each call
+    if chat_history is None:
+        chat_history = []
+    ### Create a text-generation pipeline and connect it to a ConversationalRetrievalChain
+    pipe = pipeline(task,
+                    model=model,
+                    tokenizer=tokenizer,
+                    max_new_tokens = 2048,
+                    repetition_penalty = float(1.2),
+    )
+
+    local_llm = HuggingFacePipeline(pipeline=pipe)
+    llm_chain = ConversationalRetrievalChain.from_llm(
+        llm=local_llm,
+        chain_type="stuff",
+        retriever=vectordb.as_retriever(search_kwargs={"k": 1}),
+        return_source_documents=False,
+    )
+    rst = llm_chain({"question": query, "chat_history": chat_history})
+    return rst
+
+
+try:
+    tokenizer = AutoTokenizer.from_pretrained(
+        mod,
+    )
+
+
+    model = TASK_TO_MODEL[tsk].from_pretrained(
+        mod,
+    )
+except Exception as e:
+    import sys
+    print(f"The error {e} occurred while handling model and tokenizer loading: please ensure that the model you provided was correct and suitable for the specified task. Be also sure that the HF repository for the loaded model contains all the necessary files.", file=sys.stderr)
+    sys.exit(1)
+
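`create_a_persistent_db` in the `utils.py` hunk above derives both the cache folder and the database folder from the PDF file name, repeating `os.path.basename(pdfpath).split(".")[0]` several times. A minimal sketch of factoring that derivation into one place follows. The helper name `derive_storage_paths` and the `StoragePaths` tuple are hypothetical; the naming rule (text before the first dot, plus `_cache` or `_localDB`) is copied from the function in the diff.

```python
import os
from typing import NamedTuple


class StoragePaths(NamedTuple):
    """Hypothetical container for the names derived from one PDF path."""
    namespace: str
    cache_dir: str
    db_dir: str


def derive_storage_paths(pdfpath: str, dbpath: str, cachepath: str) -> StoragePaths:
    """Compute the embeddings-cache and database folders for a PDF.

    Mirrors the rule used in the diff: the namespace is the file name up to
    its first dot, and the folders append '_cache' and '_localDB' to it.
    """
    namespace = os.path.basename(pdfpath).split(".")[0]
    return StoragePaths(
        namespace=namespace,
        cache_dir=os.path.join(cachepath, namespace + "_cache"),
        db_dir=os.path.join(dbpath, namespace + "_localDB"),
    )


paths = derive_storage_paths("data/WhatisDataScienceFinalMay162018.pdf", "db", "cache")
print(paths.namespace)
print(paths.cache_dir)
print(paths.db_dir)
```

Note that a file name such as `report.v2.pdf` yields the namespace `report` under this rule, because everything after the first dot is dropped; two such files in the same folder would therefore share a cache and a database directory.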
+
diff --git a/.v0_1_1/scripts/__pycache__/utils.cpython-310.pyc b/.v0_1_1/scripts/__pycache__/utils.cpython-310.pyc
diff --git a/scripts/gemma-for-datasciences.ipynb → .v0_1_1/scripts/gemma-for-datasciences.ipynb b/scripts/gemma-for-datasciences.ipynb → .v0_1_1/scripts/gemma-for-datasciences.ipynb
diff --git a/scripts/gemma_for_datasciences.py → .v0_1_1/scripts/gemma_for_datasciences.py b/scripts/gemma_for_datasciences.py → .v0_1_1/scripts/gemma_for_datasciences.py