This project provides a FastAPI-based backend for a PDF chat application that allows users to upload PDFs, ask questions about the content, retrieve chat history, and delete PDFs. The backend leverages various services, including Supabase for storage and Langchain for natural language processing.
- FastAPI
- Supabase
- Langchain
- Pinecone
- Python
- Gemini API
- Clone the repository:
git clone https://github.com/mustafaazad03/fast-API-backend-AI-Planet.git
cd fast-API-backend-AI-Planet
- Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the dependencies:
pip install -r requirements.txt
- Set up Supabase:
  - Create a project on Supabase.
  - Note the `SUPABASE_URL` and `SUPABASE_KEY` for your project.
  - Create a table named `pdfs` with columns for `id`, `filename`, `content`, and `history`.
- Create a `.env` file in the root directory and add the following environment variables:
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key
GOOGLE_API_KEY=your_google_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=aiplanet
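As a minimal sketch of how these variables might be read at startup, the values can be collected from the process environment and checked before the server starts. The `Settings` class and `load_settings` helper are hypothetical names for illustration and are not taken from the project's `app/core/config.py`. The sketch also assumes the `.env` file has already been loaded into the environment (for example by a tool such as python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# The variable names come from the .env listing above.
REQUIRED = (
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "GOOGLE_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_INDEX",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the backend's configuration values."""
    supabase_url: str
    supabase_key: str
    google_api_key: str
    pinecone_api_key: str
    pinecone_index: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the required variables and fail early if any is missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Field names are the lowercase form of the variable names.
    return Settings(**{name.lower(): source[name] for name in REQUIRED})
```

Failing at startup with the list of missing names tends to be easier to diagnose than an authentication error raised later by Supabase or Pinecone.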
Start the FastAPI server by running:
uvicorn app.main:app --reload
The application will be available at http://127.0.0.1:8000.
- Endpoint: `/upload-pdf/`
- Method: `POST`
- Description: Uploads a PDF file, extracts its text, splits the text into chunks, and stores the content and metadata in Supabase.
- Request Body: `file`: PDF file (Content-Type: application/pdf)
- Response: `{ "pdf_id": "string", "filename": "string" }`
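A client call to the upload endpoint could look roughly like the sketch below, which uses only the standard library. It assumes the server accepts `multipart/form-data` with a single part named `file`, which is how FastAPI file uploads usually work but is not stated explicitly above. `BASE_URL` and `build_upload_request` are illustrative names.

```python
import urllib.request
import uuid

# Local address from the run instructions; adjust for a deployed server.
BASE_URL = "http://127.0.0.1:8000"


def build_upload_request(pdf_bytes: bytes, filename: str) -> urllib.request.Request:
    """Build a multipart/form-data POST for /upload-pdf/ with one `file` part."""
    boundary = uuid.uuid4().hex  # random separator between multipart sections
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/upload-pdf/",
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

With the server running, passing the request to `urllib.request.urlopen` and decoding the body with `json.load` should yield the `pdf_id` and `filename` fields shown in the response above.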
- Endpoint: `/ask-question/`
- Method: `POST`
- Description: Asks a question about the content of a specific PDF and retrieves an answer based on the content and chat history.
- Request Body: `{ "pdf_id": "string", "question": "string" }`
- Response: `{ "answer": "string" }`
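To illustrate the request and response shapes for this endpoint, here is a small standard-library sketch. The helper names `build_ask_request` and `parse_answer` are hypothetical; only the URL path and the JSON field names come from the description above.

```python
import json
import urllib.request

# Local address from the run instructions; adjust for a deployed server.
BASE_URL = "http://127.0.0.1:8000"


def build_ask_request(pdf_id: str, question: str) -> urllib.request.Request:
    """Build the JSON POST for /ask-question/."""
    payload = json.dumps({"pdf_id": pdf_id, "question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/ask-question/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_answer(body: bytes) -> str:
    """Extract the `answer` field from the JSON response body."""
    return json.loads(body)["answer"]
```

The `pdf_id` passed here is the value returned by the upload endpoint, so a typical session uploads once and then asks any number of questions against the same id.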
- Endpoint: `/delete-pdf/{pdf_id}`
- Method: `DELETE`
- Description: Deletes a specific PDF and its metadata from Supabase.
- Response: `{ "message": "PDF deleted successfully." }`
- Endpoint: `/get-history/{pdf_id}`
- Method: `GET`
- Description: Retrieves the chat history for a specific PDF.
- Response: `{ "history": [ { "question": "string", "response": "string" }, ... ] }`
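The history response is a list of question/response pairs, so a client can turn it into a readable transcript with a few lines of code. The sketch below assumes the response has exactly the shape shown above; `parse_history` and `format_history` are hypothetical helper names.

```python
import json
from typing import Dict, List


def parse_history(body: bytes) -> List[Dict[str, str]]:
    """Extract the list of turns from a /get-history/{pdf_id} response body."""
    return json.loads(body)["history"]


def format_history(history: List[Dict[str, str]]) -> str:
    """Render each turn as numbered question and answer lines."""
    lines = []
    for number, turn in enumerate(history, start=1):
        lines.append(f"Q{number}: {turn['question']}")
        lines.append(f"A{number}: {turn['response']}")
    return "\n".join(lines)
```

An empty history produces an empty string, which is the expected result for a PDF that has been uploaded but not yet queried.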
pdf-chat-app/
├── app/
│ ├── core/
│ │ └── config.py
│ ├── endpoints/
│ │ ├── upload_pdf.py
│ │ ├── ask_question.py
│ │ ├── delete_pdf.py
│ │ └── get_history.py
│ ├── models/
│ │ └── request_models.py
│ ├── utils/
│ │ └── text_conversion.py
│ └── main.py
├── requirements.txt
├── .env
└── README.md
- Text Extraction: The application extracts text from uploaded PDF files using the PyMuPDF library.
- Text Chunking: The extracted text is split into chunks of 500 characters to facilitate natural language processing.
- Natural Language Processing: The application uses the Langchain API to generate responses to user questions based on the content of the PDF and chat history.
- Chat History: The application stores the chat history for each PDF in Supabase and retrieves it when requested.
- PDF Deletion: Users can delete PDF files and their metadata from the database.
- Text To HTML: The extracted text is converted to HTML format for better readability.
- QA Chain: The application uses the Pinecone API to store and retrieve question-answer pairs for each PDF.
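The chunking and HTML conversion steps listed above can be sketched with plain Python. This is a simplified stand-in: it cuts the text at fixed 500-character boundaries and wraps blank-line-separated paragraphs in `<p>` tags, whereas the project's own code (for example a Langchain text splitter or `app/utils/text_conversion.py`) may add overlap or handle layout differently. The function names are illustrative.

```python
import html
from typing import List

CHUNK_SIZE = 500  # chunk length in characters, as stated in the feature list


def split_text(text: str, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Cut text into consecutive chunks of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def text_to_html(text: str) -> str:
    """Escape the text and wrap each blank-line-separated paragraph in <p>."""
    paragraphs = [part.strip() for part in text.split("\n\n") if part.strip()]
    return "\n".join(f"<p>{html.escape(part)}</p>" for part in paragraphs)
```

Fixed-width cuts can split a sentence in the middle, which is why splitters used for retrieval often add an overlap between neighbouring chunks.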
- User Authentication: Implement user authentication to secure the API endpoints.
- Testing: Write unit tests for the API endpoints and utility functions.
- Pagination: Implement pagination for chat history to handle large datasets.
- Multiple File Upload: Allow users to upload multiple PDF files at once.
- Real-Time Chat: Implement real-time chat functionality using WebSockets.
- Dockerization: Dockerize the application for easier deployment and scaling.