Commit 5a0e51f

Merge pull request #52 from SylphAI-Inc/xiaoyi_doc

Update textsplitter & fix documents

Alleria1809 authored Jun 30, 2024
2 parents 65d0bdf + 3ff872f commit 5a0e51f

Showing 42 changed files with 1,949 additions and 1,076 deletions.
4 changes: 4 additions & 0 deletions .env_example
@@ -1,2 +1,6 @@
OPENAI_API_KEY=YOUR_API_KEY_IF_YOU_USE_OPENAI
GROQ_API_KEY=YOUR_API_KEY_IF_YOU_USE_GROQ
ANTHROPIC_API_KEY=YOUR_API_KEY_IF_YOU_USE_ANTHROPIC
GOOGLE_API_KEY=YOUR_API_KEY_IF_YOU_USE_GOOGLE
COHERE_API_KEY=YOUR_API_KEY_IF_YOU_USE_COHERE
HF_TOKEN=YOUR_API_KEY_IF_YOU_USE_HF
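
For context, the examples in this commit import `setup_env` from `lightrag.utils`. A minimal sketch of how these keys are expected to reach the process — assuming `setup_env()` loads a local `.env` file, which this diff does not itself confirm:

```python
import os

from lightrag.utils import setup_env  # noqa

# Assumption: setup_env() reads the .env file in the working directory and
# exports its entries into os.environ; the exact signature/behavior is not shown in this diff.
setup_env()

# The Groq examples in this commit need this key; swap in your provider's key as needed.
assert os.getenv("GROQ_API_KEY"), "Set GROQ_API_KEY in .env before running the examples"
```
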
68 changes: 68 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,68 @@
name: Documentation

on:
  push:
    branches:
      - xiaoyi_doc  # Ensure this is the branch where you commit documentation updates

permissions:
  contents: write
  actions: read

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      - name: Install dependencies using Poetry
        run: |
          poetry config virtualenvs.create false
          poetry install
      - name: Build documentation using Makefile
        run: |
          echo "Building documentation from: $(pwd)"
          ls -l  # Debug: List current directory contents
          poetry run make -C docs html
        working-directory: ${{ github.workspace }}

      - name: List built documentation
        run: |
          find ./build/ -type f
        working-directory: ${{ github.workspace }}/docs

      - name: Create .nojekyll file
        run: |
          touch .nojekyll
        working-directory: ${{ github.workspace }}/docs/build

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_branch: gh-pages
          publish_dir: ./docs/build/
          user_name: github-actions[bot]
          user_email: github-actions[bot]@users.noreply.github.com

      # - name: Debug Output
      #   run: |
      #     pwd  # Print the current working directory
      #     ls -l  # List files in the build directory
      #     cat ./source/conf.py  # Show Sphinx config file for debugging
      #   working-directory: ${{ github.workspace }}/docs/build
102 changes: 102 additions & 0 deletions README.md
@@ -0,0 +1,102 @@
# Introduction

LightRAG is the `PyTorch` library for building large language model (LLM) applications. We help developers with both building and optimizing `Retriever`-`Agent`-`Generator` (RAG) pipelines.
It is light, modular, and robust.

**PyTorch**

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)      # 64 x 12 x 12 = 9216 features for a 28x28 input
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        return self.fc2(x)
```

**LightRAG**

```python
from lightrag.core import Component, Generator
from lightrag.components.model_client import GroqAPIClient
from lightrag.utils import setup_env  # noqa


class SimpleQA(Component):
    def __init__(self):
        super().__init__()
        template = r"""<SYS>
        You are a helpful assistant.
        </SYS>
        User: {{input_str}}
        You:
        """
        self.generator = Generator(
            model_client=GroqAPIClient(),
            model_kwargs={"model": "llama3-8b-8192"},
            template=template,
        )

    def call(self, query):
        return self.generator({"input_str": query})

    async def acall(self, query):
        return await self.generator.acall({"input_str": query})
```
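
For reference, the component can be run the same way as `developer_notes/generator_note.py` in this commit does; the call returns a `GeneratorOutput` whose fields (`data`, `error`, `usage`, `raw_response`) appear in the notebook output later in this diff:

```python
qa = SimpleQA()
answer = qa("What is LightRAG?")  # calling the component routes to call()
print(answer)                     # GeneratorOutput(data=..., error=None, usage=None, raw_response=...)
```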

## Simplicity

Developers who are building real-world Large Language Model (LLM) applications are the real heroes.
As a library, we provide them with the fundamental building blocks with 100% clarity and simplicity.

* Two fundamental and powerful base classes: Component for the pipeline and DataClass for data interaction with LLMs.
* We end up with fewer than two levels of subclasses (see the Class Hierarchy Visualization).
* The result is a library with bare minimum abstraction, providing developers with maximum customizability.

Similar to the PyTorch module, our Component provides excellent visualization of the pipeline structure.

```
SimpleQA(
  (generator): Generator(
    model_kwargs={'model': 'llama3-8b-8192'},
    (prompt): Prompt(
      template: <SYS>
      You are a helpful assistant.
      </SYS>
      User: {{input_str}}
      You:
      , prompt_variables: ['input_str']
    )
    (model_client): GroqAPIClient()
  )
)
```
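
This printout is what you get by printing the instantiated component, as `developer_notes/generator_note.py` in this commit does:

```python
qa = SimpleQA()
print(qa)  # prints the nested Generator / Prompt / GroqAPIClient structure shown above
```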

## Controllability

Our simplicity does not come from doing 'less'.
On the contrary, we do 'more', going 'deeper' and 'wider' on each topic, to offer developers maximum control and robustness.

* LLMs are sensitive to the prompt. With components like Prompt, OutputParser, FunctionTool, and ToolManager, we give developers full control over their prompts without relying on API features such as tool calling or JSON mode.
* Our goal is not to optimize for integration, but to provide a robust abstraction with representative examples. See this in ModelClient and Retriever.
* All integrations, such as provider API SDKs, are shipped as optional packages within the same library, so you can easily switch between models from any provider we officially support, as sketched below.
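
A minimal sketch of such a switch (the `OpenAIClient` import path and the OpenAI model name are illustrative assumptions; only `GroqAPIClient` with `llama3-8b-8192` appears in this commit):

```python
from lightrag.core import Generator
from lightrag.components.model_client import GroqAPIClient, OpenAIClient  # OpenAIClient path assumed

template = r"""<SYS> You are a helpful assistant. </SYS> User: {{input_str}} You:"""

# The pipeline stays the same; only the model_client and model_kwargs change.
groq_generator = Generator(
    model_client=GroqAPIClient(),
    model_kwargs={"model": "llama3-8b-8192"},
    template=template,
)

openai_generator = Generator(
    model_client=OpenAIClient(),
    model_kwargs={"model": "gpt-4o-mini"},  # illustrative model name, not from this commit
    template=template,
)
```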

## Future of LLM Applications

On top of ease of use, we particularly optimize the configurability of components so that researchers can build their own solutions and benchmark existing ones.
Just as PyTorch united researchers and production teams, LightRAG enables a smooth transition from research to production.
With researchers building on LightRAG, production engineers can easily take over the method and test and iterate on their own production data, and researchers will see their code adapted into more products.
68 changes: 68 additions & 0 deletions class_hierarchy_edges.csv
@@ -0,0 +1,68 @@
Component,ListParser
Component,JsonParser
Component,YamlParser
Component,ToolManager
Component,Prompt
Component,ModelClient
Component,Retriever
Component,FunctionTool
Component,Tokenizer
Component,Generator
Component,Embedder
Component,BatchEmbedder
Component,Sequential
Component,FunComponent
Component,ReActAgent
Component,OutputParser
Component,TextSplitter
Component,DocumentSplitter
Component,ToEmbeddings
Component,RetrieverOutputToContextStr
Component,DefaultLLMJudge
Component,LLMAugmenter
Generic,LocalDB
Generic,Retriever
Generic,GeneratorOutput
Generic,Parameter
Generic,Sample
Generic,Sampler
Generic,RandomSampler
Generic,ClassSampler
ModelClient,CohereAPIClient
ModelClient,TransformersClient
ModelClient,GroqAPIClient
ModelClient,GoogleGenAIClient
ModelClient,OpenAIClient
ModelClient,AnthropicAPIClient
Retriever,BM25Retriever
Retriever,PostgresRetriever
Retriever,RerankerRetriever
Retriever,LLMRetriever
Retriever,FAISSRetriever
Enum,DataClassFormatType
Enum,ModelType
Enum,DistanceToOperator
Enum,OptionalPackages
DataClass,EmbedderOutput
DataClass,GeneratorOutput
DataClass,RetrieverOutput
DataClass,FunctionDefinition
DataClass,Function
DataClass,FunctionExpression
DataClass,FunctionOutput
DataClass,StepOutput
DataClass,Document
DataClass,DialogTurn
DataClass,Instruction
DataClass,GeneratorStatesRecord
DataClass,GeneratorCallRecord
Generator,CoTGenerator
Generator,CoTGeneratorWithJsonOutput
OutputParser,YamlOutputParser
OutputParser,JsonOutputParser
OutputParser,ListOutputParser
OutputParser,BooleanOutputParser
Optimizer,BootstrapFewShot
Optimizer,LLMOptimizer
Sampler,RandomSampler
Sampler,ClassSampler
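
As a small illustration (not part of this commit), the `parent,child` edge list above can be rendered as an indented tree with a few lines of Python; the file name below simply mirrors the file added in this diff:

```python
import csv
from collections import defaultdict

def print_hierarchy(path="class_hierarchy_edges.csv"):
    children = defaultdict(list)
    parents, all_children = set(), set()
    with open(path, newline="") as f:
        for parent, child in csv.reader(f):
            children[parent].append(child)
            parents.add(parent)
            all_children.add(child)

    def walk(node, depth=0):
        print("  " * depth + node)
        for c in sorted(children.get(node, [])):
            walk(c, depth + 1)

    # Roots are parents that never appear as a child: Component, Generic, Enum, DataClass, Optimizer.
    for root in sorted(parents - all_children):
        walk(root)

print_hierarchy()
```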
44 changes: 41 additions & 3 deletions developer_notes/generator.ipynb
@@ -74,10 +74,48 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "GeneratorOutput(data='LightRAG is a light-based Real-time Anomaly Generator, which is a special type of anomaly detection system. It uses a combination of visual and statistical techniques to detect unusual patterns or outliers in a dataset in real-time, often for purposes such as identifying security threats, detecting fraud, or monitoring system performance. Would you like to know more about its applications or how it works?', error=None, usage=None, raw_response='LightRAG is a light-based Real-time Anomaly Generator, which is a special type of anomaly detection system. It uses a combination of visual and statistical techniques to detect unusual patterns or outliers in a dataset in real-time, often for purposes such as identifying security threats, detecting fraud, or monitoring system performance. Would you like to know more about its applications or how it works?')\n"
+     ]
+    }
+   ],
+   "source": [
+    "from lightrag.core import Component, Generator, Prompt\n",
+    "from lightrag.components.model_client import GroqAPIClient\n",
+    "from lightrag.utils import setup_env\n",
+    "\n",
+    "\n",
+    "class SimpleQA(Component):\n",
+    "    def __init__(self):\n",
+    "        super().__init__()\n",
+    "        template = r\"\"\"<SYS>\n",
+    "        You are a helpful assistant.\n",
+    "        </SYS>\n",
+    "        User: {{input_str}}\n",
+    "        You:\n",
+    "        \"\"\"\n",
+    "        self.generator = Generator(\n",
+    "            model_client=GroqAPIClient(), model_kwargs={\"model\": \"llama3-8b-8192\"}, template=template\n",
+    "        )\n",
+    "\n",
+    "    def call(self, query):\n",
+    "        return self.generator({\"input_str\": query})\n",
+    "\n",
+    "    async def acall(self, query):\n",
+    "        return await self.generator.acall({\"input_str\": query})\n",
+    "\n",
+    "\n",
+    "qa = SimpleQA()\n",
+    "answer = qa(\"What is LightRAG?\")\n",
+    "\n",
+    "print(answer)"
+   ]
   }
  ],
  "metadata": {
30 changes: 30 additions & 0 deletions developer_notes/generator_note.py
@@ -0,0 +1,30 @@
from lightrag.core import Component, Generator
from lightrag.components.model_client import GroqAPIClient
from lightrag.utils import setup_env  # noqa


class SimpleQA(Component):
    def __init__(self):
        super().__init__()
        template = r"""<SYS>
        You are a helpful assistant.
        </SYS>
        User: {{input_str}}
        You:
        """
        self.generator = Generator(
            model_client=GroqAPIClient(),
            model_kwargs={"model": "llama3-8b-8192"},
            template=template,
        )

    def call(self, query):
        return self.generator({"input_str": query})

    async def acall(self, query):
        return await self.generator.acall({"input_str": query})


qa = SimpleQA()
answer = qa("What is LightRAG?")
print(qa)
15 changes: 11 additions & 4 deletions docs/requirements.txt
@@ -1,4 +1,11 @@
-pydata-sphinx-theme==0.15.2
-Sphinx==7.3.7
-sphinx_design==0.6.0
-sphinx-copybutton==0.5.2
+pydata-sphinx-theme==0.15.3
+sphinx-design==0.6.0
+sphinx-copybutton==0.5.2
+sphinx==7.3.7
+nbsphinx==0.9.4
+nbconvert==7.16.4
+PyYAML
+readthedocs-sphinx-search==0.3.2
+numpy
+tqdm
+tiktoken