Update textsplitter & fix documents #52

Merged
merged 66 commits into from
Jun 30, 2024
66 commits
2ff03e4
update
Alleria1809 Jun 25, 2024
ef04d55
Merge remote-tracking branch 'origin/main' into xiaoyi_doc
Alleria1809 Jun 25, 2024
c33cd9a
update
Alleria1809 Jun 26, 2024
c655254
improve text splitter and model client
Alleria1809 Jun 27, 2024
d446177
control the github actions
Alleria1809 Jun 27, 2024
75e44e8
Merge remote-tracking branch 'origin/main' into xiaoyi_doc
Alleria1809 Jun 27, 2024
789daca
remove the doc file
Alleria1809 Jun 27, 2024
b174048
test docs
Alleria1809 Jun 28, 2024
47e93d8
add dependencies to support notebook
Alleria1809 Jun 28, 2024
743cb6a
update the action flow
Alleria1809 Jun 28, 2024
3f2040d
update python version
Alleria1809 Jun 28, 2024
07b90d4
update python version
Alleria1809 Jun 28, 2024
6e10aa9
update dependencies
Alleria1809 Jun 28, 2024
80507c3
update workflow
Alleria1809 Jun 28, 2024
60aff83
update workflow
Alleria1809 Jun 28, 2024
886501d
update workflow
Alleria1809 Jun 28, 2024
d8d2cac
update workflow
Alleria1809 Jun 28, 2024
686fc9c
update workflow
Alleria1809 Jun 28, 2024
32b1b2b
update workflow
Alleria1809 Jun 28, 2024
c7d8639
update workflow
Alleria1809 Jun 28, 2024
69da375
update workflow
Alleria1809 Jun 28, 2024
05a6c37
update workflow
Alleria1809 Jun 28, 2024
09cda8b
update workflow
Alleria1809 Jun 28, 2024
f5e7136
update workflow
Alleria1809 Jun 28, 2024
074751a
update workflow
Alleria1809 Jun 28, 2024
6ed5ca3
update the model client
Alleria1809 Jun 29, 2024
f125a28
update the workdlow
Alleria1809 Jun 29, 2024
5242d8e
update the workdlow
Alleria1809 Jun 29, 2024
6d53843
Merge pull request #54 from SylphAI-Inc/main
liyin2015 Jun 29, 2024
65950c1
document clean up
liyin2015 Jun 29, 2024
75ab0ea
home page and developer notes
liyin2015 Jun 29, 2024
b7fa9ec
Merge pull request #56 from SylphAI-Inc/li
liyin2015 Jun 29, 2024
53ec44b
add author name
liyin2015 Jun 29, 2024
aedefb5
update workflow + update code with feedback
Alleria1809 Jun 29, 2024
b98fc11
use simple version to test
Alleria1809 Jun 29, 2024
ca0605c
add workflow
Alleria1809 Jun 29, 2024
fa73a97
remove the footnote
liyin2015 Jun 29, 2024
53c8e71
Merge pull request #57 from SylphAI-Inc/li
liyin2015 Jun 29, 2024
e8d1099
make the sidebar narrower
liyin2015 Jun 29, 2024
e708fb8
update intro page, add class hierarchy visualization
liyin2015 Jun 29, 2024
7d16121
Merge pull request #58 from SylphAI-Inc/li
liyin2015 Jun 30, 2024
ba824a7
add debug
Alleria1809 Jun 30, 2024
e2b8235
fix the grammar errors on intro page
liyin2015 Jun 30, 2024
ffeb73e
debug workflow
Alleria1809 Jun 30, 2024
2e24385
make the intro into the readme of the repo
liyin2015 Jun 30, 2024
a865b1d
debug workflow
Alleria1809 Jun 30, 2024
5e5e1d8
instruction on installation
liyin2015 Jun 30, 2024
29c78c1
install instruction
liyin2015 Jun 30, 2024
30138df
install instruction
liyin2015 Jun 30, 2024
86b2334
install instruction
liyin2015 Jun 30, 2024
1260642
Merge branch 'xiaoyi_doc' into li
liyin2015 Jun 30, 2024
2a5a260
Merge pull request #59 from SylphAI-Inc/li
liyin2015 Jun 30, 2024
501a5a9
debug workflow
Alleria1809 Jun 30, 2024
d873292
debug workflow
Alleria1809 Jun 30, 2024
de8f8fe
debug workflow
Alleria1809 Jun 30, 2024
a44b117
debug workflow
Alleria1809 Jun 30, 2024
f0a461a
debug workflow
Alleria1809 Jun 30, 2024
a8231e6
debug workflow
Alleria1809 Jun 30, 2024
fc4584b
debug workflow
Alleria1809 Jun 30, 2024
bf8adf5
rm the orginal text splitter
Alleria1809 Jun 30, 2024
b655780
update the tests
Alleria1809 Jun 30, 2024
80596b1
update the tests
Alleria1809 Jun 30, 2024
49bc698
update the tests
Alleria1809 Jun 30, 2024
5a2584a
update the tests
Alleria1809 Jun 30, 2024
384a88e
update the tests
Alleria1809 Jun 30, 2024
3ff872f
fix the images
Alleria1809 Jun 30, 2024
4 changes: 4 additions & 0 deletions .env_example
@@ -1,2 +1,6 @@
OPENAI_API_KEY=YOUR_API_KEY_IF_YOU_USE_OPENAI
GROQ_API_KEY=YOUR_API_KEY_IF_YOU_USE_GROQ
ANTHROPIC_API_KEY=YOUR_API_KEY_IF_YOU_USE_ANTHROPIC
GOOGLE_API_KEY=YOUR_API_KEY_IF_YOU_USE_GOOGLE
COHERE_API_KEY=YOUR_API_KEY_IF_YOU_USE_COHERE
HF_TOKEN=YOUR_API_KEY_IF_YOU_USE_HF
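These keys are read from the environment at run time. As a minimal sketch, assuming the variables have been exported or loaded from a `.env` file, the following checks which providers are actually configured. The helper name `available_providers` and the provider labels are hypothetical and not part of LightRAG.

```python
import os

# Environment variable names taken from .env_example.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "cohere": "COHERE_API_KEY",
    "huggingface": "HF_TOKEN",
}


def available_providers(env=None):
    """Return the providers whose key is set to a non-placeholder value."""
    env = os.environ if env is None else env
    found = []
    for provider, var in PROVIDER_KEYS.items():
        value = env.get(var, "")
        # Treat the template placeholders as "not configured".
        if value and not value.startswith("YOUR_API_KEY"):
            found.append(provider)
    return found


if __name__ == "__main__":
    print(available_providers())
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to test without touching real credentials.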
68 changes: 68 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,68 @@
name: Documentation

on:
  push:
    branches:
      - xiaoyi_doc # Ensure this is the branch where you commit documentation updates
Review comment (Member): Let's create a release branch; we need to start learning how to manage different releases.


permissions:
  contents: write
  actions: read

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Install dependencies using Poetry
        run: |
          poetry config virtualenvs.create false
          poetry install

      - name: Build documentation using Makefile
        run: |
          echo "Building documentation from: $(pwd)"
          ls -l # Debug: List current directory contents
          poetry run make -C docs html
        working-directory: ${{ github.workspace }}

      - name: List built documentation
        run: |
          find ./build/ -type f
        working-directory: ${{ github.workspace }}/docs

      - name: Create .nojekyll file
        run: |
          touch .nojekyll
        working-directory: ${{ github.workspace }}/docs/build

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_branch: gh-pages
          publish_dir: ./docs/build/
          user_name: github-actions[bot]
          user_email: github-actions[bot]@users.noreply.github.com

      # - name: Debug Output
      #   run: |
      #     pwd # Print the current working directory
      #     ls -l # List files in the build directory
      #     cat ./source/conf.py # Show Sphinx config file for debugging
      #   working-directory: ${{ github.workspace }}/docs/build
102 changes: 102 additions & 0 deletions README.md
@@ -0,0 +1,102 @@
# Introduction

LightRAG is a `PyTorch`-like library for building large language model (LLM) applications. We help developers both build and optimize `Retriever`-`Agent`-`Generator` (RAG) pipelines.
It is light, modular, and robust.

**PyTorch**

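```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout(0.5)  # applied to flattened features
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # Expects MNIST-sized input of shape (N, 1, 28, 28).
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)   # -> (N, 64, 12, 12)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)  # -> (N, 9216), matching fc1
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        return self.fc2(x)
```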

**LightRAG**

```python

from lightrag.core import Component, Generator
from lightrag.components.model_client import GroqAPIClient
from lightrag.utils import setup_env  # noqa

class SimpleQA(Component):
    def __init__(self):
        super().__init__()
        template = r"""<SYS>
        You are a helpful assistant.
        </SYS>
        User: {{input_str}}
        You:
        """
        self.generator = Generator(
            model_client=GroqAPIClient(),
            model_kwargs={"model": "llama3-8b-8192"},
            template=template,
        )

    def call(self, query):
        return self.generator({"input_str": query})

    async def acall(self, query):
        return await self.generator.acall({"input_str": query})
```

## Simplicity

Developers who are building real-world Large Language Model (LLM) applications are the real heroes.
As a library, we provide them with the fundamental building blocks with 100% clarity and simplicity.

* Two fundamental and powerful base classes: Component for the pipeline and DataClass for data interaction with LLMs.
* We end up with at most two levels of subclasses. See the class hierarchy visualization.
* The result is a library with bare minimum abstraction, providing developers with maximum customizability.

Similar to PyTorch's `nn.Module`, our Component prints a clear visualization of the pipeline structure.

```
SimpleQA(
  (generator): Generator(
    model_kwargs={'model': 'llama3-8b-8192'},
    (prompt): Prompt(
      template: <SYS>
        You are a helpful assistant.
        </SYS>
        User: {{input_str}}
        You:
        , prompt_variables: ['input_str']
    )
    (model_client): GroqAPIClient()
  )
)
```
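The nested printout follows the same idea as `torch.nn.Module.__repr__`. Here is a minimal sketch of how such a tree can be produced, using a simplified stand-in base class. `MiniComponent`, `Client`, `Gen`, and `QA` are hypothetical names for illustration and are not LightRAG's actual implementation.

```python
class MiniComponent:
    """Hypothetical stand-in for a Component-style base class."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the registry.
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name, value):
        # Register sub-components so __repr__ can walk the tree.
        if isinstance(value, MiniComponent):
            self._children[name] = value
        object.__setattr__(self, name, value)

    def extra_repr(self):
        """Subclasses return a one-line summary of their own settings."""
        return ""

    def __repr__(self):
        lines = []
        extra = self.extra_repr()
        if extra:
            lines.append(extra)
        for name, child in self._children.items():
            # Indent every line of the child's repr by two more spaces.
            child_repr = repr(child).replace("\n", "\n  ")
            lines.append(f"({name}): {child_repr}")
        head = type(self).__name__ + "("
        if not lines:
            return head + ")"
        return head + "\n  " + "\n  ".join(lines) + "\n)"


class Client(MiniComponent):
    pass


class Gen(MiniComponent):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.model_client = Client()

    def extra_repr(self):
        return f"model={self.model!r},"


class QA(MiniComponent):
    def __init__(self):
        super().__init__()
        self.generator = Gen("llama3-8b-8192")


print(QA())
```

Registering children in `__setattr__` means subclasses only assign attributes as usual; the tree is collected as a side effect.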

## Controllability

Our simplicity did not come from doing 'less'.
On the contrary, we have to do 'more' and go 'deeper' and 'wider' on any topic to offer developers maximum control and robustness.

* LLMs are sensitive to the prompt. Components like Prompt, OutputParser, FunctionTool, and ToolManager give developers full control over their prompts without relying on API features such as tools and JSON format.
* Our goal is not to optimize for integration, but to provide a robust abstraction with representative examples. See this in ModelClient and Retriever.
* All integrations, such as different API SDKs, are provided as optional packages within the same library. You can easily switch to any model from any provider that we officially support.
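
To illustrate handling structured output in the prompt and a parser rather than through a provider's JSON mode, here is a minimal sketch. The function `parse_json_output`, the instruction text, and the sample response are hypothetical illustrations, not LightRAG's `OutputParser` implementation.

```python
import json

# Format instruction that would be placed in the prompt itself.
FORMAT_INSTRUCTION = (
    "Respond with a single JSON object with keys "
    '"answer" (string) and "confidence" (number between 0 and 1).'
)


def build_prompt(question: str) -> str:
    """Combine the format instruction with the user question."""
    return f"{FORMAT_INSTRUCTION}\nUser: {question}\nYou:"


def parse_json_output(raw_response: str) -> dict:
    """Extract the first JSON object from free-form model text."""
    start = raw_response.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model response")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _ = json.JSONDecoder().raw_decode(raw_response[start:])
    return obj


# A made-up response with chatter around the JSON payload.
raw = 'Sure! Here it is:\n{"answer": "Paris", "confidence": 0.9}\nHope that helps.'
print(build_prompt("What is the capital of France?"))
print(parse_json_output(raw))
```

Because the format lives in the prompt and the parsing lives in ordinary code, the same pair works with any model that returns text.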

## Future of LLM Applications

Beyond ease of use, we particularly optimize the configurability of components so that researchers can build their own solutions and benchmark existing ones.
Just as PyTorch has united researchers and production teams, LightRAG enables a smooth transition from research to production.
With researchers building on LightRAG, production engineers can easily take over a method, then test and iterate on it with their production data.
Researchers, in turn, will want their code to be adopted in more products.
68 changes: 68 additions & 0 deletions class_hierarchy_edges.csv
@@ -0,0 +1,68 @@
Component,ListParser
Component,JsonParser
Component,YamlParser
Component,ToolManager
Component,Prompt
Component,ModelClient
Component,Retriever
Component,FunctionTool
Component,Tokenizer
Component,Generator
Component,Embedder
Component,BatchEmbedder
Component,Sequential
Component,FunComponent
Component,ReActAgent
Component,OutputParser
Component,TextSplitter
Component,DocumentSplitter
Component,ToEmbeddings
Component,RetrieverOutputToContextStr
Component,DefaultLLMJudge
Component,LLMAugmenter
Generic,LocalDB
Generic,Retriever
Generic,GeneratorOutput
Generic,Parameter
Generic,Sample
Generic,Sampler
Generic,RandomSampler
Generic,ClassSampler
ModelClient,CohereAPIClient
ModelClient,TransformersClient
ModelClient,GroqAPIClient
ModelClient,GoogleGenAIClient
ModelClient,OpenAIClient
ModelClient,AnthropicAPIClient
Retriever,BM25Retriever
Retriever,PostgresRetriever
Retriever,RerankerRetriever
Retriever,LLMRetriever
Retriever,FAISSRetriever
Enum,DataClassFormatType
Enum,ModelType
Enum,DistanceToOperator
Enum,OptionalPackages
DataClass,EmbedderOutput
DataClass,GeneratorOutput
DataClass,RetrieverOutput
DataClass,FunctionDefinition
DataClass,Function
DataClass,FunctionExpression
DataClass,FunctionOutput
DataClass,StepOutput
DataClass,Document
DataClass,DialogTurn
DataClass,Instruction
DataClass,GeneratorStatesRecord
DataClass,GeneratorCallRecord
Generator,CoTGenerator
Generator,CoTGeneratorWithJsonOutput
OutputParser,YamlOutputParser
OutputParser,JsonOutputParser
OutputParser,ListOutputParser
OutputParser,BooleanOutputParser
Optimizer,BootstrapFewShot
Optimizer,LLMOptimizer
Sampler,RandomSampler
Sampler,ClassSampler
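Each row of the edge list is `parent,child`. As a minimal sketch, the following loads such edges and computes how deep each class sits below its root, using a small inline excerpt of the rows above. The function name `depths` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from class_hierarchy_edges.csv (parent,child).
EDGES_CSV = """\
Component,ModelClient
Component,Generator
Component,OutputParser
ModelClient,GroqAPIClient
Generator,CoTGenerator
OutputParser,JsonOutputParser
"""


def depths(edge_text):
    """Map each class to the length of its longest chain of ancestors."""
    parents = defaultdict(list)
    nodes = set()
    for row in csv.reader(io.StringIO(edge_text)):
        if not row:
            continue  # skip blank lines
        parent, child = row
        parents[child].append(parent)
        nodes.update((parent, child))

    cache = {}

    def depth(node):
        if node not in cache:
            # Roots have no recorded parent and sit at depth 0.
            cache[node] = 1 + max((depth(p) for p in parents[node]), default=-1)
        return cache[node]

    return {node: depth(node) for node in nodes}


result = depths(EDGES_CSV)
print(max(result.values()))
```

Running the same function over the full file would be one way to check the claim about the number of subclass levels.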
44 changes: 41 additions & 3 deletions developer_notes/generator.ipynb
@@ -74,10 +74,48 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "GeneratorOutput(data='LightRAG is a light-based Real-time Anomaly Generator, which is a special type of anomaly detection system. It uses a combination of visual and statistical techniques to detect unusual patterns or outliers in a dataset in real-time, often for purposes such as identifying security threats, detecting fraud, or monitoring system performance. Would you like to know more about its applications or how it works?', error=None, usage=None, raw_response='LightRAG is a light-based Real-time Anomaly Generator, which is a special type of anomaly detection system. It uses a combination of visual and statistical techniques to detect unusual patterns or outliers in a dataset in real-time, often for purposes such as identifying security threats, detecting fraud, or monitoring system performance. Would you like to know more about its applications or how it works?')\n"
+     ]
+    }
+   ],
+   "source": [
+    "from lightrag.core import Component, Generator, Prompt\n",
+    "from lightrag.components.model_client import GroqAPIClient\n",
+    "from lightrag.utils import setup_env\n",
+    "\n",
+    "\n",
+    "class SimpleQA(Component):\n",
+    "    def __init__(self):\n",
+    "        super().__init__()\n",
+    "        template = r\"\"\"<SYS>\n",
+    "        You are a helpful assistant.\n",
+    "        </SYS>\n",
+    "        User: {{input_str}}\n",
+    "        You:\n",
+    "        \"\"\"\n",
+    "        self.generator = Generator(\n",
+    "            model_client=GroqAPIClient(), model_kwargs={\"model\": \"llama3-8b-8192\"}, template=template\n",
+    "        )\n",
+    "\n",
+    "    def call(self, query):\n",
+    "        return self.generator({\"input_str\": query})\n",
+    "\n",
+    "    async def acall(self, query):\n",
+    "        return await self.generator.acall({\"input_str\": query})\n",
+    "\n",
+    "\n",
+    "qa = SimpleQA()\n",
+    "answer = qa(\"What is LightRAG?\")\n",
+    "\n",
+    "print(answer)"
+   ]
   }
  ],
  "metadata": {
30 changes: 30 additions & 0 deletions developer_notes/generator_note.py
@@ -0,0 +1,30 @@
from lightrag.core import Component, Generator
from lightrag.components.model_client import GroqAPIClient
from lightrag.utils import setup_env  # noqa


class SimpleQA(Component):
    def __init__(self):
        super().__init__()
        template = r"""<SYS>
        You are a helpful assistant.
        </SYS>
        User: {{input_str}}
        You:
        """
        self.generator = Generator(
            model_client=GroqAPIClient(),
            model_kwargs={"model": "llama3-8b-8192"},
            template=template,
        )

    def call(self, query):
        return self.generator({"input_str": query})

    async def acall(self, query):
        return await self.generator.acall({"input_str": query})


qa = SimpleQA()
answer = qa("What is LightRAG?")
print(qa)
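
The `acall` method makes it possible to await several queries concurrently. Here is a minimal sketch of that pattern, using a stub in place of the real pipeline so that it runs without an API key. `StubQA`, its canned reply, and `ask_all` are hypothetical; with LightRAG installed, the `SimpleQA` class above would take the stub's place.

```python
import asyncio


class StubQA:
    """Hypothetical stand-in exposing the same acall(query) coroutine shape."""

    async def acall(self, query):
        # Simulate the latency of a remote model call.
        await asyncio.sleep(0.01)
        return f"echo: {query}"


async def ask_all(qa, queries):
    # Schedule every query at once; gather preserves the input order.
    return await asyncio.gather(*(qa.acall(q) for q in queries))


answers = asyncio.run(ask_all(StubQA(), ["What is RAG?", "What is an LLM?"]))
print(answers)
```

The total wait is roughly that of the slowest single call rather than the sum of all calls, which is the main reason to expose an async entry point alongside `call`.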
15 changes: 11 additions & 4 deletions docs/requirements.txt
@@ -1,4 +1,11 @@
-pydata-sphinx-theme==0.15.2
-Sphinx==7.3.7
-sphinx_design==0.6.0
-sphinx-copybutton==0.5.2
+pydata-sphinx-theme==0.15.3
+sphinx-design==0.6.0
+sphinx-copybutton==0.5.2
+sphinx==7.3.7
+nbsphinx==0.9.4
+nbconvert==7.16.4
+PyYAML
+readthedocs-sphinx-search==0.3.2
+numpy
+tqdm
+tiktoken
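
Four of the entries in the updated file (`PyYAML`, `numpy`, `tqdm`, `tiktoken`) carry no version pin. As a minimal sketch, a check like the following lists unpinned requirements so they can be reviewed. The function name `unpinned` is hypothetical, and the text is the updated file contents from the hunk above.

```python
# Updated contents of docs/requirements.txt, copied from the diff.
NEW_REQUIREMENTS = """\
pydata-sphinx-theme==0.15.3
sphinx-design==0.6.0
sphinx-copybutton==0.5.2
sphinx==7.3.7
nbsphinx==0.9.4
nbconvert==7.16.4
PyYAML
readthedocs-sphinx-search==0.3.2
numpy
tqdm
tiktoken
"""


def unpinned(requirements_text):
    """Return requirement lines that lack an exact '==' version pin."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and "==" not in line:
            names.append(line)
    return names


print(unpinned(NEW_REQUIREMENTS))
```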