Update textsplitter & fix documents #52
Merged
66 commits
2ff03e4
update
Alleria1809 ef04d55
Merge remote-tracking branch 'origin/main' into xiaoyi_doc
Alleria1809 c33cd9a
update
Alleria1809 c655254
improve text splitter and model client
Alleria1809 d446177
control the github actions
Alleria1809 75e44e8
Merge remote-tracking branch 'origin/main' into xiaoyi_doc
Alleria1809 789daca
remove the doc file
Alleria1809 b174048
test docs
Alleria1809 47e93d8
add dependencies to support notebook
Alleria1809 743cb6a
update the action flow
Alleria1809 3f2040d
update python version
Alleria1809 07b90d4
update python version
Alleria1809 6e10aa9
update dependencies
Alleria1809 80507c3
update workflow
Alleria1809 60aff83
update workflow
Alleria1809 886501d
update workflow
Alleria1809 d8d2cac
update workflow
Alleria1809 686fc9c
update workflow
Alleria1809 32b1b2b
update workflow
Alleria1809 c7d8639
update workflow
Alleria1809 69da375
update workflow
Alleria1809 05a6c37
update workflow
Alleria1809 09cda8b
update workflow
Alleria1809 f5e7136
update workflow
Alleria1809 074751a
update workflow
Alleria1809 6ed5ca3
update the model client
Alleria1809 f125a28
update the workdlow
Alleria1809 5242d8e
update the workdlow
Alleria1809 6d53843
Merge pull request #54 from SylphAI-Inc/main
liyin2015 65950c1
document clean up
liyin2015 75ab0ea
home page and developer notes
liyin2015 b7fa9ec
Merge pull request #56 from SylphAI-Inc/li
liyin2015 53ec44b
add author name
liyin2015 aedefb5
update workflow + update code with feedback
Alleria1809 b98fc11
use simple version to test
Alleria1809 ca0605c
add workflow
Alleria1809 fa73a97
remove the footnote
liyin2015 53c8e71
Merge pull request #57 from SylphAI-Inc/li
liyin2015 e8d1099
make the sidebar narrower
liyin2015 e708fb8
update intro page, add class hierarchy visualization
liyin2015 7d16121
Merge pull request #58 from SylphAI-Inc/li
liyin2015 ba824a7
add debug
Alleria1809 e2b8235
fix the grammar errors on intro page
liyin2015 ffeb73e
debug workflow
Alleria1809 2e24385
make the intro into the readme of the repo
liyin2015 a865b1d
debug workflow
Alleria1809 5e5e1d8
instruction on installation
liyin2015 29c78c1
install instruction
liyin2015 30138df
install instruction
liyin2015 86b2334
install instruction
liyin2015 1260642
Merge branch 'xiaoyi_doc' into li
liyin2015 2a5a260
Merge pull request #59 from SylphAI-Inc/li
liyin2015 501a5a9
debug workflow
Alleria1809 d873292
debug workflow
Alleria1809 de8f8fe
debug workflow
Alleria1809 a44b117
debug workflow
Alleria1809 f0a461a
debug workflow
Alleria1809 a8231e6
debug workflow
Alleria1809 fc4584b
debug workflow
Alleria1809 bf8adf5
rm the orginal text splitter
Alleria1809 b655780
update the tests
Alleria1809 80596b1
update the tests
Alleria1809 49bc698
update the tests
Alleria1809 5a2584a
update the tests
Alleria1809 384a88e
update the tests
Alleria1809 3ff872f
fix the images
Alleria1809
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,6 @@ | ||
OPENAI_API_KEY=YOUR_API_KEY_IF_YOU_USE_OPENAI | ||
GROQ_API_KEY=YOUR_API_KEY_IF_YOU_USE_GROQ | ||
ANTHROPIC_API_KEY=YOUR_API_KEY_IF_YOU_USE_ANTHROPIC | ||
GOOGLE_API_KEY=YOUR_API_KEY_IF_YOU_USE_GOOGLE | ||
COHERE_API_KEY=YOUR_API_KEY_IF_YOU_USE_COHERE | ||
HF_TOKEN=YOUR_API_KEY_IF_YOU_USE_HF |
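The template above lists one key per provider, and only the providers you actually use need a real value. As a rough illustration, the sketch below parses `KEY=VALUE` text and reports which providers look configured. It uses only the standard library; `PROVIDER_KEYS`, `parse_env_text`, and `configured_providers` are hypothetical names for this example and are not part of LightRAG.

```python
import os

# Variable names copied from the .env template above; the mapping itself is illustrative.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "cohere": "COHERE_API_KEY",
    "huggingface": "HF_TOKEN",
}


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env):
    """Return providers whose key is set and is not a template placeholder."""
    return sorted(
        name
        for name, var in PROVIDER_KEYS.items()
        if env.get(var) and not env[var].startswith("YOUR_API_KEY")
    )


if __name__ == "__main__":
    # Check the variables already exported in the current process.
    print(configured_providers(os.environ))
```

Passing `os.environ` checks the live environment, while `parse_env_text` lets you inspect a `.env` file's contents without exporting anything.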
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
name: Documentation | ||
|
||
on: | ||
  push: | ||
    branches: | ||
      - xiaoyi_doc # Ensure this is the branch where you commit documentation updates | ||
|
||
permissions: | ||
  contents: write | ||
  actions: read | ||
|
||
jobs: | ||
  build-and-deploy: | ||
    runs-on: ubuntu-latest | ||
|
||
    steps: | ||
      - name: Checkout code | ||
        uses: actions/checkout@v4 | ||
        with: | ||
          fetch-depth: 0 | ||
|
||
      - name: Set up Python | ||
        uses: actions/setup-python@v5 | ||
        with: | ||
          python-version: '3.11' | ||
|
||
      - name: Install Poetry | ||
        run: | | ||
          curl -sSL https://install.python-poetry.org | python3 - | ||
          echo "$HOME/.local/bin" >> $GITHUB_PATH | ||
|
||
      - name: Install dependencies using Poetry | ||
        run: | | ||
          poetry config virtualenvs.create false | ||
          poetry install | ||
|
||
      - name: Build documentation using Makefile | ||
        run: | | ||
          echo "Building documentation from: $(pwd)" | ||
          ls -l # Debug: List current directory contents | ||
          poetry run make -C docs html | ||
        working-directory: ${{ github.workspace }} | ||
|
||
      - name: List built documentation | ||
        run: | | ||
          find ./build/ -type f | ||
        working-directory: ${{ github.workspace }}/docs | ||
|
||
      - name: Create .nojekyll file | ||
        run: | | ||
          touch .nojekyll | ||
        working-directory: ${{ github.workspace }}/docs/build | ||
|
||
      - name: Deploy to GitHub Pages | ||
        uses: peaceiris/actions-gh-pages@v3 | ||
        with: | ||
          github_token: ${{ secrets.GITHUB_TOKEN }} | ||
          publish_branch: gh-pages | ||
          publish_dir: ./docs/build/ | ||
          user_name: github-actions[bot] | ||
          user_email: github-actions[bot]@users.noreply.github.com | ||
|
||
      # - name: Debug Output | ||
      #   run: | | ||
      #     pwd # Print the current working directory | ||
      #     ls -l # List files in the build directory | ||
      #     cat ./source/conf.py # Show Sphinx config file for debugging | ||
      #   working-directory: ${{ github.workspace }}/docs/build |
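The "List built documentation" and "Create .nojekyll file" steps in the workflow above can be tried locally before pushing. The sketch below is one possible way to do that with the standard library; `prepare_pages_dir` is a hypothetical helper, and the `docs/build` path is taken from the workflow's `publish_dir`. The empty `.nojekyll` file tells GitHub Pages to skip Jekyll processing, which matters for Sphinx output because Jekyll would otherwise ignore directories whose names start with an underscore, such as `_static`.

```python
from pathlib import Path


def prepare_pages_dir(build_dir):
    """Create .nojekyll in build_dir and return the files it contains, sorted."""
    build = Path(build_dir)
    build.mkdir(parents=True, exist_ok=True)
    # An empty marker file is enough; GitHub Pages only checks that it exists.
    (build / ".nojekyll").touch()
    return sorted(
        path.relative_to(build).as_posix()
        for path in build.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    # Assumes the script is run from the repository root after `make -C docs html`.
    for name in prepare_pages_dir("docs/build"):
        print(name)
```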
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
# Introduction | ||
|
||
LightRAG is a `PyTorch`-style library for building large language model (LLM) applications. It helps developers build and optimize `Retriever`-`Agent`-`Generator` (RAG) pipelines. | ||
It is light, modular, and robust. | ||
|
||
**PyTorch** | ||
|
||
```python | ||
import torch | ||
import torch.nn as nn | ||
import torch.nn.functional as F | ||
|
||
class Net(nn.Module): | ||
    def __init__(self): | ||
        super(Net, self).__init__() | ||
        self.conv1 = nn.Conv2d(1, 32, 3, 1) | ||
        self.conv2 = nn.Conv2d(32, 64, 3, 1) | ||
        self.dropout1 = nn.Dropout2d(0.25) | ||
        self.dropout2 = nn.Dropout(0.5) | ||
        self.fc1 = nn.Linear(9216, 128) | ||
        self.fc2 = nn.Linear(128, 10) | ||
|
||
    def forward(self, x): | ||
        # Expects 1x28x28 inputs: 28 -> 26 -> 24 after the convs, 12 after pooling (64 * 12 * 12 = 9216). | ||
        x = F.relu(self.conv1(x)) | ||
        x = F.relu(self.conv2(x)) | ||
        x = F.max_pool2d(x, 2) | ||
        x = self.dropout1(x) | ||
        x = torch.flatten(x, 1)  # flatten to (batch, 9216) before the linear layers | ||
        x = F.relu(self.fc1(x)) | ||
        x = self.dropout2(x) | ||
        return self.fc2(x) | ||
``` | ||
|
||
**LightRAG** | ||
|
||
```python | ||
from lightrag.core import Component, Generator | ||
from lightrag.components.model_client import GroqAPIClient | ||
from lightrag.utils import setup_env  # noqa | ||
|
||
class SimpleQA(Component): | ||
    def __init__(self): | ||
        super().__init__() | ||
        template = r"""<SYS> | ||
You are a helpful assistant. | ||
</SYS> | ||
User: {{input_str}} | ||
You: | ||
""" | ||
        self.generator = Generator( | ||
            model_client=GroqAPIClient(), | ||
            model_kwargs={"model": "llama3-8b-8192"}, | ||
            template=template, | ||
        ) | ||
|
||
    def call(self, query): | ||
        return self.generator({"input_str": query}) | ||
|
||
    async def acall(self, query): | ||
        return await self.generator.acall({"input_str": query}) | ||
``` | ||
|
||
## Simplicity | ||
|
||
Developers who are building real-world Large Language Model (LLM) applications are the real heroes. | ||
As a library, we provide them with the fundamental building blocks with 100% clarity and simplicity. | ||
|
||
* Two fundamental and powerful base classes: Component for the pipeline and DataClass for data interaction with LLMs. | ||
* We end up with at most two levels of subclasses; see the Class Hierarchy Visualization. | ||
* The result is a library with bare minimum abstraction, providing developers with maximum customizability. | ||
|
||
Like a PyTorch `nn.Module`, our `Component` prints a readable view of the pipeline structure: | ||
|
||
``` | ||
SimpleQA( | ||
(generator): Generator( | ||
model_kwargs={'model': 'llama3-8b-8192'}, | ||
(prompt): Prompt( | ||
template: <SYS> | ||
You are a helpful assistant. | ||
</SYS> | ||
User: {{input_str}} | ||
You: | ||
, prompt_variables: ['input_str'] | ||
) | ||
(model_client): GroqAPIClient() | ||
) | ||
) | ||
``` | ||
|
||
## Controllability | ||
|
||
Our simplicity did not come from doing 'less'. | ||
On the contrary, we have to do 'more' and go 'deeper' and 'wider' on any topic to offer developers maximum control and robustness. | ||
|
||
* LLMs are sensitive to the prompt. Components such as Prompt, OutputParser, FunctionTool, and ToolManager give developers full control over their prompts without relying on provider API features such as tools and JSON format. | ||
* Our goal is not to optimize for integration, but to provide a robust abstraction with representative examples. See this in ModelClient and Retriever. | ||
* All integrations, such as the different API SDKs, are optional packages within the same library, so you can easily switch between models from any provider that we officially support. | ||
|
||
## Future of LLM Applications | ||
|
||
Beyond ease of use, we particularly optimize the configurability of components so that researchers can build their own solutions and benchmark existing ones. | ||
Just as PyTorch united researchers and production teams, LightRAG enables a smooth transition from research to production. | ||
With researchers building on LightRAG, production engineers can easily take over a method, then test and iterate on it with their production data. | ||
Researchers, in turn, will want their code to be adopted in more products. |
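The "Simplicity" section of the README above shows a nested, PyTorch-like printout for `SimpleQA`. The sketch below illustrates one way such a printout can be produced: child components are registered when they are assigned as attributes, and `__repr__` walks them recursively, which is broadly how `torch.nn.Module` does it. `MiniComponent`, `EchoGenerator`, and `extra_repr` here are hypothetical stand-ins written for this example; this is not LightRAG's actual `Component` implementation, and no model is called.

```python
class MiniComponent:
    """Hypothetical Component-style base class (illustration only)."""

    def __init__(self):
        # Registry of child components, filled by __setattr__ below.
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name, value):
        # Remember sub-components so __repr__ can show the pipeline tree.
        if isinstance(value, MiniComponent):
            self._children[name] = value
        object.__setattr__(self, name, value)

    def extra_repr(self):
        """Subclasses may return a one-line summary of their own settings."""
        return ""

    def __call__(self, *args, **kwargs):
        return self.call(*args, **kwargs)

    def call(self, *args, **kwargs):
        raise NotImplementedError

    def __repr__(self):
        lines = []
        extra = self.extra_repr()
        if extra:
            lines.append(extra)
        for name, child in self._children.items():
            # Indent the child's own multi-line repr by two spaces.
            child_repr = repr(child).replace("\n", "\n  ")
            lines.append(f"({name}): {child_repr}")
        head = type(self).__name__ + "("
        if not lines:
            return head + ")"
        return head + "\n  " + "\n  ".join(lines) + "\n)"


class EchoGenerator(MiniComponent):
    """Stub generator that echoes the query instead of calling a model."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def extra_repr(self):
        return f"model={self.model!r}"

    def call(self, query):
        return f"[{self.model}] {query}"


class SimpleQA(MiniComponent):
    def __init__(self):
        super().__init__()
        self.generator = EchoGenerator("llama3-8b-8192")

    def call(self, query):
        return self.generator(query)


if __name__ == "__main__":
    qa = SimpleQA()
    print(qa)
    print(qa("What is LightRAG?"))
```

Because registration happens in `__setattr__`, a subclass only has to assign its parts in `__init__`; it never needs to describe its own structure by hand.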
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
Component,ListParser | ||
Component,JsonParser | ||
Component,YamlParser | ||
Component,ToolManager | ||
Component,Prompt | ||
Component,ModelClient | ||
Component,Retriever | ||
Component,FunctionTool | ||
Component,Tokenizer | ||
Component,Generator | ||
Component,Embedder | ||
Component,BatchEmbedder | ||
Component,Sequential | ||
Component,FunComponent | ||
Component,ReActAgent | ||
Component,OutputParser | ||
Component,TextSplitter | ||
Component,DocumentSplitter | ||
Component,ToEmbeddings | ||
Component,RetrieverOutputToContextStr | ||
Component,DefaultLLMJudge | ||
Component,LLMAugmenter | ||
Generic,LocalDB | ||
Generic,Retriever | ||
Generic,GeneratorOutput | ||
Generic,Parameter | ||
Generic,Sample | ||
Generic,Sampler | ||
Generic,RandomSampler | ||
Generic,ClassSampler | ||
ModelClient,CohereAPIClient | ||
ModelClient,TransformersClient | ||
ModelClient,GroqAPIClient | ||
ModelClient,GoogleGenAIClient | ||
ModelClient,OpenAIClient | ||
ModelClient,AnthropicAPIClient | ||
Retriever,BM25Retriever | ||
Retriever,PostgresRetriever | ||
Retriever,RerankerRetriever | ||
Retriever,LLMRetriever | ||
Retriever,FAISSRetriever | ||
Enum,DataClassFormatType | ||
Enum,ModelType | ||
Enum,DistanceToOperator | ||
Enum,OptionalPackages | ||
DataClass,EmbedderOutput | ||
DataClass,GeneratorOutput | ||
DataClass,RetrieverOutput | ||
DataClass,FunctionDefinition | ||
DataClass,Function | ||
DataClass,FunctionExpression | ||
DataClass,FunctionOutput | ||
DataClass,StepOutput | ||
DataClass,Document | ||
DataClass,DialogTurn | ||
DataClass,Instruction | ||
DataClass,GeneratorStatesRecord | ||
DataClass,GeneratorCallRecord | ||
Generator,CoTGenerator | ||
Generator,CoTGeneratorWithJsonOutput | ||
OutputParser,YamlOutputParser | ||
OutputParser,JsonOutputParser | ||
OutputParser,ListOutputParser | ||
OutputParser,BooleanOutputParser | ||
Optimizer,BootstrapFewShot | ||
Optimizer,LLMOptimizer | ||
Sampler,RandomSampler | ||
Sampler,ClassSampler |
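The file above is a list of `parent,child` pairs. As a rough sketch of how such a list can be turned into a tree and checked against the README's claim about subclass depth, the example below loads a handful of rows copied from the CSV and computes the number of levels beneath a root. The helper names (`load_children`, `depth_below`, `render_tree`) are hypothetical; the PR's own visualization code is not shown on this page.

```python
import csv
import io
from collections import defaultdict

# A few parent,child rows copied from the CSV above.
EDGES_CSV = """\
Component,Generator
Component,OutputParser
Component,ModelClient
ModelClient,GroqAPIClient
Generator,CoTGenerator
OutputParser,JsonOutputParser
"""


def load_children(text):
    """Map each parent class name to the list of its direct subclasses."""
    children = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if len(row) == 2:
            parent, child = row
            children[parent].append(child)
    return children


def depth_below(children, root):
    """Number of subclass levels beneath root (0 if it has no subclasses)."""
    return max(
        (1 + depth_below(children, child) for child in children.get(root, [])),
        default=0,
    )


def render_tree(children, root, indent=0):
    """Return the hierarchy under root as indented lines."""
    lines = ["  " * indent + root]
    for child in children.get(root, []):
        lines.extend(render_tree(children, child, indent + 1))
    return lines


if __name__ == "__main__":
    tree = load_children(EDGES_CSV)
    print("\n".join(render_tree(tree, "Component")))
    print("levels below Component:", depth_below(tree, "Component"))
```

Note that in the full file some classes appear under more than one parent (for example `Retriever` under both `Component` and `Generic`), so the data is a graph rather than a strict tree; the recursion above simply renders such a class once under each parent.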
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
from lightrag.core import Component, Generator | ||
from lightrag.components.model_client import GroqAPIClient | ||
from lightrag.utils import setup_env # noqa | ||
|
||
|
||
class SimpleQA(Component): | ||
    def __init__(self): | ||
        super().__init__() | ||
        template = r"""<SYS> | ||
You are a helpful assistant. | ||
</SYS> | ||
User: {{input_str}} | ||
You: | ||
""" | ||
        self.generator = Generator( | ||
            model_client=GroqAPIClient(), | ||
            model_kwargs={"model": "llama3-8b-8192"}, | ||
            template=template, | ||
        ) | ||
|
||
    def call(self, query): | ||
        return self.generator({"input_str": query}) | ||
|
||
    async def acall(self, query): | ||
        return await self.generator.acall({"input_str": query}) | ||
|
||
|
||
qa = SimpleQA() | ||
answer = qa("What is LightRAG?") | ||
print(qa) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,11 @@ | ||
pydata-sphinx-theme==0.15.2 | ||
Sphinx==7.3.7 | ||
sphinx_design==0.6.0 | ||
sphinx-copybutton==0.5.2 | ||
pydata-sphinx-theme==0.15.3 | ||
sphinx-design==0.6.0 | ||
sphinx-copybutton==0.5.2 | ||
sphinx==7.3.7 | ||
nbsphinx==0.9.4 | ||
nbconvert==7.16.4 | ||
PyYAML | ||
readthedocs-sphinx-search==0.3.2 | ||
numpy | ||
tqdm | ||
tiktoken |
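The requirements diff above mixes removed and added lines (the `+`/`-` markers were lost on this page), which makes it easy to miss that `sphinx_design` and `sphinx-design` name the same package and that several entries have no version pin. The sketch below is a small, hypothetical audit helper that normalizes names the way PEP 503 describes and reports duplicates and unpinned entries. It handles only plain `name` and `name==version` lines, not the full requirements-file syntax.

```python
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirements(text):
    """Return (name, version-or-None) pairs for simple `name==version` lines."""
    parsed = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        name, sep, version = line.partition("==")
        parsed.append((normalize(name.strip()), version.strip() if sep else None))
    return parsed


def audit(text):
    """Report names listed more than once and names without an exact pin."""
    seen, duplicates, unpinned = set(), [], []
    for name, version in parse_requirements(text):
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
        if version is None and name not in unpinned:
            unpinned.append(name)
    return {"duplicates": duplicates, "unpinned": unpinned}


if __name__ == "__main__":
    sample = "Sphinx==7.3.7\nsphinx_design==0.6.0\nsphinx-design==0.6.0\nPyYAML\n"
    print(audit(sample))
```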
Let's create a release branch; we need to start learning how to manage different releases.