Merge pull request #52 from SylphAI-Inc/xiaoyi_doc
Update textsplitter & fix documents
Showing 42 changed files with 1,949 additions and 1,076 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,6 @@ | ||
OPENAI_API_KEY=YOUR_API_KEY_IF_YOU_USE_OPENAI | ||
GROQ_API_KEY=YOUR_API_KEY_IF_YOU_USE_GROQ | ||
ANTHROPIC_API_KEY=YOUR_API_KEY_IF_YOU_USE_ANTHROPIC | ||
GOOGLE_API_KEY=YOUR_API_KEY_IF_YOU_USE_GOOGLE | ||
COHERE_API_KEY=YOUR_API_KEY_IF_YOU_USE_COHERE | ||
HF_TOKEN=YOUR_API_KEY_IF_YOU_USE_HF |
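A minimal sketch of how the keys listed above might be read at runtime, assuming a plain `KEY=VALUE` file. `load_env_file` and the default `.env` path are hypothetical names chosen for illustration; the repository's own `setup_env` helper may work differently.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines and export them without overriding existing variables."""
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key, value)
    return loaded
```

Using `setdefault` means a key exported in the shell wins over the file, which is usually what you want in CI.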
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
name: Documentation | ||
|
||
on: | ||
  push: | ||
    branches: | ||
      - xiaoyi_doc # Ensure this is the branch where you commit documentation updates | ||
|
||
permissions: | ||
  contents: write | ||
  actions: read | ||
|
||
jobs: | ||
  build-and-deploy: | ||
    runs-on: ubuntu-latest | ||
|
||
    steps: | ||
      - name: Checkout code | ||
        uses: actions/checkout@v4 | ||
        with: | ||
          fetch-depth: 0 | ||
|
||
      - name: Set up Python | ||
        uses: actions/setup-python@v5 | ||
        with: | ||
          python-version: '3.11' | ||
|
||
      - name: Install Poetry | ||
        run: | | ||
          curl -sSL https://install.python-poetry.org | python3 - | ||
          echo "$HOME/.local/bin" >> $GITHUB_PATH | ||
      - name: Install dependencies using Poetry | ||
        run: | | ||
          poetry config virtualenvs.create false | ||
          poetry install | ||
      - name: Build documentation using Makefile | ||
        run: | | ||
          echo "Building documentation from: $(pwd)" | ||
          ls -l # Debug: List current directory contents | ||
          poetry run make -C docs html | ||
        working-directory: ${{ github.workspace }} | ||
|
||
      - name: List built documentation | ||
        run: | | ||
          find ./build/ -type f | ||
        working-directory: ${{ github.workspace }}/docs | ||
|
||
      - name: Create .nojekyll file | ||
        run: | | ||
          touch .nojekyll | ||
        working-directory: ${{ github.workspace }}/docs/build | ||
|
||
      - name: Deploy to GitHub Pages | ||
        uses: peaceiris/actions-gh-pages@v3 | ||
        with: | ||
          github_token: ${{ secrets.GITHUB_TOKEN }} | ||
          publish_branch: gh-pages | ||
          publish_dir: ./docs/build/ | ||
          user_name: github-actions[bot] | ||
          user_email: github-actions[bot]@users.noreply.github.com | ||
|
||
      # - name: Debug Output | ||
      #   run: | | ||
      #     pwd # Print the current working directory | ||
      #     ls -l # List files in the build directory | ||
      #     cat ./source/conf.py # Show Sphinx config file for debugging | ||
      #   working-directory: ${{ github.workspace }}/docs/build |
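The post-build steps of the workflow above (listing the built files and creating `.nojekyll`) can be reproduced locally before pushing. This is a sketch only, assuming the HTML output lands under `docs/build`, as `publish_dir` suggests; `prepare_pages_dir` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def prepare_pages_dir(build_dir: str = "docs/build") -> list:
    """Create .nojekyll in build_dir and return the built files as sorted relative paths."""
    root = Path(build_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"build directory not found: {root}")
    # Equivalent of `touch .nojekyll`: tells GitHub Pages to skip Jekyll processing.
    (root / ".nojekyll").touch()
    # Equivalent of `find ./build/ -type f`.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
```

Run it after `make -C docs html` and check that the returned list contains the pages you expect to publish.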
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
# Introduction | ||
|
||
LightRAG is the `PyTorch` library for building large language model (LLM) applications. We help developers with both building and optimizing `Retriever`-`Agent`-`Generator` (RAG) pipelines. | ||
It is light, modular, and robust. | ||
|
||
**PyTorch** | ||
|
||
```python | ||
import torch | ||
import torch.nn as nn | ||
|
||
class Net(nn.Module): | ||
    def __init__(self): | ||
        super(Net, self).__init__() | ||
        self.conv1 = nn.Conv2d(1, 32, 3, 1) | ||
        self.conv2 = nn.Conv2d(32, 64, 3, 1) | ||
        self.dropout1 = nn.Dropout2d(0.25) | ||
        self.dropout2 = nn.Dropout2d(0.5) | ||
        self.fc1 = nn.Linear(9216, 128) | ||
        self.fc2 = nn.Linear(128, 10) | ||
|
||
    def forward(self, x): | ||
        # Assumes 1x28x28 inputs (e.g. MNIST), so fc1 sees 64 * 12 * 12 = 9216 features | ||
        x = self.conv1(x) | ||
        x = self.conv2(x) | ||
        x = nn.functional.max_pool2d(x, 2) | ||
        x = self.dropout1(x) | ||
        x = self.dropout2(x) | ||
        x = torch.flatten(x, 1) | ||
        x = self.fc1(x) | ||
        return self.fc2(x) | ||
``` | ||
|
||
**LightRAG** | ||
|
||
```python | ||
|
||
from lightrag.core import Component, Generator | ||
from lightrag.components.model_client import GroqAPIClient | ||
from lightrag.utils import setup_env  # noqa | ||
|
||
class SimpleQA(Component): | ||
    def __init__(self): | ||
        super().__init__() | ||
        template = r"""<SYS> | ||
You are a helpful assistant. | ||
</SYS> | ||
User: {{input_str}} | ||
You: | ||
""" | ||
        self.generator = Generator( | ||
            model_client=GroqAPIClient(), | ||
            model_kwargs={"model": "llama3-8b-8192"}, | ||
            template=template, | ||
        ) | ||
|
||
    def call(self, query): | ||
        return self.generator({"input_str": query}) | ||
|
||
    async def acall(self, query): | ||
        return await self.generator.acall({"input_str": query}) | ||
``` | ||
|
||
## Simplicity | ||
|
||
Developers who are building real-world Large Language Model (LLM) applications are the real heroes. | ||
As a library, we provide them with the fundamental building blocks with 100% clarity and simplicity. | ||
|
||
* Two fundamental and powerful base classes: Component for the pipeline and DataClass for data interaction with LLMs. | ||
* We end up with no more than two levels of subclasses. See the class hierarchy visualization. | ||
* The result is a library with the bare minimum of abstraction, which gives developers maximum customizability. | ||
|
||
Like a PyTorch module, our Component prints a clear view of the pipeline structure. | ||
|
||
``` | ||
SimpleQA( | ||
  (generator): Generator( | ||
    model_kwargs={'model': 'llama3-8b-8192'}, | ||
    (prompt): Prompt( | ||
      template: <SYS> | ||
You are a helpful assistant. | ||
</SYS> | ||
User: {{input_str}} | ||
You: | ||
, prompt_variables: ['input_str'] | ||
    ) | ||
    (model_client): GroqAPIClient() | ||
  ) | ||
) | ||
``` | ||
|
||
## Controllability | ||
|
||
Our simplicity does not come from doing 'less'. | ||
On the contrary, we have to do 'more' and go 'deeper' and 'wider' on every topic to give developers maximum control and robustness. | ||
|
||
* LLMs are sensitive to the prompt. Components like Prompt, OutputParser, FunctionTool, and ToolManager give developers full control over their prompts without relying on API features such as tools and JSON format. | ||
* Our goal is not to optimize for integration, but to provide a robust abstraction with representative examples. See this in ModelClient and Retriever. | ||
* All integrations, such as different API SDKs, are optional packages that live within the same library. You can easily switch to any model from any provider that we officially support. | ||
|
||
## Future of LLM Applications | ||
|
||
On top of ease of use, we optimize in particular for the configurability of components, so that researchers can build their own solutions and benchmark existing ones. | ||
Just as PyTorch has united researchers and production teams, LightRAG enables a smooth transition from research to production. | ||
With researchers building on LightRAG, production engineers can easily take over a method, then test and iterate on it with their production data. | ||
Researchers will want their code to be adopted in more products too. |
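The nested printout shown under Simplicity follows the same idea as PyTorch's module `repr`. Below is a minimal sketch of how such a tree could be produced. `MiniComponent`, `Client`, `Gen`, and `QA` are hypothetical stand-ins written for this example; they are not LightRAG's `Component` implementation.

```python
class MiniComponent:
    """Hypothetical base class that tracks child components and prints them as a tree."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the registry itself.
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name, value):
        # Register sub-components so __repr__ can walk them later.
        if isinstance(value, MiniComponent):
            self._children[name] = value
        object.__setattr__(self, name, value)

    def extra_repr(self) -> str:
        """Subclasses override this to show their own settings."""
        return ""

    def __repr__(self) -> str:
        lines = [self.extra_repr()] if self.extra_repr() else []
        for name, child in self._children.items():
            # Indent every continuation line of the child's repr by two spaces.
            child_repr = repr(child).replace("\n", "\n  ")
            lines.append(f"({name}): {child_repr}")
        if not lines:
            return f"{type(self).__name__}()"
        body = "\n  ".join(lines)
        return f"{type(self).__name__}(\n  {body}\n)"


class Client(MiniComponent):
    pass


class Gen(MiniComponent):
    def __init__(self, model: str):
        super().__init__()
        self.model = model
        self.model_client = Client()

    def extra_repr(self) -> str:
        return f"model={self.model!r},"


class QA(MiniComponent):
    def __init__(self):
        super().__init__()
        self.generator = Gen("llama3-8b-8192")


print(QA())
```

The design choice here is that a parent only needs to intercept attribute assignment; sub-components register themselves as a side effect of `self.generator = ...`, so the pipeline author writes no extra bookkeeping.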
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
Component,ListParser | ||
Component,JsonParser | ||
Component,YamlParser | ||
Component,ToolManager | ||
Component,Prompt | ||
Component,ModelClient | ||
Component,Retriever | ||
Component,FunctionTool | ||
Component,Tokenizer | ||
Component,Generator | ||
Component,Embedder | ||
Component,BatchEmbedder | ||
Component,Sequential | ||
Component,FunComponent | ||
Component,ReActAgent | ||
Component,OutputParser | ||
Component,TextSplitter | ||
Component,DocumentSplitter | ||
Component,ToEmbeddings | ||
Component,RetrieverOutputToContextStr | ||
Component,DefaultLLMJudge | ||
Component,LLMAugmenter | ||
Generic,LocalDB | ||
Generic,Retriever | ||
Generic,GeneratorOutput | ||
Generic,Parameter | ||
Generic,Sample | ||
Generic,Sampler | ||
Generic,RandomSampler | ||
Generic,ClassSampler | ||
ModelClient,CohereAPIClient | ||
ModelClient,TransformersClient | ||
ModelClient,GroqAPIClient | ||
ModelClient,GoogleGenAIClient | ||
ModelClient,OpenAIClient | ||
ModelClient,AnthropicAPIClient | ||
Retriever,BM25Retriever | ||
Retriever,PostgresRetriever | ||
Retriever,RerankerRetriever | ||
Retriever,LLMRetriever | ||
Retriever,FAISSRetriever | ||
Enum,DataClassFormatType | ||
Enum,ModelType | ||
Enum,DistanceToOperator | ||
Enum,OptionalPackages | ||
DataClass,EmbedderOutput | ||
DataClass,GeneratorOutput | ||
DataClass,RetrieverOutput | ||
DataClass,FunctionDefinition | ||
DataClass,Function | ||
DataClass,FunctionExpression | ||
DataClass,FunctionOutput | ||
DataClass,StepOutput | ||
DataClass,Document | ||
DataClass,DialogTurn | ||
DataClass,Instruction | ||
DataClass,GeneratorStatesRecord | ||
DataClass,GeneratorCallRecord | ||
Generator,CoTGenerator | ||
Generator,CoTGeneratorWithJsonOutput | ||
OutputParser,YamlOutputParser | ||
OutputParser,JsonOutputParser | ||
OutputParser,ListOutputParser | ||
OutputParser,BooleanOutputParser | ||
Optimizer,BootstrapFewShot | ||
Optimizer,LLMOptimizer | ||
Sampler,RandomSampler | ||
Sampler,ClassSampler |
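The `parent,child` pairs above are enough to check how deep the subclass tree goes. The following sketch computes the longest inheritance chain below `Component`, using a small subset of the pairs; `max_depth` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

# A subset of the parent,child pairs listed above.
PAIRS = """\
Component,ModelClient
Component,Retriever
Component,Generator
Component,OutputParser
ModelClient,GroqAPIClient
Retriever,FAISSRetriever
Generator,CoTGenerator
OutputParser,JsonOutputParser
"""


def max_depth(pairs_text: str, root: str) -> int:
    """Return the number of subclass levels below root (0 if it has no subclasses)."""
    children = defaultdict(list)
    for line in pairs_text.splitlines():
        if line.strip():
            parent, child = line.strip().split(",")
            children[parent].append(child)

    def depth(name: str) -> int:
        # A leaf contributes 0; otherwise one level plus the deepest child.
        return max((1 + depth(c) for c in children.get(name, [])), default=0)

    return depth(root)


print(max_depth(PAIRS, "Component"))  # 2
```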
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
from lightrag.core import Component, Generator | ||
from lightrag.components.model_client import GroqAPIClient | ||
from lightrag.utils import setup_env # noqa | ||
|
||
|
||
class SimpleQA(Component): | ||
    def __init__(self): | ||
        super().__init__() | ||
        template = r"""<SYS> | ||
You are a helpful assistant. | ||
</SYS> | ||
User: {{input_str}} | ||
You: | ||
""" | ||
        self.generator = Generator( | ||
            model_client=GroqAPIClient(), | ||
            model_kwargs={"model": "llama3-8b-8192"}, | ||
            template=template, | ||
        ) | ||
|
||
    def call(self, query): | ||
        return self.generator({"input_str": query}) | ||
|
||
    async def acall(self, query): | ||
        return await self.generator.acall({"input_str": query}) | ||
|
||
|
||
qa = SimpleQA() | ||
answer = qa("What is LightRAG?") | ||
print(qa) |
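The script above needs a Groq API key to run. To see the `call`/`acall` pattern and the `{{input_str}}` placeholder in isolation, here is an offline sketch. `EchoGenerator`, `OfflineQA`, `ask_many`, and the regex-based `render` function are hypothetical stand-ins invented for this example; LightRAG's `Generator` and its prompt rendering may behave differently.

```python
import asyncio
import re

TEMPLATE = "<SYS>\nYou are a helpful assistant.\n</SYS>\nUser: {{input_str}}\nYou:\n"


def render(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its value from variables."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(variables[m.group(1)]), template)


class EchoGenerator:
    """Stand-in for a model-backed generator: returns the rendered prompt."""

    def __init__(self, template: str):
        self.template = template

    def call(self, prompt_kwargs: dict) -> str:
        return render(self.template, prompt_kwargs)

    async def acall(self, prompt_kwargs: dict) -> str:
        # Yield control once to mimic awaiting a network request.
        await asyncio.sleep(0)
        return self.call(prompt_kwargs)


class OfflineQA:
    def __init__(self):
        self.generator = EchoGenerator(TEMPLATE)

    def call(self, query: str) -> str:
        return self.generator.call({"input_str": query})

    async def acall(self, query: str) -> str:
        return await self.generator.acall({"input_str": query})


async def ask_many(qa: OfflineQA, queries: list) -> list:
    # Run all queries concurrently; results come back in input order.
    return await asyncio.gather(*(qa.acall(q) for q in queries))


qa = OfflineQA()
print(qa.call("What is LightRAG?"))
print(len(asyncio.run(ask_many(qa, ["a", "b"]))))
```

Keeping a synchronous `call` next to an asynchronous `acall` lets the same component serve a quick script and a concurrent batch job without changing its interface.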
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,11 @@ | ||
-pydata-sphinx-theme==0.15.2 | ||
-Sphinx==7.3.7 | ||
-sphinx_design==0.6.0 | ||
-sphinx-copybutton==0.5.2 | ||
+pydata-sphinx-theme==0.15.3 | ||
+sphinx-design==0.6.0 | ||
+sphinx-copybutton==0.5.2 | ||
+sphinx==7.3.7 | ||
+nbsphinx==0.9.4 | ||
+nbconvert==7.16.4 | ||
+PyYAML | ||
+readthedocs-sphinx-search==0.3.2 | ||
+numpy | ||
+tqdm | ||
+tiktoken |