Commit 5a0e51f

Merge pull request #52 from SylphAI-Inc/xiaoyi_doc

Update textsplitter & fix documents

Alleria1809 authored Jun 30, 2024
2 parents 65d0bdf + 3ff872f commit 5a0e51f

Showing 42 changed files with 1,949 additions and 1,076 deletions.
4 changes: 4 additions & 0 deletions .env_example
@@ -1,2 +1,6 @@
OPENAI_API_KEY=YOUR_API_KEY_IF_YOU_USE_OPENAI
GROQ_API_KEY=YOUR_API_KEY_IF_YOU_USE_GROQ
ANTHROPIC_API_KEY=YOUR_API_KEY_IF_YOU_USE_ANTHROPIC
GOOGLE_API_KEY=YOUR_API_KEY_IF_YOU_USE_GOOGLE
COHERE_API_KEY=YOUR_API_KEY_IF_YOU_USE_COHERE
HF_TOKEN=YOUR_API_KEY_IF_YOU_USE_HF
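
For context, the examples in this commit import `setup_env` from `lightrag.utils`. A minimal sketch of how these keys are expected to reach the process — assuming `setup_env()` loads a local `.env` file, which this diff does not itself confirm:

```python
import os

from lightrag.utils import setup_env  # noqa

# Assumption: setup_env() reads the .env file in the working directory and
# exports its entries into os.environ; the exact signature/behavior is not shown in this diff.
setup_env()

# The Groq examples in this commit need this key; swap in your provider's key as needed.
assert os.getenv("GROQ_API_KEY"), "Set GROQ_API_KEY in .env before running the examples"
```
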
68 changes: 68 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,68 @@
name: Documentation

on:
  push:
    branches:
      - xiaoyi_doc  # Ensure this is the branch where you commit documentation updates

permissions:
  contents: write
  actions: read

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      - name: Install dependencies using Poetry
        run: |
          poetry config virtualenvs.create false
          poetry install
      - name: Build documentation using Makefile
        run: |
          echo "Building documentation from: $(pwd)"
          ls -l  # Debug: List current directory contents
          poetry run make -C docs html
        working-directory: ${{ github.workspace }}

      - name: List built documentation
        run: |
          find ./build/ -type f
        working-directory: ${{ github.workspace }}/docs

      - name: Create .nojekyll file
        run: |
          touch .nojekyll
        working-directory: ${{ github.workspace }}/docs/build

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_branch: gh-pages
          publish_dir: ./docs/build/
          user_name: github-actions[bot]
          user_email: github-actions[bot]@users.noreply.github.com

      # - name: Debug Output
      #   run: |
      #     pwd  # Print the current working directory
      #     ls -l  # List files in the build directory
      #     cat ./source/conf.py  # Show Sphinx config file for debugging
      #   working-directory: ${{ github.workspace }}/docs/build
102 changes: 102 additions & 0 deletions README.md
@@ -0,0 +1,102 @@
# Introduction

LightRAG is the `PyTorch` library for building large language model (LLM) applications. We help developers with both building and optimizing `Retriever`-`Agent`-`Generator` (RAG) pipelines.
It is light, modular, and robust.

**PyTorch**

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)      # 64 x 12 x 12 = 9216 features for a 28x28 input
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        return self.fc2(x)
```

**LightRAG**

```python
from lightrag.core import Component, Generator
from lightrag.components.model_client import GroqAPIClient
from lightrag.utils import setup_env  # noqa


class SimpleQA(Component):
    def __init__(self):
        super().__init__()
        template = r"""<SYS>
        You are a helpful assistant.
        </SYS>
        User: {{input_str}}
        You:
        """
        self.generator = Generator(
            model_client=GroqAPIClient(),
            model_kwargs={"model": "llama3-8b-8192"},
            template=template,
        )

    def call(self, query):
        return self.generator({"input_str": query})

    async def acall(self, query):
        return await self.generator.acall({"input_str": query})
```
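
For reference, the component can be run the same way as `developer_notes/generator_note.py` in this commit does; the call returns a `GeneratorOutput` whose fields (`data`, `error`, `usage`, `raw_response`) appear in the notebook output later in this diff:

```python
qa = SimpleQA()
answer = qa("What is LightRAG?")  # calling the component routes to call()
print(answer)                     # GeneratorOutput(data=..., error=None, usage=None, raw_response=...)
```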

## Simplicity

Developers who are building real-world Large Language Model (LLM) applications are the real heroes.
As a library, we provide them with the fundamental building blocks with 100% clarity and simplicity.

* Two fundamental and powerful base classes: Component for the pipeline and DataClass for data interaction with LLMs.
* We end up with fewer than two levels of subclasses (see the Class Hierarchy Visualization).
* The result is a library with bare minimum abstraction, providing developers with maximum customizability.

Similar to the PyTorch module, our Component provides excellent visualization of the pipeline structure.

```
SimpleQA(
  (generator): Generator(
    model_kwargs={'model': 'llama3-8b-8192'},
    (prompt): Prompt(
      template: <SYS>
      You are a helpful assistant.
      </SYS>
      User: {{input_str}}
      You:
      , prompt_variables: ['input_str']
    )
    (model_client): GroqAPIClient()
  )
)
```
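
This printout is what you get by printing the instantiated component, as `developer_notes/generator_note.py` in this commit does:

```python
qa = SimpleQA()
print(qa)  # prints the nested Generator / Prompt / GroqAPIClient structure shown above
```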

## Controllability

Our simplicity does not come from doing 'less'.
On the contrary, we do 'more', going 'deeper' and 'wider' on each topic, to offer developers maximum control and robustness.

* LLMs are sensitive to the prompt. With components like Prompt, OutputParser, FunctionTool, and ToolManager, we give developers full control over their prompts without relying on API features such as tool calling or JSON mode.
* Our goal is not to optimize for integration, but to provide a robust abstraction with representative examples. See this in ModelClient and Retriever.
* All integrations, such as provider API SDKs, are shipped as optional packages within the same library, so you can easily switch between models from any provider we officially support, as sketched below.
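
A minimal sketch of such a switch (the `OpenAIClient` import path and the OpenAI model name are illustrative assumptions; only `GroqAPIClient` with `llama3-8b-8192` appears in this commit):

```python
from lightrag.core import Generator
from lightrag.components.model_client import GroqAPIClient, OpenAIClient  # OpenAIClient path assumed

template = r"""<SYS> You are a helpful assistant. </SYS> User: {{input_str}} You:"""

# The pipeline stays the same; only the model_client and model_kwargs change.
groq_generator = Generator(
    model_client=GroqAPIClient(),
    model_kwargs={"model": "llama3-8b-8192"},
    template=template,
)

openai_generator = Generator(
    model_client=OpenAIClient(),
    model_kwargs={"model": "gpt-4o-mini"},  # illustrative model name, not from this commit
    template=template,
)
```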

## Future of LLM Applications

On top of ease of use, we particularly optimize the configurability of components so that researchers can build their own solutions and benchmark existing ones.
Just as PyTorch united researchers and production teams, LightRAG enables a smooth transition from research to production.
With researchers building on LightRAG, production engineers can easily take over the method and test and iterate on their own production data, and researchers will see their code adapted into more products.
68 changes: 68 additions & 0 deletions class_hierarchy_edges.csv
@@ -0,0 +1,68 @@
Component,ListParser
Component,JsonParser
Component,YamlParser
Component,ToolManager
Component,Prompt
Component,ModelClient
Component,Retriever
Component,FunctionTool
Component,Tokenizer
Component,Generator
Component,Embedder
Component,BatchEmbedder
Component,Sequential
Component,FunComponent
Component,ReActAgent
Component,OutputParser
Component,TextSplitter
Component,DocumentSplitter
Component,ToEmbeddings
Component,RetrieverOutputToContextStr
Component,DefaultLLMJudge
Component,LLMAugmenter
Generic,LocalDB
Generic,Retriever
Generic,GeneratorOutput
Generic,Parameter
Generic,Sample
Generic,Sampler
Generic,RandomSampler
Generic,ClassSampler
ModelClient,CohereAPIClient
ModelClient,TransformersClient
ModelClient,GroqAPIClient
ModelClient,GoogleGenAIClient
ModelClient,OpenAIClient
ModelClient,AnthropicAPIClient
Retriever,BM25Retriever
Retriever,PostgresRetriever
Retriever,RerankerRetriever
Retriever,LLMRetriever
Retriever,FAISSRetriever
Enum,DataClassFormatType
Enum,ModelType
Enum,DistanceToOperator
Enum,OptionalPackages
DataClass,EmbedderOutput
DataClass,GeneratorOutput
DataClass,RetrieverOutput
DataClass,FunctionDefinition
DataClass,Function
DataClass,FunctionExpression
DataClass,FunctionOutput
DataClass,StepOutput
DataClass,Document
DataClass,DialogTurn
DataClass,Instruction
DataClass,GeneratorStatesRecord
DataClass,GeneratorCallRecord
Generator,CoTGenerator
Generator,CoTGeneratorWithJsonOutput
OutputParser,YamlOutputParser
OutputParser,JsonOutputParser
OutputParser,ListOutputParser
OutputParser,BooleanOutputParser
Optimizer,BootstrapFewShot
Optimizer,LLMOptimizer
Sampler,RandomSampler
Sampler,ClassSampler
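
As a small illustration (not part of this commit), the `parent,child` edge list above can be rendered as an indented tree with a few lines of Python; the file name below simply mirrors the file added in this diff:

```python
import csv
from collections import defaultdict

def print_hierarchy(path="class_hierarchy_edges.csv"):
    children = defaultdict(list)
    parents, all_children = set(), set()
    with open(path, newline="") as f:
        for parent, child in csv.reader(f):
            children[parent].append(child)
            parents.add(parent)
            all_children.add(child)

    def walk(node, depth=0):
        print("  " * depth + node)
        for c in sorted(children.get(node, [])):
            walk(c, depth + 1)

    # Roots are parents that never appear as a child: Component, Generic, Enum, DataClass, Optimizer.
    for root in sorted(parents - all_children):
        walk(root)

print_hierarchy()
```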
44 changes: 41 additions & 3 deletions developer_notes/generator.ipynb
@@ -74,10 +74,48 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "GeneratorOutput(data='LightRAG is a light-based Real-time Anomaly Generator, which is a special type of anomaly detection system. It uses a combination of visual and statistical techniques to detect unusual patterns or outliers in a dataset in real-time, often for purposes such as identifying security threats, detecting fraud, or monitoring system performance. Would you like to know more about its applications or how it works?', error=None, usage=None, raw_response='LightRAG is a light-based Real-time Anomaly Generator, which is a special type of anomaly detection system. It uses a combination of visual and statistical techniques to detect unusual patterns or outliers in a dataset in real-time, often for purposes such as identifying security threats, detecting fraud, or monitoring system performance. Would you like to know more about its applications or how it works?')\n"
+     ]
+    }
+   ],
+   "source": [
+    "from lightrag.core import Component, Generator, Prompt\n",
+    "from lightrag.components.model_client import GroqAPIClient\n",
+    "from lightrag.utils import setup_env\n",
+    "\n",
+    "\n",
+    "class SimpleQA(Component):\n",
+    "    def __init__(self):\n",
+    "        super().__init__()\n",
+    "        template = r\"\"\"<SYS>\n",
+    "        You are a helpful assistant.\n",
+    "        </SYS>\n",
+    "        User: {{input_str}}\n",
+    "        You:\n",
+    "        \"\"\"\n",
+    "        self.generator = Generator(\n",
+    "            model_client=GroqAPIClient(), model_kwargs={\"model\": \"llama3-8b-8192\"}, template=template\n",
+    "        )\n",
+    "\n",
+    "    def call(self, query):\n",
+    "        return self.generator({\"input_str\": query})\n",
+    "\n",
+    "    async def acall(self, query):\n",
+    "        return await self.generator.acall({\"input_str\": query})\n",
+    "\n",
+    "\n",
+    "qa = SimpleQA()\n",
+    "answer = qa(\"What is LightRAG?\")\n",
+    "\n",
+    "print(answer)"
+   ]
   }
  ],
  "metadata": {
30 changes: 30 additions & 0 deletions developer_notes/generator_note.py
@@ -0,0 +1,30 @@
from lightrag.core import Component, Generator
from lightrag.components.model_client import GroqAPIClient
from lightrag.utils import setup_env  # noqa


class SimpleQA(Component):
    def __init__(self):
        super().__init__()
        template = r"""<SYS>
        You are a helpful assistant.
        </SYS>
        User: {{input_str}}
        You:
        """
        self.generator = Generator(
            model_client=GroqAPIClient(),
            model_kwargs={"model": "llama3-8b-8192"},
            template=template,
        )

    def call(self, query):
        return self.generator({"input_str": query})

    async def acall(self, query):
        return await self.generator.acall({"input_str": query})


qa = SimpleQA()
answer = qa("What is LightRAG?")
print(qa)
15 changes: 11 additions & 4 deletions docs/requirements.txt
@@ -1,4 +1,11 @@
-pydata-sphinx-theme==0.15.2
-Sphinx==7.3.7
-sphinx_design==0.6.0
-sphinx-copybutton==0.5.2
+pydata-sphinx-theme==0.15.3
+sphinx-design==0.6.0
+sphinx-copybutton==0.5.2
+sphinx==7.3.7
+nbsphinx==0.9.4
+nbconvert==7.16.4
+PyYAML
+readthedocs-sphinx-search==0.3.2
+numpy
+tqdm
+tiktoken