Optimize code structure #94

Merged
merged 1 commit into from
Nov 20, 2023
3 changes: 1 addition & 2 deletions .github/workflows/pylint.yml
@@ -21,6 +21,5 @@ jobs:
- name: Python pylint
run: |
pip install pylint==2.10.2
pylint --rcfile=.pylintrc --output-format=colorized src_towhee
pylint --rcfile=.pylintrc --output-format=colorized src_langchain
pylint --rcfile=.pylintrc --output-format=colorized src
pylint --rcfile=.pylintrc --output-format=colorized offline_tools
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,2 +1,5 @@
**/__pycache__
**/tmp
**/*.egg-info
**/*.db
**/build
4 changes: 2 additions & 2 deletions Contributing.md
@@ -65,8 +65,8 @@ If you're interested in contributing to the `zilliztech/akcio` codebase, follow
4. During development, you might want to run `pylint`. You can do so with one of the commands below:
```bash
$ pip install pylint==2.10.2
$ pylint --rcfile=.pylintrc --output-format=colorized src_towhee
$ pylint --rcfile=.pylintrc --output-format=colorized src_langchain
$ pylint --rcfile=.pylintrc --output-format=colorized src.towhee
$ pylint --rcfile=.pylintrc --output-format=colorized src.langchain
$ pylint --rcfile=.pylintrc --output-format=colorized offline_tools
```

20 changes: 10 additions & 10 deletions README.md
@@ -71,34 +71,34 @@ It also supports different integrations of LLM service and databases:

The option using Towhee simplifies the process of building a system by providing [pre-defined pipelines](https://towhee.io/tasks/pipeline). These built-in pipelines require less coding and make system building much easier. If you require customization, you can either simply modify configuration or create your own pipeline with rich options of [Towhee Operators](https://towhee.io/tasks/operator).

- [Pipelines](./src_towhee/pipelines)
- [Pipelines](./src.towhee/pipelines)
- **Insert:**
The insert pipeline builds a knowledge base by saving documents and corresponding data in database(s).
- **Search:**
The search pipeline enables the question-answering capability powered by information retrieval (semantic search and optional keyword match) and LLM service.
- **Prompt:** a prompt operator prepares messages for LLM by assembling system message, chat history, and the user's query processed by template.

- [Memory](./src_towhee/memory):
The memory storage stores chat history to support context in conversation. (available: [most SQL](./src_towhee/memory/sql.py))
- [Memory](./src.towhee/memory):
The memory storage stores chat history to support context in conversation. (available: [most SQL](./src.towhee/memory/sql.py))
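The Prompt and Memory modules described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not akcio's implementation: `SqlHistory`, `build_messages`, and `QUERY_TEMPLATE` are hypothetical names, SQLite (via the standard `sqlite3` module) stands in for "most SQL" databases, and the OpenAI-style `role`/`content` message dicts are an assumed LLM input format. The method names `add_history` and `get_history` echo those patched in the unit tests further down, but the signatures used here are guesses.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical template; the project's actual prompt template is not part of this diff.
QUERY_TEMPLATE = 'Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}'


class SqlHistory:
    """Hypothetical chat-history store keyed by (project, session_id), backed by SQLite."""

    def __init__(self, path: str = ':memory:'):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS history ('
            'id INTEGER PRIMARY KEY AUTOINCREMENT, project TEXT, session_id TEXT, '
            'question TEXT, answer TEXT)'
        )

    def add_history(self, project: str, session_id: str, question: str, answer: str) -> None:
        with self.conn:  # the connection context manager commits on success
            self.conn.execute(
                'INSERT INTO history (project, session_id, question, answer) VALUES (?, ?, ?, ?)',
                (project, session_id, question, answer),
            )

    def get_history(self, project: str, session_id: str) -> List[Tuple[str, str]]:
        cursor = self.conn.execute(
            'SELECT question, answer FROM history WHERE project = ? AND session_id = ? ORDER BY id',
            (project, session_id),
        )
        return cursor.fetchall()


def build_messages(system: str, history: List[Tuple[str, str]],
                   question: str, context: str) -> List[Dict[str, str]]:
    """Assemble the system message, prior turns, and the templated query, in that order."""
    messages = [{'role': 'system', 'content': system}]
    for past_question, past_answer in history:
        messages.append({'role': 'user', 'content': past_question})
        messages.append({'role': 'assistant', 'content': past_answer})
    messages.append({'role': 'user',
                     'content': QUERY_TEMPLATE.format(context=context, question=question)})
    return messages


store = SqlHistory()
store.add_history('demo_project', 'session-1', 'What is Akcio?', 'A question-answering system.')
messages = build_messages('You are a helpful assistant.',
                          store.get_history('demo_project', 'session-1'),
                          question='Which vector database does it use?',
                          context='(retrieved chunks would go here)')
```

Keeping history per `(project, session_id)` lets the same project serve several independent conversations; each new question sees only its own session's earlier turns.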


### Option 2: LangChain

The option using LangChain employs the use of [Agent](https://python.langchain.com/docs/modules/agents) in order to enable LLM to utilize specific tools, resulting in a greater demand for LLM's ability to comprehend tasks and make informed decisions.

- [Agent](./src_langchain/agent)
- [Agent](./src.langchain/agent)
- **ChatAgent:** agent ensembles all modules together to build up qa system.
- Other agents (todo)
- [LLM](./src_langchain/llm)
- [LLM](./src.langchain/llm)
- **ChatLLM:** large language model or service to generate answers.
- [Embedding](./src_langchain/embedding/)
- [Embedding](./src.langchain/embedding/)
- **TextEncoder:** encoder converts each text input to a vector.
- Other encoders (todo)
- [Store](./src_langchain/store)
- [Store](./src.langchain/store)
- **VectorStore:** vector database stores document chunks in embeddings, and performs document retrieval via semantic search.
- **ScalarStore:** optional, database stores metadata for each document chunk, which supports additional information retrieval. (available: [Elastic](src_langchain/store/scalar_store/es.py))
- **ScalarStore:** optional, database stores metadata for each document chunk, which supports additional information retrieval. (available: [Elastic](src.langchain/store/scalar_store/es.py))
- **MemoryStore:** memory storage stores chat history to support context in conversation.
- [DataLoader](./src_langchain/data_loader/)
- [DataLoader](./src.langchain/data_loader/)
- **DataParser:** tool loads data from given source and then splits documents into processed doc chunks.
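To make the TextEncoder/VectorStore split concrete, here is a toy sketch of semantic retrieval under loudly stated assumptions: `encode` is a hypothetical bag-of-words stand-in for a real embedding model, and `InMemoryVectorStore` (with its `add_texts`/`search` methods) is a hypothetical in-process replacement for a vector database such as Milvus. It only shows the shape of the technique: embed each chunk once at insert time, embed the query, and rank chunks by cosine similarity.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: token -> weight


def encode(text: str) -> Vector:
    """Hypothetical TextEncoder stand-in: L2-normalized bag of words (not a learned embedding)."""
    counts = Counter(re.findall(r'\w+', text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: linear scan, no index."""

    def __init__(self) -> None:
        self.chunks: List[str] = []
        self.vectors: List[Vector] = []

    def add_texts(self, chunks: List[str]) -> int:
        # Embed once at insert time, as an insert pipeline would.
        for chunk in chunks:
            self.chunks.append(chunk)
            self.vectors.append(encode(chunk))
        return len(chunks)

    def search(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        query_vec = encode(query)
        scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in zip(self.chunks, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add_texts([
    'Milvus is a vector database for similarity search.',
    'Elasticsearch can store scalar metadata for keyword match.',
    'Gradio builds a simple web demo.',
])
best_chunk, best_score = store.search('Which vector database supports similarity search?', top_k=1)[0]
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, and a learned encoder matches on meaning rather than exact words.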

## Deployment
@@ -228,7 +228,7 @@ The option using LangChain employs the use of [Agent](https://python.langchain.c

## Load data

The `insert` function in [operations](./src_langchain/operations.py) loads project data from url(s) or file(s).
The `insert` function in [operations](./src.langchain/operations.py) loads project data from url(s) or file(s).

There are 2 options to load project data:

10 changes: 5 additions & 5 deletions config.py
@@ -115,7 +115,7 @@
raise NotImplementedError

RERANK_CONFIG = {
'rerank': True, # or False
'rerank': False, # or False
'rerank_model': rerank_model,
'threshold': 0.0,
'rerank_device': -1 # -1 will use cpu
@@ -126,7 +126,7 @@
'chunk_size': 300
}

QUESTIONGENERATOR_CONFIG = {
'model_name': 'gpt-3.5-turbo',
'temperature': 0,
}
# QUESTIONGENERATOR_CONFIG = {
# 'model_name': 'gpt-3.5-turbo',
# 'temperature': 0,
# }
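This PR switches the default `'rerank'` in `RERANK_CONFIG` to `False`. Below is a rough sketch of what a rerank step with a `threshold` typically does. It is an assumption, not the project's code: `maybe_rerank` and `overlap_score` are hypothetical, the word-overlap scorer stands in for a real rerank model, and keeping scores `>=` the threshold is a guess at the comparison used.

```python
from typing import Callable, Dict, List

# Keys mirror RERANK_CONFIG in config.py; the values here are only for illustration.
RERANK_CONFIG = {'rerank': False, 'threshold': 0.0}


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical scorer standing in for a rerank model: share of query words found in doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def maybe_rerank(query: str, docs: List[str], config: Dict,
                 scorer: Callable[[str, str], float] = overlap_score) -> List[str]:
    if not config['rerank']:
        return list(docs)  # rerank disabled: keep the retrieval order as-is
    scored = [(scorer(query, doc), doc) for doc in docs]
    kept = [pair for pair in scored if pair[0] >= config['threshold']]  # assumed: >= threshold
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable, so ties keep retrieval order
    return [doc for _, doc in kept]


docs = ['milvus stores vectors', 'gradio demo', 'vector search in milvus']
query = 'milvus vector search'
unchanged = maybe_rerank(query, docs, RERANK_CONFIG)
filtered = maybe_rerank(query, docs, {'rerank': True, 'threshold': 0.5})
reordered = maybe_rerank(query, docs, {'rerank': True, 'threshold': 0.0})
```

With reranking off, retrieval order passes through untouched, which avoids loading a rerank model at all; a threshold of `0.0` reorders without dropping anything.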
File renamed without changes.
4 changes: 2 additions & 2 deletions gradio_demo.py
@@ -17,9 +17,9 @@
'The service should start with either "--langchain" or "--towhee".'

if USE_LANGCHAIN:
from src_langchain.operations import chat, insert, check, drop, get_history, clear_history, count # pylint: disable=C0413
from src.langchain.operations import chat, insert, check, drop, get_history, clear_history, count # pylint: disable=C0413
if USE_TOWHEE:
from src_towhee.operations import chat, insert, check, drop, get_history, clear_history, count # pylint: disable=C0413
from src.towhee.operations import chat, insert, check, drop, get_history, clear_history, count # pylint: disable=C0413


def create_session_id():
4 changes: 2 additions & 2 deletions main.py
@@ -40,10 +40,10 @@
'The service should start with either "--langchain" or "--towhee".'

if USE_LANGCHAIN:
from src_langchain.operations import chat, insert, drop, check, get_history, clear_history, count # pylint: disable=C0413
from src.langchain.operations import chat, insert, drop, check, get_history, clear_history, count # pylint: disable=C0413
chat = partial(chat, enable_agent=ENABLE_AGENT)
if USE_TOWHEE:
from src_towhee.operations import chat, insert, drop, check, get_history, clear_history, count # pylint: disable=C0413
from src.towhee.operations import chat, insert, drop, check, get_history, clear_history, count # pylint: disable=C0413
if ENABLE_MONITER:
from moniter import enable_moniter # pylint: disable=C0413
from prometheus_client import generate_latest, REGISTRY # pylint: disable=C0413
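After this PR, entry points such as `main.py` and `gradio_demo.py` import `src.langchain.operations` and `src.towhee.operations`, so `towhee` and `langchain` are now subpackages of `src` that share their names with the third-party libraries. This works because Python 3 imports are absolute: inside `src/langchain/...`, `import langchain` still resolves to the installed library as long as the repository root, not `src/` itself, is on `sys.path`. The sketch demonstrates the rule with a throwaway tree in a temporary directory, using regular packages (`__init__.py` files) and a hypothetical subpackage `src/json` whose name collides with the stdlib `json` module, standing in for the `towhee`/`langchain` case.

```python
import importlib
import json
import os
import sys
import tempfile

# Throwaway tree mimicking the new layout: <root>/src/json/ops.py. The subpackage name
# `json` deliberately collides with a top-level module, like src/towhee vs. the towhee library.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'src', 'json'))
for init_file in (('src', '__init__.py'), ('src', 'json', '__init__.py')):
    with open(os.path.join(root, *init_file), 'w', encoding='utf-8'):
        pass  # an empty __init__.py marks a regular package
with open(os.path.join(root, 'src', 'json', 'ops.py'), 'w', encoding='utf-8') as f:
    f.write('import json\n\n\ndef dumps(obj):\n    return json.dumps(obj)\n')

# Running from the repository root puts <root>, not <root>/src, on sys.path.
sys.path.insert(0, root)
importlib.invalidate_caches()
ops = importlib.import_module('src.json.ops')

# Absolute import: `import json` inside src/json/ops.py is the stdlib module,
# not the sibling package src.json.
print(ops.json is json)         # True
print(ops.dumps({'ok': True}))  # {"ok": true}
```

If `src/` itself were placed at the front of `sys.path`, a local `src/json` (or `src/towhee`) directory could be found first and shadow the real module, which is one reason to launch tools from the repository root.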
2 changes: 1 addition & 1 deletion offline_tools/insert.py
@@ -7,7 +7,7 @@

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

from src_langchain.embedding import TextEncoder # pylint: disable=C0413
from src.langchain.embedding import TextEncoder # pylint: disable=C0413
from offline_tools.generator_questions import get_output_csv # pylint: disable=C0413
from offline_tools.utils.stackoverflow_json2csv import stackoverflow_json2csv # pylint: disable=C0413
from offline_tools.utils.load_npy import langchain_load # pylint: disable=C0413
2 changes: 1 addition & 1 deletion offline_tools/utils/load_npy.py
@@ -5,7 +5,7 @@

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

from src_langchain.store import DocStore # pylint: disable=C0413
from src.langchain.store import DocStore # pylint: disable=C0413


class DBReader(object):
3 changes: 2 additions & 1 deletion requirements.txt
@@ -4,11 +4,12 @@ pexpect
pdf2image
SQLAlchemy>=2.0.15
psycopg2-binary
openai
openai==0.28
gradio>=3.30.0
fastapi
uvicorn
towhee>=1.1.0
pydantic<2.0
pymilvus
elasticsearch>=8.0.0
prometheus-client
2 files renamed without changes.
Original file line number Diff line number Diff line change
@@ -47,8 +47,7 @@ agent = ChatAgent.from_llm_and_tools(
# Define a chain
agent_chain = AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
verbose=False
tools=tools
)

# Run a test
17 files renamed without changes.
Original file line number Diff line number Diff line change
@@ -43,7 +43,6 @@ def chat(session_id, project, question, enable_agent=False):
agent=agent,
tools=tools,
memory=memory_db.memory,
verbose=False
)
try:
final_answer = agent_chain.run(input=question)
11 files renamed without changes.
2 changes: 1 addition & 1 deletion src_towhee/memory/sql.py → src/towhee/memory/sql.py
@@ -8,7 +8,7 @@

sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))

from src_towhee.base import BaseMemory # pylint: disable=C0413
from src.towhee.base import BaseMemory # pylint: disable=C0413
from config import MEMORYDB_CONFIG # pylint: disable=C0413


4 changes: 2 additions & 2 deletions src_towhee/operations.py → src/towhee/operations.py
@@ -4,8 +4,8 @@

sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

from src_towhee.pipelines import TowheePipelines # pylint: disable=C0413
from src_towhee.memory import MemoryStore # pylint: disable=C0413
from src.towhee.pipelines import TowheePipelines # pylint: disable=C0413
from src.towhee.memory import MemoryStore # pylint: disable=C0413


logger = logging.getLogger(__name__)
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -14,9 +14,9 @@
RERANK_CONFIG, QUERY_MODE, INSERT_MODE,
DATAPARSER_CONFIG
)
from src_towhee.base import BasePipelines # pylint: disable=C0413
from src_towhee.pipelines.search import build_search_pipeline # pylint: disable=C0413
from src_towhee.pipelines.insert import build_insert_pipeline # pylint: disable=C0413
from src.towhee.base import BasePipelines # pylint: disable=C0413
from src.towhee.pipelines.search import build_search_pipeline # pylint: disable=C0413
from src.towhee.pipelines.insert import build_insert_pipeline # pylint: disable=C0413


class TowheePipelines(BasePipelines):
8 files renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,13 +1,8 @@
import os
import sys
import unittest

from langchain.agents import AgentExecutor, Tool
from langchain.llms.fake import FakeListLLM

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_langchain.agent import ChatAgent
from src.langchain.agent import ChatAgent


class TestChatAgent(unittest.TestCase):
@@ -25,8 +20,7 @@ class TestChatAgent(unittest.TestCase):
def test_run_chat_agent(self):
agent_executor = AgentExecutor.from_agent_and_tools(
agent=self.chat_agent,
tools=self.tools,
verbose=False
tools=self.tools
)
final_answer = agent_executor.run(input='whats 2 + 2', chat_history=[])
assert final_answer == self.responses[1]
Original file line number Diff line number Diff line change
@@ -1,13 +1,8 @@
import os
import sys
import unittest

from langchain.schema import AgentAction, AgentFinish

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_langchain.agent.prompt import FORMAT_INSTRUCTIONS
from src_langchain.agent.output_parser import OutputParser
from src.langchain.agent.prompt import FORMAT_INSTRUCTIONS
from src.langchain.agent.output_parser import OutputParser


class TestOutputParser(unittest.TestCase):
File renamed without changes.
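The output-parser test above exercises `OutputParser` against `FORMAT_INSTRUCTIONS`, neither of which is shown in this diff. As a hedged illustration only, here is a parser for the common LangChain-style `Action:` / `Action Input:` / `Final Answer:` text format. The format itself, the `ToolCall` and `Finish` dataclasses (stand-ins for `AgentAction` and `AgentFinish`), and `parse` are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class ToolCall:  # hypothetical stand-in for langchain.schema.AgentAction
    tool: str
    tool_input: str


@dataclass
class Finish:  # hypothetical stand-in for langchain.schema.AgentFinish
    output: str


FINAL_ANSWER = 'Final Answer:'
ACTION_RE = re.compile(r'Action:\s*(.*?)\s*\nAction Input:\s*(.*)', re.DOTALL)


def parse(text: str) -> Union[ToolCall, Finish]:
    """Map raw LLM text to either a tool call or a final answer."""
    if FINAL_ANSWER in text:
        return Finish(text.split(FINAL_ANSWER, 1)[1].strip())
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError(f'Could not parse LLM output: {text!r}')
    # Strip surrounding quotes that models often put around the tool input.
    return ToolCall(match.group(1).strip(), match.group(2).strip().strip('"'))


call = parse('Thought: I should use the calculator.\nAction: Calculator\nAction Input: "2 + 2"')
done = parse('Thought: I know the answer.\nFinal Answer: 4')
```

Checking for the final answer first means a response that contains both markers ends the loop rather than triggering another tool call; that precedence is a design choice in this sketch, not necessarily the project's.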
Original file line number Diff line number Diff line change
@@ -1,16 +1,13 @@
import io
import os
import sys
import tempfile
import unittest
from unittest.mock import patch

from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_langchain.data_loader import DataParser
from src.langchain.data_loader import DataParser


class TestDataParser(unittest.TestCase):
Original file line number Diff line number Diff line change
@@ -1,10 +1,6 @@
import os
import sys
import unittest

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_langchain.data_loader.data_splitter import MarkDownSplitter
from src.langchain.data_loader.data_splitter import MarkDownSplitter


class TestMarkDownSplitter(unittest.TestCase):
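The `MarkDownSplitter` under test is not shown in this diff either. A minimal sketch of one way a markdown-aware splitter can work, assuming headings are the preferred split points: split just before each `#`-style heading, then cap every section at `chunk_size` characters. `split_markdown` is a hypothetical name, and the default of 300 merely mirrors the `'chunk_size': 300` entry in `config.py`.

```python
import re
from typing import List


def split_markdown(text: str, chunk_size: int = 300) -> List[str]:
    """Hypothetical sketch: split before markdown headings, then cap each piece at chunk_size chars."""
    # Zero-width split at the start of any line that begins with 1-6 '#' and a space.
    sections = re.split(r'(?m)^(?=#{1,6} )', text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue  # re.split yields '' when the text starts with a heading
        for start in range(0, len(section), chunk_size):
            chunks.append(section[start:start + chunk_size])
    return chunks


doc = '# Intro\nAkcio answers questions.\n## Install\n' + 'x' * 25
chunks = split_markdown(doc, chunk_size=20)
```

Cutting at fixed character offsets can split words; splitters such as LangChain's `RecursiveCharacterTextSplitter` (imported in the data-parser test above) instead fall back through a list of separators before cutting mid-text.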
Original file line number Diff line number Diff line change
@@ -1,12 +1,8 @@
import os
import sys
import unittest
from unittest.mock import patch

import numpy as np

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../../..'))
from src_langchain.embedding.langchain_huggingface import TextEncoder
from src.langchain.embedding.langchain_huggingface import TextEncoder


class TestLangchainHuggingface(unittest.TestCase):
Original file line number Diff line number Diff line change
@@ -1,12 +1,8 @@
import os
import sys
import unittest
from unittest.mock import patch

import numpy as np

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../../..'))
from src_langchain.embedding.openai_embedding import TextEncoder
from src.langchain.embedding.openai_embedding import TextEncoder


class TestOpenAIEmbedding(unittest.TestCase):
Original file line number Diff line number Diff line change
@@ -1,11 +1,8 @@
import os
import sys
import unittest
from unittest.mock import patch

from langchain.schema import HumanMessage

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../../..'))

MOCK_ANSWER = 'mock answer'

@@ -18,7 +15,7 @@ def __call__(self, prompt):

with patch('transformers.pipeline') as mock_pipelines:
mock_pipelines.return_value = MockGenerateText()
from src_langchain.llm.dolly_chat import ChatLLM
from src.langchain.llm.dolly_chat import ChatLLM

chat_llm = ChatLLM(model_name='mock', device='cpu', )
messages = [HumanMessage(content='hello')]
Original file line number Diff line number Diff line change
@@ -1,11 +1,7 @@
import os
import sys
import unittest
from unittest.mock import patch
from langchain.schema import HumanMessage, AIMessage

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))


class TestERNIE(unittest.TestCase):
def test_generate(self):
@@ -27,7 +23,7 @@ def test_generate(self):
)
mock_post.return_value = mock_res

from src_langchain.llm.ernie import ChatLLM
from src.langchain.llm.ernie import ChatLLM

EB_API_TYPE = 'mock_type'
EB_ACCESS_TOKEN = 'mock_token'
Original file line number Diff line number Diff line change
@@ -1,13 +1,9 @@
import os
import sys
import unittest

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../../..'))


class TestOpenAIChat(unittest.TestCase):
def test_init(self):
from src_langchain.llm.openai_chat import ChatLLM
from src.langchain.llm.openai_chat import ChatLLM
chat_llm = ChatLLM(openai_api_key='mock-key')
self.assertEqual(chat_llm.__class__.__name__, 'ChatLLM')

Empty file.
1 change: 1 addition & 0 deletions tests/unit_tests/src/towhee/akcio_ut.txt
@@ -0,0 +1 @@
This is test content.
Empty file.
Original file line number Diff line number Diff line change
@@ -1,11 +1,8 @@
import os
import sys
import unittest

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from src_towhee.base import BaseMemory # pylint: disable=C0413
from src_towhee.memory.sql import MemoryStore # pylint: disable=C0413
from src.towhee.base import BaseMemory
from src.towhee.memory.sql import MemoryStore


class TestSql(unittest.TestCase):
Empty file.
Original file line number Diff line number Diff line change
@@ -1,19 +1,15 @@
import unittest
from unittest.mock import patch

import json
import sys
import os

from milvus import MilvusServer

sys.path.append(os.path.join(os.path.dirname(__file__), '../../../..'))

from config import ( # pylint: disable=C0413
CHAT_CONFIG, TEXTENCODER_CONFIG,
VECTORDB_CONFIG, RERANK_CONFIG,
)
from src_towhee.pipelines import TowheePipelines # pylint: disable=C0413
from src.towhee.pipelines import TowheePipelines # pylint: disable=C0413

milvus_server = MilvusServer()

Original file line number Diff line number Diff line change
@@ -3,11 +3,6 @@

from towhee.runtime.data_queue import DataQueue, ColumnType

import sys
import os

sys.path.append(os.path.join(os.path.dirname(__file__), '../../..'))


class MockStore:
def __init__(self, *args, **kwargs):
@@ -69,20 +64,20 @@ class TestOperations(unittest.TestCase):

def test_chat(self):

with patch('src_towhee.pipelines.TowheePipelines') as mock_pipelines, \
patch('src_towhee.memory.MemoryStore') as mock_memory:
with patch('src.towhee.pipelines.TowheePipelines') as mock_pipelines, \
patch('src.towhee.memory.MemoryStore') as mock_memory:
mock_pipelines.return_value = MockPipeline()
mock_memory.return_value = MockStore()

from src_towhee.pipelines import TowheePipelines
from src_towhee.memory import MemoryStore
from src.towhee.pipelines import TowheePipelines
from src.towhee.memory import MemoryStore

with patch.object(TowheePipelines, 'search_pipeline', mock_pipelines.search_pipeline), \
patch.object(MemoryStore, 'add_history', mock_memory.add_history), \
patch.object(MemoryStore, 'get_history', mock_memory.get_history), \
patch.object(MemoryStore, 'drop', mock_memory.drop):

from src_towhee.operations import chat, get_history, clear_history
from src.towhee.operations import chat, get_history, clear_history

question, answer = chat(
self.session_id, self.project, self.question)
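Every `patch('src_towhee.pipelines.TowheePipelines')`-style target in these tests had to be rewritten to the `src.towhee...` path, because `unittest.mock.patch` takes a dotted import path and resolves it only when the patch starts. A small sketch of both cases: `src_towhee` is simply the pre-rename name and is assumed not to be importable, while `os.path.exists` serves as an always-available valid target.

```python
import os.path
from unittest.mock import patch

# Stale target left over from the rename: the module no longer exists, but patch()
# resolves the dotted path lazily, so the error surfaces when the patch starts.
stale = patch('src_towhee.pipelines.TowheePipelines')
try:
    with stale:
        pass
    stale_failed = False
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    stale_failed = True

# A valid target is replaced only inside the with-block and restored on exit.
with patch('os.path.exists', return_value=True) as fake_exists:
    inside = os.path.exists('/no/such/path/for/this/sketch')
outside = os.path.exists('/no/such/path/for/this/sketch')

print(stale_failed, inside, outside)  # True True False
```

Because resolution is lazy, a stale target only fails in the test that enters the patch, so a rename like this one needs every target string updated, not just the imports.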
@@ -97,13 +92,13 @@ def test_chat(self):

def test_insert(self):

with patch('src_towhee.pipelines.TowheePipelines') as mock_pipelines, \
patch('src_towhee.memory.MemoryStore') as mock_memory:
with patch('src.towhee.pipelines.TowheePipelines') as mock_pipelines, \
patch('src.towhee.memory.MemoryStore') as mock_memory:
mock_pipelines.return_value = MockPipeline()
mock_memory.return_value = MockStore()

from src_towhee.pipelines import TowheePipelines
from src_towhee.memory import MemoryStore
from src.towhee.pipelines import TowheePipelines
from src.towhee.memory import MemoryStore

with patch.object(TowheePipelines, 'insert_pipeline', mock_pipelines.insert_pipeline), \
patch.object(TowheePipelines, 'count_entities', mock_pipelines.count_entities), \
@@ -112,7 +107,7 @@ def test_insert(self):
patch.object(MemoryStore, 'check', mock_memory.check), \
patch.object(MemoryStore, 'drop', mock_memory.drop):

from src_towhee.operations import insert, check, drop
from src.towhee.operations import insert, check, drop

chunk_count, token_count = insert(self.test_src, self.project)
assert chunk_count == self.expect_len, token_count == self.expect_token_count
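One thing worth noting in the unchanged context line above: in `assert chunk_count == self.expect_len, token_count == self.expect_token_count`, the expression after the comma is the assertion's failure message, so the token count is never actually checked. A sketch of the difference, using a hypothetical helper `check_counts` and illustrative values (inside a `unittest.TestCase`, two `self.assertEqual` calls would do the same job):

```python
def check_counts(chunk_count: int, token_count: int, expect_len: int, expect_tokens: int) -> None:
    """Hypothetical helper: checks both values, one assert per condition."""
    assert chunk_count == expect_len, f'chunk_count={chunk_count}, expected {expect_len}'
    assert token_count == expect_tokens, f'token_count={token_count}, expected {expect_tokens}'


chunk_count, token_count = 3, 100  # illustrative values: the token count is "wrong"
expect_len, expect_token_count = 3, 42

# Comma form: `token_count == expect_token_count` is only the message, so this passes.
assert chunk_count == expect_len, token_count == expect_token_count

try:
    check_counts(chunk_count, token_count, expect_len, expect_token_count)
    caught = False
except AssertionError as err:
    caught = True
    print(err)  # token_count=100, expected 42
```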