Add example using a configurable LangChain document loader (#543)
cbornet authored Oct 8, 2023
1 parent 0534e11 commit 08b0012
Showing 13 changed files with 1,394 additions and 1 deletion.
149 changes: 149 additions & 0 deletions examples/applications/langchain-document-loader/.langstreamignore
@@ -0,0 +1,149 @@
# .langstreamignore file inspired by https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# These folders hold the libs built for the target
# and we need them in the package
!python/lib/
!java/lib/

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
Pipfile.lock

# poetry
poetry.lock

# pdm
pdm.lock
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
.idea/
21 changes: 21 additions & 0 deletions examples/applications/langchain-document-loader/README.md
@@ -0,0 +1,21 @@
# LangChain Source

This sample application shows how to use LangChain in a Python agent.
The application is a Source that uses a configurable LangChain document loader to load documents, then a configurable LangChain text splitter to chunk them, and sends the chunks to a Kafka topic.
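
As a rough illustration of the load-then-chunk flow described above, here is a minimal, dependency-free sketch. It does not use LangChain or LangStream: `split_text` and `emit_chunks` are hypothetical names, the fixed-size splitter with overlap is a simplification rather than the algorithm of any LangChain text splitter, and `send` merely stands in for writing a chunk to the output topic.

```python
from typing import Callable, Iterable, List


def split_text(text: str, chunk_size: int = 20, chunk_overlap: int = 5) -> List[str]:
    """Cut text into fixed-size chunks; consecutive chunks share chunk_overlap characters.

    Hypothetical stand-in for a text splitter, not LangChain's implementation.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def emit_chunks(documents: Iterable[str], send: Callable[[str], None]) -> int:
    """Split every document and pass each chunk to `send`; return the chunk count.

    `send` is an assumption standing in for publishing to the output topic.
    """
    count = 0
    for doc in documents:
        for chunk in split_text(doc):
            send(chunk)
            count += 1
    return count
```

In the real application the loader and splitter are LangChain classes selected in the pipeline configuration; the sketch only shows the shape of the data flow.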

## Configure the pipeline

Update the pipeline file to set the loader and splitter class names and their parameters.
Set `load-interval-seconds` to the number of seconds between document loads.
The default value, -1, loads the documents once.
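
The interval semantics can be sketched as a small decision function. This is an illustration under stated assumptions, not the agent's actual code: `should_load` and `read_interval` are hypothetical names, the first load is assumed to happen immediately, and any negative interval is treated like -1 (the README only specifies -1).

```python
from typing import Mapping, Optional


def read_interval(configuration: Mapping[str, object]) -> float:
    """Read load-interval-seconds from the agent configuration, defaulting to -1."""
    return float(configuration.get("load-interval-seconds", -1))


def should_load(last_load: Optional[float], now: float, interval: float = -1) -> bool:
    """Decide whether documents should be loaded at time `now` (in seconds).

    last_load is None until the first load has happened.
    """
    if last_load is None:
        # Assumption: the first load always runs right away.
        return True
    if interval < 0:
        # -1 means "load once", so never reload after the first load.
        return False
    return now - last_load >= interval
```

With `load-interval-seconds: 3600`, a load at time 0 would be followed by the next one no earlier than 3600 seconds later.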

## Deploy the LangStream application

```
./bin/langstream apps deploy test -app examples/applications/langchain-document-loader -i examples/instances/kafka-kubernetes.yaml
```

## Consume from the Gateway Consumer
```
./bin/langstream gateway consume test consume-output
```
21 changes: 21 additions & 0 deletions examples/applications/langchain-document-loader/gateways.yaml
@@ -0,0 +1,21 @@
#
#
# Copyright DataStax, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

gateways:
  - id: consume-output
    type: consume
    topic: output-topic
31 changes: 31 additions & 0 deletions examples/applications/langchain-document-loader/pipeline.yaml
@@ -0,0 +1,31 @@
#
# Copyright DataStax, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

name: "LangChain document loader"
topics:
  - name: "output-topic"
    creation-mode: create-if-not-exists
pipeline:
  - name: "Load documents and chunk them with LangChain"
    type: "python-source"
    output: "output-topic"
    configuration:
      className: langchain_document_loader.LangChainDocumentLoaderSource
      load-interval-seconds: 3600
      loader-class: WebBaseLoader
      loader-args:
        web-path: ["https://langstream.ai/"]
      splitter-class: RecursiveCharacterTextSplitter
@@ -0,0 +1,6 @@
[dev-packages]
pyYaml = "*"
langstream-ai = "*"
langchain = "*"
pytest = "*"
tox = "*"