Merge branch 'main' into main
zainhoda authored Jan 17, 2024
2 parents afa96b2 + 0684b5a commit 19994e0
Showing 126 changed files with 1,321 additions and 395 deletions.
33 changes: 0 additions & 33 deletions .github/workflows/ci.yml

This file was deleted.

14 changes: 13 additions & 1 deletion CONTRIBUTING.md
@@ -1,4 +1,6 @@
## Contributing
# Contributing

## Setup
```bash
git clone https://github.com/vanna-ai/vanna.git
cd vanna/
@@ -18,3 +20,13 @@ tox list
# Run tests
tox -e py310
```

## Do this before you submit a PR:

Find the most relevant sample notebook and then replace the install command with:

```bash
%pip install 'git+https://github.com/vanna-ai/vanna@your-branch#egg=vanna[chromadb,snowflake,openai]'
```

Run the necessary cells and verify that it works as expected in a real-world scenario.
128 changes: 81 additions & 47 deletions README.md
@@ -1,43 +1,65 @@
![](https://img.vanna.ai/vanna-github.svg)


| GitHub | PyPI | Documentation |
| ------ | ---- | ------------- |
| [![GitHub](https://img.shields.io/badge/GitHub-vanna-blue?logo=github)](https://github.com/vanna-ai/vanna) | [![PyPI](https://img.shields.io/pypi/v/vanna?logo=pypi)](https://pypi.org/project/vanna/) | [![Documentation](https://img.shields.io/badge/Documentation-vanna-blue?logo=read-the-docs)](https://vanna.ai/docs/) |

# Vanna.AI - Personalized AI SQL Agent
# Vanna
Vanna is an MIT-licensed open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality.

https://github.com/vanna-ai/vanna/assets/7146154/1901f47a-515d-4982-af50-f12761a3b2ce

![vanna-quadrants](https://github.com/vanna-ai/vanna/assets/7146154/1c7c88ba-c144-4ecf-a028-cf5ba7344ca2)

## How Vanna works
Vanna works in two easy steps - train a model on your data, and then ask questions.
Vanna works in two easy steps - train a RAG "model" on your data, and then ask questions, which return SQL queries that can be set up to run automatically on your database.

1. **Train a model on your data**.
1. **Train a RAG "model" on your data**.
2. **Ask questions**.

When you ask a question, we utilize a custom model for your dataset to generate SQL, as seen below. Your model's performance and accuracy depend on the quality and quantity of the training data you use to train your model.
<img width="1725" alt="how-vanna-works" src="https://github.com/vanna-ai/vanna/assets/7146154/5e2e2179-ed7a-4df4-92a2-1c017923a675">
![](img/vanna-readme-diagram.png)

If you don't know what RAG is, don't worry -- you don't need to know how this works under the hood to use it. You just need to know that you "train" a model, which stores some metadata, and then use it to "ask" questions.

See the [base class](src/vanna/base/base.py) for more details on how this works under the hood.
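
To make the "train" then "ask" idea concrete, here is a minimal sketch of the retrieval half of RAG. It is not Vanna's implementation: `ToyRagStore` and its methods are hypothetical names, and simple word overlap stands in for the embedding similarity that a real vector database would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRagStore:
    """Hypothetical in-memory stand-in for a vector store (not Vanna's API)."""

    items: list[str] = field(default_factory=list)

    def train(self, text: str) -> None:
        # "Training" only stores the text; no model weights are changed.
        self.items.append(text)

    def retrieve(self, question: str, k: int = 2) -> list[str]:
        # Rank stored items by the number of lowercase words shared with the question.
        # Assumption: word overlap replaces real embedding similarity.
        q_words = set(question.lower().split())
        ranked = sorted(
            self.items,
            key=lambda item: len(q_words & set(item.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def build_prompt(self, question: str) -> str:
        # The retrieved snippets become context for whichever LLM is used.
        context = "\n".join(self.retrieve(question))
        return f"Context:\n{context}\n\nQuestion: {question}\nSQL:"


store = ToyRagStore()
store.train("CREATE TABLE customers (id INT, name TEXT)")
store.train("CREATE TABLE orders (id INT, customer_id INT, total REAL)")
store.train("Sales means the sum of orders total")
prompt = store.build_prompt("total sales by customers")
```

The prompt, not the stored data itself, is what would be sent to an LLM, which is why swapping the LLM does not require retraining anything.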

## User Interfaces
These are some of the user interfaces that we've built using Vanna. You can use these as-is or as a starting point for your own custom interface.

- [Jupyter Notebook](https://github.com/vanna-ai/vanna/blob/main/notebooks/getting-started.ipynb)
- [vanna-ai/vanna-streamlit](https://github.com/vanna-ai/vanna-streamlit)
- [vanna-ai/vanna-flask](https://github.com/vanna-ai/vanna-flask)
- [vanna-ai/vanna-slack](https://github.com/vanna-ai/vanna-slack)


## Getting started
You can start by [automatically training Vanna (currently works for Snowflake)](https://vanna.ai/docs/vn-train.html) or add manual training data.
See the [documentation](https://vanna.ai/docs/) for specifics on your desired database, LLM, etc.

### Install Vanna
```
If you want to get a feel for how it works after training, you can try this [Colab notebook](https://colab.research.google.com/github/vanna-ai/vanna/blob/main/notebooks/getting-started.ipynb).


### Install
```bash
pip install vanna
```

Depending on the database you're using, you can also install the associated database drivers
```
pip install 'vanna[snowflake]'
```
A number of optional packages can be installed; see the [documentation](https://vanna.ai/docs/) for more details.

### Import
See the [documentation](https://vanna.ai/docs/) if you're customizing the LLM or vector database.

### Import Vanna
```python
import vanna as vn
```

### Train with DDL Statements
If you prefer to train manually, you do not need to connect to a database. You can use the train function with other parameters like ddl

## Training
You may or may not need to run these `vn.train` commands depending on your use case. See the [documentation](https://vanna.ai/docs/) for more details.

These statements are shown to give you a feel for how it works.

### Train with DDL Statements
DDL statements contain information about the table names, columns, data types, and relationships in your database.

```python
vn.train(ddl="""
@@ -53,14 +75,14 @@ vn.train(ddl="""
Sometimes you may want to add documentation about your business terminology or definitions.

```python
vn.train(documentation="Our business defines OTIF score as the percentage of orders that are delivered on time and in full")
vn.train(documentation="Our business defines XYZ as ...")
```

### Train with SQL
You can also add SQL queries to your training data. This is useful if you have some queries already lying around. You can just copy and paste those from your editor to begin generating new SQL.

```python
vn.train(sql="SELECT * FROM my-table WHERE name = 'John Doe'")
vn.train(sql="SELECT name, age FROM my-table WHERE name = 'John Doe'")
```


@@ -69,16 +91,18 @@ vn.train(sql="SELECT * FROM my-table WHERE name = 'John Doe'")
vn.ask("What are the top 10 customers by sales?")
```

SELECT c.c_name as customer_name,
sum(l.l_extendedprice * (1 - l.l_discount)) as total_sales
FROM snowflake_sample_data.tpch_sf1.lineitem l join snowflake_sample_data.tpch_sf1.orders o
ON l.l_orderkey = o.o_orderkey join snowflake_sample_data.tpch_sf1.customer c
ON o.o_custkey = c.c_custkey
GROUP BY customer_name
ORDER BY total_sales desc limit 10;


You'll get SQL
```sql
SELECT c.c_name as customer_name,
sum(l.l_extendedprice * (1 - l.l_discount)) as total_sales
FROM snowflake_sample_data.tpch_sf1.lineitem l join snowflake_sample_data.tpch_sf1.orders o
ON l.l_orderkey = o.o_orderkey join snowflake_sample_data.tpch_sf1.customer c
ON o.o_custkey = c.c_custkey
GROUP BY customer_name
ORDER BY total_sales desc limit 10;
```

If you've connected to a database, you'll get the table:
<div>
<table border="1" class="dataframe">
<thead>
@@ -143,33 +167,43 @@ vn.ask("What are the top 10 customers by sales?")
</table>
</div>

You'll also get an automated Plotly chart:
![](img/top-10-customers.png)
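
The query-to-table step can be tried without a Snowflake account. The sketch below is an illustration only: it assumes a tiny in-memory SQLite database with made-up rows, reuses the column names from the generated SQL above, and runs the query with Python's standard `sqlite3` module rather than through Vanna.

```python
import sqlite3

# Assumption: a toy schema with invented rows, mirroring the TPC-H column names above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL, l_discount REAL);
INSERT INTO customer VALUES (1, 'Customer#1'), (2, 'Customer#2');
INSERT INTO orders VALUES (10, 1), (11, 2);
INSERT INTO lineitem VALUES (10, 100.0, 0.5), (11, 200.0, 0.5), (11, 50.0, 0.0);
""")

# Same shape as the generated query: join, aggregate, sort, and keep the top 10.
top_customers = conn.execute("""
SELECT c.c_name AS customer_name,
       SUM(l.l_extendedprice * (1 - l.l_discount)) AS total_sales
FROM lineitem l
JOIN orders o ON l.l_orderkey = o.o_orderkey
JOIN customer c ON o.o_custkey = c.c_custkey
GROUP BY customer_name
ORDER BY total_sales DESC
LIMIT 10
""").fetchall()

for name, total in top_customers:
    print(name, total)
```

Each row is a `(customer_name, total_sales)` tuple, which is the same data a results table or chart would be built from.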

## RAG vs. Fine-Tuning
RAG
- Portable across LLMs
- Easy to remove training data if any of it becomes obsolete
- Much cheaper to run than fine-tuning
- More future-proof -- if a better LLM comes out, you can just swap it out

Fine-Tuning
- Good if you need to minimize tokens in the prompt
- Slow to get started
- Expensive to train and run (generally)

## Why Vanna?

1. **High accuracy on complex datasets.**
- Vanna’s capabilities are tied to the training data you give it
- More training data means better accuracy for large and complex datasets
2. **Secure and private.**
- Your database contents are never sent to Vanna’s servers
- We only see the bare minimum - schemas & queries.
3. **Isolated, custom model.**
- You train a custom model specific to your database and your schema.
- Nobody else can use your model or view your model’s training data unless you choose to add members to your model or make it public
- We use a combination of third-party foundational models (OpenAI, Google) and our own LLM.
4. **Self learning.**
- As you use Vanna more, your model continuously improves as we augment your training data
5. **Supports many databases.**
- We have out-of-the-box support for Snowflake, BigQuery, Postgres
- You can easily make a connector for any [database](https://docs.vanna.ai/databases/)
6. **Pretrained models.**
- If you’re a data provider you can publish your models for anyone to use
- As part of our roadmap, we are in the process of pre-training models for common datasets (Google Ads, Facebook ads, etc)
7. **Choose your front end.**
- Start in a Jupyter Notebook.
- Expose to business users via Slackbot, web app, Streamlit app, or Excel plugin.
- Even integrate in your web app for customers.
- Your database contents are never sent to the LLM or the vector database
- SQL execution happens in your local environment
3. **Self learning.**
- If using via Jupyter, you can choose to "auto-train" it on the queries that were successfully executed
- If using via other interfaces, you can have the interface prompt the user to provide feedback on the results
- Correct question-to-SQL pairs are stored for future reference and make future results more accurate
4. **Supports any SQL database.**
- The package allows you to connect to any SQL database that you can otherwise connect to with Python
5. **Choose your front end.**
- Most people start in a Jupyter Notebook.
- Expose to your end users via Slackbot, web app, Streamlit app, or a custom front end.
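
The "self learning" item above can be sketched as a small feedback loop: run the generated SQL and keep the question-to-SQL pair only if the query executes. This is a sketch under stated assumptions, not Vanna's code. `run_and_maybe_store` and `training_pairs` are hypothetical names, and SQLite stands in for whatever database you use.

```python
import sqlite3


def run_and_maybe_store(conn, question, sql, training_pairs):
    """Execute sql; on success, record (question, sql) for future retrieval."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        # Failed queries are not stored, so they cannot pollute later prompts.
        return None
    training_pairs.append((question, sql))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER)")
conn.execute("INSERT INTO customers VALUES ('John Doe', 42)")

training_pairs = []
good = run_and_maybe_store(
    conn, "Who are our customers?", "SELECT name FROM customers", training_pairs
)
bad = run_and_maybe_store(
    conn, "Who are our suppliers?", "SELECT name FROM suppliers", training_pairs
)
```

Successful execution is only a weak signal of correctness, which is why the list above also mentions prompting users for feedback in other interfaces.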

## Extending Vanna
Vanna is designed to connect to any database, LLM, and vector database. There's a [VannaBase](src/vanna/base/base.py) abstract base class that defines some basic functionality. The package provides implementations for use with OpenAI and ChromaDB. You can easily extend Vanna to use your own LLM or vector database. See the [documentation](https://vanna.ai/docs/) for more details.
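
The extension pattern described above can be illustrated with a generic abstract base class. Everything in this sketch is hypothetical: `SqlAssistantBase`, `retrieve_context`, `complete`, and `generate_sql` are illustrative names, not the methods that `VannaBase` defines, so check `src/vanna/base/base.py` for the real interface.

```python
from abc import ABC, abstractmethod


class SqlAssistantBase(ABC):
    """Hypothetical base class; names are illustrative, not VannaBase's."""

    @abstractmethod
    def retrieve_context(self, question: str) -> list:
        """Return stored snippets (DDL, docs, SQL) relevant to the question."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send the prompt to an LLM and return its text response."""

    def generate_sql(self, question: str) -> str:
        # Shared logic lives in the base class; subclasses supply the two pieces.
        context = "\n".join(self.retrieve_context(question))
        return self.complete(f"{context}\n\nQuestion: {question}")


class CannedAssistant(SqlAssistantBase):
    """Toy subclass: fixed context and a fake LLM that returns a constant query."""

    def retrieve_context(self, question: str) -> list:
        return ["CREATE TABLE customers (id INT, name TEXT)"]

    def complete(self, prompt: str) -> str:
        return "SELECT name FROM customers"


assistant = CannedAssistant()
sql = assistant.generate_sql("List customer names")
```

Splitting retrieval and completion into separate abstract methods is what lets a vector database and an LLM be swapped independently of each other.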

## More resources
- [Full Documentation](https://vanna.ai/docs/)
- [Website](https://vanna.ai)
- [Slack channel for support](https://join.slack.com/t/vanna-ai/shared_invite/zt-1unu0ipog-iE33QCoimQiBDxf2o7h97w)
- [LinkedIn](https://www.linkedin.com/company/vanna-ai/)
- [Discord group for support](https://discord.gg/qUZYKHremx)
2 changes: 1 addition & 1 deletion docs/sidebar.yaml
@@ -63,7 +63,7 @@
</svg>
- title: Running Locally
link: local.html
link: snowflake-openai-standard-chromadb.html
svg_text: |-
<svg class="w-6 h-6 text-gray-800 dark:text-white" aria-hidden="true" fill="none" stroke="currentColor" stroke-width="1.5" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg" aria-hidden="true">
<path stroke-linecap="round" stroke-linejoin="round" d="M9 17.25v1.007a3 3 0 01-.879 2.122L7.5 21h9l-.621-.621A3 3 0 0115 18.257V17.25m6-12V15a2.25 2.25 0 01-2.25 2.25H5.25A2.25 2.25 0 013 15V5.25m18 0A2.25 2.25 0 0018.75 3H5.25A2.25 2.25 0 003 5.25m18 0V12a2.25 2.25 0 01-2.25 2.25H5.25A2.25 2.25 0 013 12V5.25"></path>
Binary file added img/top-10-customers.png
Binary file added img/vanna-readme-diagram.png
8 changes: 4 additions & 4 deletions nb-theme/index.html.j2
@@ -124,11 +124,11 @@ a.anchor-link {
<nav class="dark bg-white border-gray-200 py-2.5 dark:bg-gray-900">
<div class="flex flex-wrap items-center justify-between max-w-screen-xl px-4 mx-auto">
<a href="/" class="flex items-center">
<img src="https://ask.vanna.ai/static/img/vanna.svg" class="h-6 mr-3 sm:h-9" alt="Vanna Logo">
<img src="/img/vanna.svg" class="h-6 mr-3 sm:h-9" alt="Vanna Logo">
<span class="self-center text-4xl font-semibold whitespace-nowrap dark:text-white nav-title">Vanna.AI</span>
</a>
<div class="flex items-center lg:order-2">
<a href="https://models.vanna.ai" class="inline-flex items-center justify-center px-5 py-3 text-base font-medium text-center text-white rounded-lg bg-purple-700 hover:bg-purple-800 focus:ring-4 focus:ring-purple-300 dark:focus:ring-purple-900">
<a href="/models/" class="inline-flex items-center justify-center px-5 py-3 text-base font-medium text-center text-white rounded-lg bg-purple-700 hover:bg-purple-800 focus:ring-4 focus:ring-purple-300 dark:focus:ring-purple-900">
Models
</a>
<button data-collapse-toggle="mobile-menu-2" type="button" class="inline-flex items-center p-2 ml-1 text-sm text-gray-500 rounded-lg lg:hidden hover:bg-gray-100 focus:outline-none focus:ring-2 focus:ring-gray-200 dark:text-gray-400 dark:hover:bg-gray-700 dark:focus:ring-gray-600" aria-controls="mobile-menu-2" aria-expanded="false">
@@ -146,10 +146,10 @@ a.anchor-link {
<a href="https://vanna.ai/#pricing" class="block py-2 pl-3 pr-4 text-gray-700 border-b border-gray-100 hover:bg-gray-50 lg:hover:bg-transparent lg:border-0 lg:hover:text-purple-700 lg:p-0 dark:text-gray-400 lg:dark:hover:text-white dark:hover:bg-gray-700 dark:hover:text-white lg:dark:hover:bg-transparent dark:border-gray-700">Pricing</a>
</li>
<li>
<a href="https://docs.vanna.ai" class="block py-2 pl-3 pr-4 text-white border-b border-gray-100 hover:bg-gray-50 lg:hover:bg-transparent lg:border-0 lg:hover:text-purple-700 lg:p-0 dark:text-white lg:dark:hover:text-white dark:hover:bg-gray-700 dark:hover:text-white lg:dark:hover:bg-transparent dark:border-gray-700">Docs</a>
<a href="/docs/" class="block py-2 pl-3 pr-4 text-white border-b border-gray-100 hover:bg-gray-50 lg:hover:bg-transparent lg:border-0 lg:hover:text-purple-700 lg:p-0 dark:text-white lg:dark:hover:text-white dark:hover:bg-gray-700 dark:hover:text-white lg:dark:hover:bg-transparent dark:border-gray-700">Docs</a>
</li>
<li>
<a href="https://models.vanna.ai" class="block py-2 pl-3 pr-4 text-gray-700 border-b border-gray-100 hover:bg-gray-50 lg:hover:bg-transparent lg:border-0 lg:hover:text-purple-700 lg:p-0 dark:text-gray-400 lg:dark:hover:text-white dark:hover:bg-gray-700 dark:hover:text-white lg:dark:hover:bg-transparent dark:border-gray-700">Models</a>
<a href="/models/" class="block py-2 pl-3 pr-4 text-gray-700 border-b border-gray-100 hover:bg-gray-50 lg:hover:bg-transparent lg:border-0 lg:hover:text-purple-700 lg:p-0 dark:text-gray-400 lg:dark:hover:text-white dark:hover:bg-gray-700 dark:hover:text-white lg:dark:hover:bg-transparent dark:border-gray-700">Models</a>
</li>
<li>
<a href="https://github.com/vanna-ai/vanna" target="_blank" class="block py-2 pl-3 pr-4 text-gray-700 border-b border-gray-100 hover:bg-gray-50 lg:hover:bg-transparent lg:border-0 lg:hover:text-purple-700 lg:p-0 dark:text-gray-400 lg:dark:hover:text-white dark:hover:bg-gray-700 dark:hover:text-white lg:dark:hover:bg-transparent dark:border-gray-700">GitHub</a>
Binary file added notebooks/Chinook.sqlite
Binary file not shown.
48 changes: 48 additions & 0 deletions notebooks/app.ipynb
@@ -0,0 +1,48 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install vanna\n",
"import vanna\n",
"from vanna.remote import VannaDefault\n",
"vn = VannaDefault(model='chinook', api_key=vanna.get_api_key('[email protected]'))\n",
"vn.connect_to_sqlite('https://vanna.ai/Chinook.sqlite')\n",
"vn.ask(\"What are the top 10 albums by sales?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from vanna.flask import VannaFlaskApp\n",
"VannaFlaskApp(vn).run()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Here's what you'll get\n",
"![vanna-flask](https://vanna.ai/blog/img/vanna-flask.gif)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}