Commit
Merge pull request #226 from unoplat/225-docs-update-setup-run-and-ex…
…ample-instructions-in-documentation-website 225 docs update setup run and example instructions in documentation website
Showing 9 changed files with 759 additions and 616 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,8 @@ | ||
{ | ||
"label": "Code Confluence Roadmap", | ||
"label": "Code Confluence Introduction", | ||
"position": 3, | ||
"link": { | ||
"type": "generated-index", | ||
"description": "This will cover the roadmap for Unoplat Code Confluence and deep dive into the architecture" | ||
"description": "This will cover the vision,roadmap and how-it-works for Unoplat Code Confluence." | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
--- | ||
sidebar_position: 1 | ||
--- | ||
|
||
# Vision: The Universal Code Context Engine | ||
|
||
## 🎯 Our Mission | ||
|
||
Unoplat Code Confluence aims to be the definitive solution for extracting, understanding, and providing precise code context across repositories and domains. We believe that combining deterministic code grammar with state-of-the-art LLM pipelines can achieve human-like understanding of codebases in minutes rather than months. | ||
|
||
## 🌟 Why Unoplat Code Confluence? | ||
|
||
### Core Principles | ||
|
||
1. **Precision First** | ||
- Built for deterministic and accurate code context extraction | ||
- Leverages ANTLR and tree-sitter grammars for reliable parsing | ||
- Ensures accuracy in code relationship mapping | ||
|
||
2. **AI-Powered Understanding** | ||
- Advanced LLM pipelines that comprehend code relationships | ||
- Semantic understanding similar to human developers | ||
- Based on in-house graph structures and parsing algorithms | ||
|
||
3. **Graph Intelligence** | ||
- Uses graph databases for both ingestion and querying | ||
- Enables deep contextual understanding | ||
- Preserves complex relationships between code elements | ||
|
||
4. **Enterprise-Grade Reliability** | ||
- Powered by workflow orchestration | ||
- Scalable and reliable processing | ||
- Production-ready architecture | ||
|
||
## 🔍 The OSS Atlas Initiative | ||
|
||
Our OSS Atlas project is designed to dramatically accelerate contributor onboarding and productivity in open-source projects. We aim to: | ||
|
||
### For Contributors | ||
- **Accelerate Onboarding**: Understand complex codebases in minutes instead of months | ||
- **Boost Contribution Velocity**: Make meaningful contributions faster with deep contextual insights | ||
- **Navigate Complex Systems**: Easily understand dependencies, patterns, and architectural decisions | ||
- **Learn Best Practices**: Study and adopt patterns from well-established open-source projects | ||
|
||
### For Integration Partners | ||
|
||
Unoplat Code Confluence provides: | ||
- High-precision code context API powered by graph-based retrieval | ||
- Cross-repository semantic understanding through LLM pipelines | ||
- Reduced operational complexity for context extraction | ||
- Ready-to-use integration with popular tools like OpenDevin, Devon, Danswer, and Continue Dev | ||
|
||
## 🚀 Future Direction | ||
|
||
We are committed to: | ||
1. Expanding language support beyond Python | ||
2. Enhancing our LLM pipelines for even better code understanding | ||
3. Building more integration points with popular development tools | ||
4. Growing our OSS Atlas initiative to support more open-source projects | ||
5. Developing advanced visualization and analysis capabilities | ||
|
||
Our vision is to make codebases more accessible, understandable, and maintainable for developers worldwide, whether they're working on small projects or enterprise-scale systems. | ||
|
||
> Ready to get started? Check out our [Quick Start Guide](../quickstart/how-to-run) to begin your journey with Unoplat Code Confluence. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,173 +4,193 @@ sidebar_position: 2 | |
|
||
# Quick Start Guide | ||
|
||
Welcome to **Unoplat Code Confluence**! This guide will help you quickly set up and start using our platform to enhance your codebase management and collaboration. | ||
Welcome to **Unoplat Code Confluence** | ||
|
||
## Table of Contents | ||
|
||
1. [Introduction](#introduction) | ||
2. [Prerequisites](#prerequisites) | ||
3. [1. Graph Database Setup](#1-graph-database-setup) | ||
- [Installation](#installation) | ||
4. [2. Generate Summary and Ingest Codebase](#2-generate-summary-and-ingest-codebase) | ||
- [Ingestion Configuration](#ingestion-configuration) | ||
- [Run the Unoplat Code Confluence Ingestion Utility](#run-the-unoplat-code-confluence-ingestion-utility) | ||
5. [3. Setup Chat Interface](#3-setup-chat-interface) | ||
- [Query Engine Configuration](#query-engine-configuration) | ||
- [Launch Query Engine](#launch-query-engine) | ||
6. [Troubleshooting](#troubleshooting) | ||
3. [Installation](#installation) | ||
4. [Troubleshooting](#troubleshooting) | ||
|
||
## Introduction | ||
|
||
**Unoplat Code Confluence** empowers developers to effortlessly navigate and understand complex codebases. By leveraging a graph database and an intuitive chat interface, our platform enhances collaboration and accelerates onboarding. | ||
**Unoplat Code Confluence** currently supports Python codebases. It is in the alpha stage, and we are working on adding support for more codebases and features. The current version parses codebases and exports a JSON representation of the code graph. For details on upcoming features and the design, see the [vision](/docs/deep-dive/vision), [roadmap](/docs/deep-dive/roadmap), and [How-It-Works](/docs/deep-dive/how-it-works) pages. | ||
|
||
## Prerequisites | ||
|
||
Before you begin, ensure you have the following installed on your system: | ||
### Codebase Requirements | ||
|
||
- [Docker](https://www.docker.com/get-started) | ||
- [Pipx](https://github.com/pypa/pipx) | ||
- [Poetry](https://python-poetry.org/) | ||
Currently, Unoplat Code Confluence supports Python codebases up to version 3.11 (due to a dependency on isort). To support features such as segregating imports and resolving internal dependencies, Code Confluence relies on the [ruff](https://docs.astral.sh/ruff/) and [isort](https://pycqa.github.io/isort/) ecosystem. | ||
|
||
```bash | ||
pipx install poetry | ||
``` | ||
The following configuration must be present in the codebase: | ||
|
||
#### 1. ruff.toml | ||
|
||
```toml | ||
target-version = "py311" | ||
|
||
## 1. Graph Database Setup | ||
exclude = [ | ||
".git", | ||
".mypy_cache", | ||
".pytest_cache", | ||
".ruff_cache", | ||
".venv", | ||
"venv", | ||
"build", | ||
"dist", | ||
] | ||
|
||
### Installation | ||
src = ["unoplat_code_confluence"] # Adjust this to your project's source directory | ||
line-length = 320 | ||
|
||
1. **Run the Neo4j Container** | ||
[lint] | ||
# Enable isort (I), flake8-tidy-imports (TID), and selected pycodestyle, flake8-no-pep420, and Pyflakes rules | ||
select = ["I","E402","INP001","TID","F401","F841"] | ||
|
||
[lint.per-file-ignores] | ||
"__init__.py" = ["E402","F401"] | ||
|
||
[lint.flake8-tidy-imports] | ||
ban-relative-imports = "all" | ||
|
||
[lint.isort] | ||
combine-as-imports = true | ||
force-to-top = ["os","sys"] | ||
``` | ||
|
||
Then run ruff from the command line: | ||
|
||
```bash | ||
docker run \ | ||
--name neo4j-container \ | ||
--restart always \ | ||
--publish 7474:7474 \ | ||
--publish 7687:7687 \ | ||
--env NEO4J_AUTH=neo4j/Ke7Rk7jB:Jn2Uz: \ | ||
--volume /Users/jayghiya/Documents/unoplat/neo4j-data:/data \ | ||
--volume /Users/jayghiya/Documents/unoplat/neo4j-plugins/:/plugins \ | ||
neo4j:5.23.0 | ||
ruff check --fix . --unsafe-fixes | ||
``` | ||
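To make the effect of `ban-relative-imports = "all"` concrete, the following is a minimal sketch (not part of ruff or Code Confluence) that uses Python's standard `ast` module to list the relative imports this kind of rule would flag. The function name `find_relative_imports` and the sample source are hypothetical and chosen only for illustration.

```python
import ast


def find_relative_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, dotted target) for each relative import in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # ImportFrom.level counts the leading dots; 0 means an absolute import.
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            found.append((node.lineno, "." * node.level + (node.module or "")))
    return sorted(found)


# Hypothetical module contents used only to exercise the helper.
sample = (
    "import os\n"
    "from . import utils\n"
    "from ..core import parser\n"
    "from unoplat_code_confluence.core import parser\n"
)
relative = find_relative_imports(sample)
print(relative)
```

Only the second and third imports in the sample are relative; the absolute form on the last line is the style the configuration above asks for.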
|
||
## 2. Generate Summary and Ingest Codebase | ||
#### 2. Isort Configuration (.isort.cfg) | ||
|
||
```ini | ||
[settings] | ||
known_third_party = "Include third party dependencies here" | ||
import_heading_stdlib = Standard Library | ||
import_heading_thirdparty = Third Party | ||
import_heading_firstparty = First Party | ||
import_heading_localfolder = Local | ||
py_version = 311 # For Python 3.11 | ||
line_length = 500 | ||
``` | ||
|
||
### Ingestion Configuration | ||
Then run isort from the command line: | ||
|
||
```json | ||
{ | ||
"local_workspace_path": "/Users/jayghiya/Documents/unoplat/textgrad/textgrad", | ||
"output_path": "/Users/jayghiya/Documents/unoplat", | ||
"output_file_name": "unoplat_textgrad.md", | ||
"codebase_name": "textgrad", | ||
"programming_language": "python", | ||
"repo": { | ||
"download_url": "archguard/archguard", | ||
"download_directory": "/Users/jayghiya/Documents/unoplat" | ||
}, | ||
"api_tokens": { | ||
"github_token": "Your github pat token" | ||
}, | ||
"llm_provider_config": { | ||
"openai": { | ||
"api_key": "Your openai api key", | ||
"model": "gpt-4o-mini", | ||
"model_type": "chat", | ||
"max_tokens": 512, | ||
"temperature": 0.0 | ||
} | ||
}, | ||
"logging_handlers": [ | ||
{ | ||
"sink": "~/Documents/unoplat/app.log", | ||
"format": "<green>{time:YYYY-MM-DD at HH:mm:ss}</green> | <level>{level}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> | <magenta>{thread.name}</magenta> - <level>{message}</level>", | ||
"rotation": "10 MB", | ||
"retention": "10 days", | ||
"level": "DEBUG" | ||
} | ||
], | ||
"parallisation": 3, | ||
"sentence_transformer_model": "jinaai/jina-embeddings-v3", | ||
"neo4j_uri": "bolt://localhost:7687", | ||
"neo4j_username": "neo4j", | ||
"neo4j_password": "Ke7Rk7jB:Jn2Uz:" | ||
} | ||
```bash | ||
isort . --python-version 311 | ||
``` | ||
|
||
> **Note**: As of now for `sentence_transformer_model`, only Hugging Face sentence embedding models with dimensions up to 4096 are supported. Dimensions' upper limit is due to Neo4j vector index limitations. Make sure your chosen model meets these requirements. | ||
### Installation Requirements | ||
|
||
### Run the Unoplat Code Confluence Ingestion Utility | ||
Before you begin, ensure you have the following installed on your system: | ||
|
||
1. **Installation** | ||
- [PyEnv](https://github.com/pyenv/pyenv) | ||
- [Pipx](https://github.com/pypa/pipx) | ||
- [Poetry](https://python-poetry.org/) | ||
|
||
```bash | ||
pipx install 'git+https://github.com/unoplat/[email protected]#subdirectory=unoplat-code-confluence' | ||
pipx install poetry | ||
``` | ||
|
||
2. **Run the Ingestion Utility** | ||
## Installation | ||
|
||
### 1. Python Setup | ||
|
||
```bash | ||
unoplat-code-confluence --config /path/to/your/config.json | ||
pyenv install 3.12.1 | ||
pyenv global 3.12.1 | ||
``` | ||
|
||
### 2. Install Unoplat Code Confluence | ||
|
||
3. **Example Run** | ||
```bash | ||
pipx install --python $(pyenv which python) 'git+https://github.com/unoplat/[email protected]#subdirectory=unoplat-code-confluence' | ||
``` | ||
|
||
## Configuration | ||
|
||
<img src={require('../../static/img/code-confluence-parsing-ingestion.png').default} alt="Unoplat Code Confluence Output" className="zoomable" /> | ||
### JSON Configuration | ||
|
||
After running the ingestion utility, you'll find the generated markdown file in the specified output directory. The file will contain a comprehensive summary of your codebase. Also the summary and other relevant metadata would be stored in the graph database. | ||
#### Configuration Fields | ||
|
||
Also check out the Neo4j Browser to visualize the graph database. Go to [http://localhost:7474/browser/](http://localhost:7474/browser/) | ||
1. **repositories** (Required): Array of repositories to analyze | ||
- `git_url`: URL of the Git repository | ||
- `output_path`: Local directory where analysis results will be stored | ||
- `codebases`: Array of codebases within the repository | ||
- `codebase_folder_name`: Name of the folder containing the codebase | ||
- `root_package_name`: Root package name (optional for some languages) | ||
- `programming_language_metadata`: Language-specific configuration | ||
- `language`: Programming language (currently supports "python") | ||
- `package_manager`: Package manager type ("poetry" or "pip") | ||
- `language_version`: Version of the programming language | ||
|
||
<img src={require('../../static/img/code-confluence-neo4j-browser.png').default} alt="Unoplat Code Confluence Graph Database" className="zoomable" /> | ||
2. **archguard** (Required): Configuration for ArchGuard tool | ||
- `download_url`: URL to download ArchGuard from | ||
- `download_directory`: Local directory to store ArchGuard | ||
|
||
## 3. Setup Chat Interface | ||
3. **logging_handlers** (Required): Array of logging configurations | ||
- `sink`: Log file path | ||
- `format`: Log message format | ||
- `rotation`: Log file rotation size | ||
- `retention`: Log retention period | ||
- `level`: Logging level | ||
|
||
### Query Engine Configuration | ||
#### Example Configuration | ||
|
||
```json | ||
{ | ||
"sentence_transformer_model": "jinaai/jina-embeddings-v3", | ||
"neo4j_uri": "bolt://localhost:7687", | ||
"neo4j_username": "neo4j", | ||
"neo4j_password": "your neo4j password", | ||
"provider_model_dict": { | ||
"model_provider": "openai/gpt-4o-mini", | ||
"model_provider_args": { | ||
"api_key": "your openai api key", | ||
"max_tokens": 500, | ||
"temperature": 0.0 | ||
"repositories": [ | ||
{ | ||
"git_url": "https://github.com/unoplat/unoplat-code-confluence", | ||
"output_path": "/Users/jayghiya/Documents/unoplat", | ||
"codebases": [ | ||
{ | ||
"codebase_folder_name": "unoplat-code-confluence", | ||
"root_package_name": "unoplat_code_confluence", | ||
"programming_language_metadata": { | ||
"language": "python", | ||
"package_manager": "poetry", | ||
"language_version": "3.12.0" | ||
} | ||
} | ||
] | ||
} | ||
} | ||
], | ||
"archguard": { | ||
"download_url": "archguard/archguard", | ||
"download_directory": "/Users/jayghiya/Documents/unoplat" | ||
}, | ||
"logging_handlers": [ | ||
{ | ||
"sink": "~/Documents/unoplat/app.log", | ||
"format": "<green>{time:YYYY-MM-DD at HH:mm:ss}</green> | <level>{level}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> | <magenta>{thread.name}</magenta> - <level>{message}</level>", | ||
"rotation": "10 MB", | ||
"retention": "10 days", | ||
"level": "DEBUG" | ||
} | ||
] | ||
} | ||
``` | ||
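Before running the tool, it can help to check that a configuration file contains the fields described above. The following is a minimal sketch, not the tool's own validation: the top-level keys come from the field list in this guide, while treating `git_url`, `output_path`, and `codebases` as mandatory for every repository is an assumption. The helper `missing_fields` and the path `/tmp/archguard` are hypothetical.

```python
import json

# Top-level keys documented as required in this guide.
REQUIRED_TOP_LEVEL = ("repositories", "archguard", "logging_handlers")
# Assumption: each repository entry needs these keys.
REQUIRED_REPOSITORY = ("git_url", "output_path", "codebases")


def missing_fields(config: dict) -> list[str]:
    """Return the names of expected fields that are absent from `config`."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
    for index, repo in enumerate(config.get("repositories", [])):
        for key in REQUIRED_REPOSITORY:
            if key not in repo:
                missing.append(f"repositories[{index}].{key}")
    return missing


# Deliberately incomplete example: no logging_handlers, no output_path.
example = json.loads("""
{
  "repositories": [
    {"git_url": "https://github.com/unoplat/unoplat-code-confluence", "codebases": []}
  ],
  "archguard": {"download_url": "archguard/archguard", "download_directory": "/tmp/archguard"}
}
""")
problems = missing_fields(example)
print(problems)
```

An empty list means every field checked by this sketch is present; it does not confirm that the values themselves are valid.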
|
||
> **Note**: As of now for `sentence_transformer_model`, only Hugging Face sentence embedding models with dimensions up to 4096 are supported. Dimensions' upper limit is due to Neo4j vector index limitations. Make sure your chosen model meets these requirements. | ||
### Launch Query Engine | ||
### Environment Variables | ||
|
||
1. **Installation** | ||
Create a `.env.dev` file in the directory from which you intend to run the project: | ||
|
||
```bash | ||
pipx install 'git+https://github.com/unoplat/[email protected]#subdirectory=unoplat-code-confluence-query-engine' | ||
```env | ||
UNOPLAT_ENV=dev | ||
UNOPLAT_DEBUG=true | ||
UNOPLAT_GITHUB_TOKEN=Your_Github_Pat_Token | ||
``` | ||
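The file uses plain `KEY=value` lines. How Code Confluence itself loads it is not shown here, so the following is only a sketch of how such a file can be read with the standard library; the helper `parse_env_file` is hypothetical.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Same content as the .env.dev example above.
env = parse_env_file(
    "UNOPLAT_ENV=dev\n"
    "UNOPLAT_DEBUG=true\n"
    "UNOPLAT_GITHUB_TOKEN=Your_Github_Pat_Token\n"
)
debug_enabled = env.get("UNOPLAT_DEBUG", "false").lower() == "true"
print(env["UNOPLAT_ENV"], debug_enabled)
```

Replace the placeholder token with a real GitHub personal access token, and keep the file out of version control.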
|
||
2. **Run the Query Engine** | ||
### Running the Application | ||
|
||
```bash | ||
unoplat-code-confluence-query-engine --config /path/to/your/config.json | ||
unoplat-code-confluence --config /path/to/your/config.json | ||
``` | ||
|
||
3. **Example Run** | ||
|
||
|
||
<img src={require('../../static/img/code-confluence-query-engine.png').default} alt="Unoplat Code Confluence Query Engine" className="zoomable" /> | ||
|
||
We had added [textgrad](https://github.com/zou-group/textgrad) to our graph database in the configuration of ingestion utility. You can now chat with the codebase. To view existing codebases press ctrl + e. | ||
|
||
<img src={require('../../static/img/code-confluence-existing-codebases.png').default} alt="Unoplat Code Confluence Existing Codebases" className="zoomable" /> | ||
|
||
## Troubleshooting | ||
|