Merge pull request #226 from unoplat/225-docs-update-setup-run-and-ex…
…ample-instructions-in-documentation-website

225 docs update setup run and example instructions in documentation website
JayGhiya authored Dec 21, 2024
2 parents 7daa92f + 43545ab commit 28db23e
Showing 9 changed files with 759 additions and 616 deletions.
4 changes: 2 additions & 2 deletions code-confluence/docs/deep-dive/_category_.json
@@ -1,8 +1,8 @@
{
"label": "Code Confluence Roadmap",
"label": "Code Confluence Introduction",
"position": 3,
"link": {
"type": "generated-index",
"description": "This will cover the roadmap for Unoplat Code Confluence and deep dive into the architecture"
"description": "This will cover the vision, roadmap, and how-it-works for Unoplat Code Confluence."
}
}
64 changes: 64 additions & 0 deletions code-confluence/docs/deep-dive/vision.md
@@ -0,0 +1,64 @@
---
sidebar_position: 1
---

# Vision: The Universal Code Context Engine

## 🎯 Our Mission

Unoplat Code Confluence aims to be the definitive solution for extracting, understanding, and providing precise code context across repositories and domains. We believe that combining deterministic code grammar with state-of-the-art LLM pipelines can achieve human-like understanding of codebases in minutes rather than months.

## 🌟 Why Unoplat Code Confluence?

### Core Principles

1. **Precision First**
- Built for deterministic and accurate code context extraction
- Leverages ANTLR and tree-sitter grammars for reliable parsing
- Ensures accuracy in code relationship mapping

2. **AI-Powered Understanding**
- Advanced LLM pipelines that comprehend code relationships
- Semantic understanding similar to human developers
- Based on in-house graph structures and parsing algorithms

3. **Graph Intelligence**
- Uses graph databases for both ingestion and querying
- Enables deep contextual understanding
- Preserves complex relationships between code elements

4. **Enterprise-Grade Reliability**
- Powered by workflow orchestration
- Scalable and reliable processing
- Production-ready architecture

## 🔍 The OSS Atlas Initiative

Our OSS Atlas project is designed to dramatically accelerate contributor onboarding and productivity in open-source projects. We aim to:

### For Contributors
- **Accelerate Onboarding**: Understand complex codebases in minutes instead of months
- **Boost Contribution Velocity**: Make meaningful contributions faster with deep contextual insights
- **Navigate Complex Systems**: Easily understand dependencies, patterns, and architectural decisions
- **Learn Best Practices**: Study and adopt patterns from well-established open-source projects

### For Integration Partners

Unoplat Code Confluence provides:
- High-precision code context API powered by graph-based retrieval
- Cross-repository semantic understanding through LLM pipelines
- Reduced operational complexity for context extraction
- Ready-to-use integration with popular tools like OpenDevin, Devon, Danswer, and Continue Dev

## 🚀 Future Direction

We are committed to:
1. Expanding language support beyond Python
2. Enhancing our LLM pipelines for even better code understanding
3. Building more integration points with popular development tools
4. Growing our OSS Atlas initiative to support more open-source projects
5. Developing advanced visualization and analysis capabilities

Our vision is to make codebases more accessible, understandable, and maintainable for developers worldwide, whether they're working on small projects or enterprise-scale systems.

> Ready to get started? Check out our [Quick Start Guide](../quickstart/how-to-run) to begin your journey with Unoplat Code Confluence.
242 changes: 131 additions & 111 deletions code-confluence/docs/quickstart/how-to-run.md
@@ -4,173 +4,193 @@ sidebar_position: 2

# Quick Start Guide

Welcome to **Unoplat Code Confluence**! This guide will help you quickly set up and start using our platform to enhance your codebase management and collaboration.
Welcome to **Unoplat Code Confluence**

## Table of Contents

1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [1. Graph Database Setup](#1-graph-database-setup)
- [Installation](#installation)
4. [2. Generate Summary and Ingest Codebase](#2-generate-summary-and-ingest-codebase)
- [Ingestion Configuration](#ingestion-configuration)
- [Run the Unoplat Code Confluence Ingestion Utility](#run-the-unoplat-code-confluence-ingestion-utility)
5. [3. Setup Chat Interface](#3-setup-chat-interface)
- [Query Engine Configuration](#query-engine-configuration)
- [Launch Query Engine](#launch-query-engine)
6. [Troubleshooting](#troubleshooting)
3. [Installation](#installation)
4. [Troubleshooting](#troubleshooting)

## Introduction

**Unoplat Code Confluence** empowers developers to effortlessly navigate and understand complex codebases. By leveraging a graph database and an intuitive chat interface, our platform enhances collaboration and accelerates onboarding.
**Unoplat Code Confluence** currently supports Python codebases. It is in alpha, and support for more languages and features is in progress. The current version parses codebases and exports a JSON representation of the code graph. For the vision, the roadmap, and a deep dive into how it works, see [vision](/docs/deep-dive/vision), [roadmap](/docs/deep-dive/roadmap), and [How-It-Works](/docs/deep-dive/how-it-works).

## Prerequisites

Before you begin, ensure you have the following installed on your system:
### Codebase Requirements

- [Docker](https://www.docker.com/get-started)
- [Pipx](https://github.com/pypa/pipx)
- [Poetry](https://python-poetry.org/)
Unoplat Code Confluence currently supports Python codebases up to version 3.11 (a limit imposed by its dependency on isort). To segregate imports and identify internal dependencies, Code Confluence relies on the [ruff](https://docs.astral.sh/ruff/) and [isort](https://pycqa.github.io/isort/) ecosystem.

```bash
pipx install poetry
```
The codebase must contain the following configuration files:

#### 1. ruff.toml

```toml
target-version = "py311"

## 1. Graph Database Setup
exclude = [
".git",
".mypy_cache",
".pytest_cache",
".ruff_cache",
".venv",
"venv",
"build",
"dist",
]

### Installation
src = ["unoplat_code_confluence"] # Adjust this to your project's source directory
line-length = 320

1. **Run the Neo4j Container**
[lint]
# Enable import sorting (I), flake8-tidy-imports (TID), and selected import and unused-code checks
select = ["I","E402","INP001","TID","F401","F841"]

[lint.per-file-ignores]
"__init__.py" = ["E402","F401"]

[lint.flake8-tidy-imports]
ban-relative-imports = "all"

[lint.isort]
combine-as-imports = true
force-to-top = ["os","sys"]
```

Then run Ruff from the command line:

```bash
docker run \
--name neo4j-container \
--restart always \
--publish 7474:7474 \
--publish 7687:7687 \
--env NEO4J_AUTH=neo4j/Ke7Rk7jB:Jn2Uz: \
--volume /Users/jayghiya/Documents/unoplat/neo4j-data:/data \
--volume /Users/jayghiya/Documents/unoplat/neo4j-plugins/:/plugins \
neo4j:5.23.0
ruff check --fix . --unsafe-fixes
```

## 2. Generate Summary and Ingest Codebase
#### 2. Isort Configuration (.isort.cfg)

```ini
[settings]
known_third_party = "Include third party dependencies here"
import_heading_stdlib = Standard Library
import_heading_thirdparty = Third Party
import_heading_firstparty = First Party
import_heading_localfolder = Local
# Target Python 3.11
py_version = 311
line_length = 500
```

### Ingestion Configuration
Then run isort from the command line:

```json
{
"local_workspace_path": "/Users/jayghiya/Documents/unoplat/textgrad/textgrad",
"output_path": "/Users/jayghiya/Documents/unoplat",
"output_file_name": "unoplat_textgrad.md",
"codebase_name": "textgrad",
"programming_language": "python",
"repo": {
"download_url": "archguard/archguard",
"download_directory": "/Users/jayghiya/Documents/unoplat"
},
"api_tokens": {
"github_token": "Your github pat token"
},
"llm_provider_config": {
"openai": {
"api_key": "Your openai api key",
"model": "gpt-4o-mini",
"model_type": "chat",
"max_tokens": 512,
"temperature": 0.0
}
},
"logging_handlers": [
{
"sink": "~/Documents/unoplat/app.log",
"format": "<green>{time:YYYY-MM-DD at HH:mm:ss}</green> | <level>{level}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> | <magenta>{thread.name}</magenta> - <level>{message}</level>",
"rotation": "10 MB",
"retention": "10 days",
"level": "DEBUG"
}
],
"parallisation": 3,
"sentence_transformer_model": "jinaai/jina-embeddings-v3",
"neo4j_uri": "bolt://localhost:7687",
"neo4j_username": "neo4j",
"neo4j_password": "Ke7Rk7jB:Jn2Uz:"
}
```bash
isort . --python-version 311
```
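The two formatting steps above can also be chained from a small script. The following is a minimal sketch, not part of Unoplat Code Confluence: the helper names `build_commands` and `run_all` are hypothetical, and it assumes `ruff` and `isort` are already installed and on your `PATH`. It runs the same two commands shown in this guide, in order, from the root of the codebase so that `ruff.toml` and `.isort.cfg` are picked up.

```python
import subprocess
from pathlib import Path


def build_commands() -> list[list[str]]:
    """Return the Ruff and isort invocations from this guide, in order."""
    return [
        ["ruff", "check", "--fix", ".", "--unsafe-fixes"],
        ["isort", ".", "--python-version", "311"],
    ]


def run_all(codebase_root: Path) -> None:
    """Run each command from the codebase root (hypothetical helper).

    check=True stops at the first command that exits with a non-zero status.
    """
    for command in build_commands():
        subprocess.run(command, cwd=codebase_root, check=True)
```

Calling `run_all(Path("/path/to/your/codebase"))` would apply both tools. Note that `ruff check` exits with a non-zero status when violations remain that it could not fix, so the script may stop before isort runs.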

> **Note**: As of now for `sentence_transformer_model`, only Hugging Face sentence embedding models with dimensions up to 4096 are supported. Dimensions' upper limit is due to Neo4j vector index limitations. Make sure your chosen model meets these requirements.
### Installation Requirements

### Run the Unoplat Code Confluence Ingestion Utility
Before you begin, ensure you have the following installed on your system:

1. **Installation**
- [PyEnv](https://github.com/pyenv/pyenv)
- [Pipx](https://github.com/pypa/pipx)
- [Poetry](https://python-poetry.org/)

```bash
pipx install 'git+https://github.com/unoplat/[email protected]#subdirectory=unoplat-code-confluence'
pipx install poetry
```

2. **Run the Ingestion Utility**
## Installation

### 1. Python Setup

```bash
unoplat-code-confluence --config /path/to/your/config.json
pyenv install 3.12.1
pyenv global 3.12.1
```

### 2. Install Unoplat Code Confluence

3. **Example Run**
```bash
pipx install --python $(pyenv which python) 'git+https://github.com/unoplat/[email protected]#subdirectory=unoplat-code-confluence'
```

## Configuration

<img src={require('../../static/img/code-confluence-parsing-ingestion.png').default} alt="Unoplat Code Confluence Output" className="zoomable" />
### JSON Configuration

After running the ingestion utility, you'll find the generated markdown file in the specified output directory. The file will contain a comprehensive summary of your codebase. Also the summary and other relevant metadata would be stored in the graph database.
#### Configuration Fields

Also check out the Neo4j Browser to visualize the graph database. Go to [http://localhost:7474/browser/](http://localhost:7474/browser/)
1. **repositories** (Required): Array of repositories to analyze
- `git_url`: URL of the Git repository
- `output_path`: Local directory where analysis results will be stored
- `codebases`: Array of codebases within the repository
- `codebase_folder_name`: Name of the folder containing the codebase
- `root_package_name`: Root package name (optional for some languages)
- `programming_language_metadata`: Language-specific configuration
- `language`: Programming language (currently supports "python")
- `package_manager`: Package manager type ("poetry" or "pip")
- `language_version`: Version of the programming language

<img src={require('../../static/img/code-confluence-neo4j-browser.png').default} alt="Unoplat Code Confluence Graph Database" className="zoomable" />
2. **archguard** (Required): Configuration for ArchGuard tool
- `download_url`: URL to download ArchGuard from
- `download_directory`: Local directory to store ArchGuard

## 3. Setup Chat Interface
3. **logging_handlers** (Required): Array of logging configurations
- `sink`: Log file path
- `format`: Log message format
- `rotation`: Log file rotation size
- `retention`: Log retention period
- `level`: Logging level
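Before running the tool, it can help to check that a configuration file contains the fields listed above. The sketch below is an illustration only and is not how Unoplat Code Confluence validates its input: the function names are hypothetical, and treating `git_url`, `output_path`, and `codebases` as mandatory within each repository is an assumption (the list above marks only the three top-level sections as required).

```python
import json
from pathlib import Path

# Top-level sections marked "Required" in the field list.
REQUIRED_TOP_LEVEL = ("repositories", "archguard", "logging_handlers")
# Assumption: every repository entry needs these three keys.
REQUIRED_REPOSITORY = ("git_url", "output_path", "codebases")


def load_config(path: str) -> dict:
    """Read the JSON configuration file into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def missing_fields(config: dict) -> list[str]:
    """Return the names of expected fields that are absent from the config."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
    for index, repository in enumerate(config.get("repositories", [])):
        for key in REQUIRED_REPOSITORY:
            if key not in repository:
                missing.append(f"repositories[{index}].{key}")
    return missing
```

An empty list from `missing_fields(load_config("/path/to/your/config.json"))` means every field checked here is present. It does not verify value types or paths.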

### Query Engine Configuration
#### Example Configuration

```json
{
"sentence_transformer_model": "jinaai/jina-embeddings-v3",
"neo4j_uri": "bolt://localhost:7687",
"neo4j_username": "neo4j",
"neo4j_password": "your neo4j password",
"provider_model_dict": {
"model_provider": "openai/gpt-4o-mini",
"model_provider_args": {
"api_key": "your openai api key",
"max_tokens": 500,
"temperature": 0.0
"repositories": [
{
"git_url": "https://github.com/unoplat/unoplat-code-confluence",
"output_path": "/Users/jayghiya/Documents/unoplat",
"codebases": [
{
"codebase_folder_name": "unoplat-code-confluence",
"root_package_name": "unoplat_code_confluence",
"programming_language_metadata": {
"language": "python",
"package_manager": "poetry",
"language_version": "3.12.0"
}
}
]
}
}
],
"archguard": {
"download_url": "archguard/archguard",
"download_directory": "/Users/jayghiya/Documents/unoplat"
},
"logging_handlers": [
{
"sink": "~/Documents/unoplat/app.log",
"format": "<green>{time:YYYY-MM-DD at HH:mm:ss}</green> | <level>{level}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> | <magenta>{thread.name}</magenta> - <level>{message}</level>",
"rotation": "10 MB",
"retention": "10 days",
"level": "DEBUG"
}
]
}
```

> **Note**: As of now for `sentence_transformer_model`, only Hugging Face sentence embedding models with dimensions up to 4096 are supported. Dimensions' upper limit is due to Neo4j vector index limitations. Make sure your chosen model meets these requirements.
### Launch Query Engine
### Environment Variables

1. **Installation**
Create a `.env.dev` file in the directory where you intend to run the project:

```bash
pipx install 'git+https://github.com/unoplat/[email protected]#subdirectory=unoplat-code-confluence-query-engine'
```env
UNOPLAT_ENV=dev
UNOPLAT_DEBUG=true
UNOPLAT_GITHUB_TOKEN=Your_Github_Pat_Token
```
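To sanity-check the `.env.dev` file before launching, you could parse it yourself. This guide does not describe how Unoplat Code Confluence loads the file, so the parser below is a simplified, hypothetical sketch that handles only plain `KEY=value` lines (no quoting or variable expansion).

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_env_keys(values: dict[str, str]) -> list[str]:
    """Report which of the three variables shown in this guide are absent."""
    expected = ("UNOPLAT_ENV", "UNOPLAT_DEBUG", "UNOPLAT_GITHUB_TOKEN")
    return [key for key in expected if key not in values]
```

For example, `missing_env_keys(parse_env(open(".env.dev").read()))` returns an empty list when all three variables are set.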

2. **Run the Query Engine**
### Running the Application

```bash
unoplat-code-confluence-query-engine --config /path/to/your/config.json
unoplat-code-confluence --config /path/to/your/config.json
```

3. **Example Run**


<img src={require('../../static/img/code-confluence-query-engine.png').default} alt="Unoplat Code Confluence Query Engine" className="zoomable" />

We had added [textgrad](https://github.com/zou-group/textgrad) to our graph database in the configuration of ingestion utility. You can now chat with the codebase. To view existing codebases press ctrl + e.

<img src={require('../../static/img/code-confluence-existing-codebases.png').default} alt="Unoplat Code Confluence Existing Codebases" className="zoomable" />

## Troubleshooting
