206 ingestion utility all class func metadata and func to func call ingestion into neo4j #212

Merged
48 commits
1b865c4
feat: upgraded arcguard to 2.2.3
JayGhiya Nov 6, 2024
4cf6294
feat: added parameters as part of function chapi metadata which was m…
JayGhiya Nov 6, 2024
97fd429
feat: added comments for class and function
JayGhiya Nov 6, 2024
ef79bf8
refactor: moved dspy pipelines to proper package
JayGhiya Nov 6, 2024
8653cf1
refactor: naming change for confluence class to internal method
JayGhiya Nov 6, 2024
2a8427e
feat: added extensible rich schema for neomodel
JayGhiya Nov 7, 2024
513c38f
feat: added comments based description to improve summarisation and q…
JayGhiya Nov 7, 2024
f9a6cd2
feat: added ability to read package manager based on programming lang…
JayGhiya Nov 9, 2024
cc246fd
feat: added reference to package manager metadata from source codebas…
JayGhiya Nov 10, 2024
49cd0a8
fix: read/parse poetry better
JayGhiya Nov 12, 2024
efe5c54
feat: moved to requirements-parser lib for parsing requirements file …
JayGhiya Nov 12, 2024
5c9a5f2
feat: added support for pip
JayGhiya Nov 12, 2024
dd30032
fix: handle different types of setup scripts in setup.py
JayGhiya Nov 12, 2024
8f6faeb
feat: enabled qualified name while parsing classes and better structu…
JayGhiya Nov 12, 2024
6018817
feat: Added support for qualified names of packages
JayGhiya Nov 12, 2024
f6577a7
feat: added support for repo name and repo url as metadata
JayGhiya Nov 12, 2024
749d054
fix: removed redundant field
JayGhiya Nov 12, 2024
8991012
feat: added requirement_utils
JayGhiya Nov 13, 2024
eb94644
chore: added gitignore
JayGhiya Nov 13, 2024
d4c5671
feat: added stdlist module to figure out system modules of python
JayGhiya Nov 13, 2024
03bd901
feat: added parsing and consolidating imports through ast and filling…
JayGhiya Nov 13, 2024
5b56731
chore: structs
JayGhiya Nov 13, 2024
5ccbea3
feat: added ability to read versions if not able to figure out from p…
JayGhiya Nov 13, 2024
7613f5a
fix: support falling back to explicit python version from user if not…
JayGhiya Nov 13, 2024
177f9ff
feat: added support for system import reading
JayGhiya Nov 14, 2024
501ff7c
feat: revamped entire code to do programming language related custom …
JayGhiya Nov 14, 2024
d6f2b3c
feat: upgraded upstream archguard to 2.2.7 which fixes parameters of …
JayGhiya Nov 15, 2024
80cc203
chore: added pycache to gitignore
JayGhiya Nov 22, 2024
d691bd6
feat: added ability to segregate imports and improved context based o…
JayGhiya Nov 22, 2024
207a055
chore: removed tests that are not needed
JayGhiya Nov 22, 2024
c23c9f1
feat: segregated secrets and normal json config through base model an…
JayGhiya Nov 23, 2024
4440955
feat: introduce new data models and refactor existing ones for enhanc…
JayGhiya Dec 7, 2024
bd9bec6
feat: enhance Python parsing with tree-sitter, new metadata parsers, …
JayGhiya Dec 9, 2024
1a174bc
feat: add function call cleaning method and new tests for sorting fun…
JayGhiya Dec 9, 2024
d1d2afb
feat: add example config and enhance type checking. included robust c…
JayGhiya Dec 18, 2024
48fd591
chore: updated gitignore
JayGhiya Dec 18, 2024
b24403a
feat: update dependencies and enhance code structureAdded black (v23.…
JayGhiya Dec 19, 2024
1f4e370
feat: update linting rules, refactor tests, add class name validator,…
JayGhiya Dec 19, 2024
a32aff6
chore: updated isort and added updated gitignore
JayGhiya Dec 19, 2024
099fe5d
chore: updated ruff with __init__ rule and added init wherever required
JayGhiya Dec 19, 2024
b2a754a
fix: handle multi path for codebases inside monorepo
JayGhiya Dec 20, 2024
19b1345
fix: only pass root package for parsing
JayGhiya Dec 20, 2024
9eee27c
fix: pass content of file to parse global variables
JayGhiya Dec 20, 2024
27df325
chore: formatting
JayGhiya Dec 20, 2024
a4bf5dc
chore: debug project directory structure
JayGhiya Dec 20, 2024
6b2afcd
chore: formatting for tests direcotry
JayGhiya Dec 20, 2024
994c1e4
chore: solving merge issues
JayGhiya Dec 20, 2024
53ff5a9
chore: updated gitignore
JayGhiya Dec 20, 2024
55 changes: 55 additions & 0 deletions .gitignore
@@ -73,3 +73,58 @@ unoplat-code-confluence-query-engine/unoplat_code_confluence_query_engine/__pyca
unoplat-code-confluence-query-engine/dspy/__pycache__
experiments
unoplat-code-confluence-commons/dist
unoplat-code-confluence/unoplat_code_confluence/llm_pipelines/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/package_manager/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/python/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/python/package_manager/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/python/package_manager/pip/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/python/package_manager/poetry/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/python/package_manager/utils/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/python/package_naming/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/package_manager/python/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/package_manager/python/pip/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/package_manager/python/poetry/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/package_manager/python/utils/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/package_naming/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/package_naming/python/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/qualified_name/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/qualified_name/python/__pycache__
unoplat-code-confluence/tests/__pycache__
unoplat-code-confluence/tests/language_custom_parsing/__pycache__
unoplat-code-confluence/tests/language_custom_parsing/import_segregation/__pycache__
unoplat-code-confluence/tests/language_custom_parsing/import_segregation/python/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/import_segregation/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/import_segregation/python/__pycache__
unoplat-code-confluence/unoplat_code_confluence/language_custom_parsing/import_segregation/utils/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/package_manager/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/package_manager/python/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/package_manager/python/pip/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/package_manager/python/poetry/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/package_manager/python/utils/__pycache__
unoplat-code-confluence/tests/parser/__pycache__
unoplat-code-confluence/tests/parser/python/__pycache__
unoplat-code-confluence/tests/unoplat_code_confluence/parser/python/__pycache__
unoplat-code-confluence/unoplat_code_confluence/confluence_git/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/package_manager/pip/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/package_manager/poetry/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/package_manager/utils/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/utils/__pycache__
unoplat-code-confluence/tests/confluence_git/__pycache__
unoplat-code-confluence/tests/parser/python/node_variables/__pycache__
unoplat-code-confluence/tests/parser/python/in_class_dependency/__pycache__
unoplat-code-confluence/tests/parser/python/function_calls/__pycache__
unoplat-code-confluence/unoplat_code_confluence/data_models/chapi/__pycache__
unoplat-code-confluence/unoplat_code_confluence/data_models/chapi_forge/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/confluence_tree_sitter/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/function_calls/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/in_class_dependency/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/node_variables/__pycache__
unoplat-code-confluence/tests/parser/python/function_metadata/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/python/function_metadata/__pycache__
unoplat-code-confluence/unoplat_code_confluence/parser/tree_sitter/__pycache__
unoplat-code-confluence/unoplat_code_confluence/data_models/forge_summary/__pycache__
unoplat-code-confluence/tests/utility/__pycache__
unoplat-code-confluence/.env.dev
@@ -14,6 +14,10 @@ class ContainsRelationship(StructuredRel):
    """Relationship for representing containment between nodes"""
    pass

class AnnotatedRelationship(StructuredRel):
    """Relationship for representing annotation on nodes and methods"""
    position = JSONProperty()

class CallsRelationship(StructuredRel):
    """Represents a method call from one method to another."""
    parameters = JSONProperty()
@@ -1,9 +1,11 @@
from neomodel import StructuredNode, StringProperty, RelationshipFrom, ZeroOrMore, JSONProperty
from neomodel import StructuredNode, StringProperty, Relationship, ZeroOrMore, JSONProperty

from unoplat_code_confluence_commons.graph_models.base_models import AnnotatedRelationship


class ConfluenceAnnotation(StructuredNode):
    name = StringProperty(required=True)
    key_values = JSONProperty()
    position = JSONProperty()
    # Relationships
    annotated_classes = RelationshipFrom('.confluence_class.ConfluenceClass', 'HAS_ANNOTATION', cardinality=ZeroOrMore)
    annotated_methods = RelationshipFrom('.confluence_method.ConfluenceMethod', 'HAS_ANNOTATION', cardinality=ZeroOrMore)
    annotated_classes = Relationship('.confluence_class.ConfluenceClass', 'HAS_ANNOTATION', model=AnnotatedRelationship, cardinality=ZeroOrMore)
    annotated_methods = Relationship('.confluence_internal_method.ConfluenceInternalMethod', 'HAS_ANNOTATION', model=AnnotatedRelationship, cardinality=ZeroOrMore)
@@ -1,5 +1,5 @@
from .base_models import BaseNode, ContainsRelationship
from neomodel import RelationshipFrom, RelationshipTo, StringProperty,ZeroOrMore,One,ArrayProperty,VectorIndex,FloatProperty,JSONProperty
from unoplat_code_confluence_commons.graph_models.base_models import BaseNode, ContainsRelationship, AnnotatedRelationship
from neomodel import RelationshipTo, StringProperty,ZeroOrMore,One,ArrayProperty,FloatProperty,JSONProperty,Relationship

class ConfluenceClass(BaseNode):
    """Represents a class in a package"""
@@ -15,10 +15,11 @@ class ConfluenceClass(BaseNode):
    multiple_extend = ArrayProperty(StringProperty())
    position = JSONProperty()
    content = StringProperty()
    comments_description = StringProperty()
    # Class relationships
    package = RelationshipTo('.confluence_package.ConfluencePackage', 'BELONGS_TO', model=ContainsRelationship, cardinality=One)
    methods = RelationshipTo('.confluence_method.ConfluenceMethod', 'CONTAINS', model=ContainsRelationship, cardinality=ZeroOrMore)
    methods = RelationshipTo('.confluence_internal_method.ConfluenceInternalMethod', 'CONTAINS', model=ContainsRelationship, cardinality=ZeroOrMore)
    extends = RelationshipTo('.confluence_class.ConfluenceClass', 'EXTENDS', cardinality=ZeroOrMore)
    imports = RelationshipTo('.confluence_import.ConfluenceImport', 'IMPORTS', cardinality=ZeroOrMore)
    annotations = RelationshipTo('.confluence_annotation.ConfluenceAnnotation', 'HAS_ANNOTATION', cardinality=ZeroOrMore)
    fields = RelationshipTo('.confluence_class_field.ConfluenceClassField', 'CONTAINS', model=ContainsRelationship, cardinality=ZeroOrMore)
    imports = Relationship('.confluence_import.ConfluenceImport', 'IMPORTS', cardinality=ZeroOrMore)
    annotations = Relationship('.confluence_annotation.ConfluenceAnnotation', 'HAS_ANNOTATION', model=AnnotatedRelationship, cardinality=ZeroOrMore)
    fields = RelationshipTo('.confluence_class_field.ConfluenceClassField', 'CONTAINS', model=ContainsRelationship, cardinality=ZeroOrMore)
@@ -1,6 +1,9 @@
from neomodel import StructuredNode, StringProperty, RelationshipTo, ZeroOrMore

from unoplat_code_confluence_commons.graph_models.base_models import AnnotatedRelationship


class ConfluenceClassField(StructuredNode):
    field_type = StringProperty()
    field_name = StringProperty()
    annotations = RelationshipTo('.confluence_annotation.ConfluenceAnnotation', 'HAS_ANNOTATION', cardinality=ZeroOrMore)
    annotations = RelationshipTo('.confluence_annotation.ConfluenceAnnotation', 'HAS_ANNOTATION', model=AnnotatedRelationship, cardinality=ZeroOrMore)
@@ -9,7 +9,7 @@
    FloatProperty

)
from .base_models import BaseNode, ContainsRelationship
from unoplat_code_confluence_commons.graph_models.base_models import BaseNode, ContainsRelationship

class ConfluenceCodebase(BaseNode):
    """Represents a codebase in the system"""
@@ -0,0 +1,9 @@
from neomodel import StructuredNode, StringProperty,RelationshipTo,ZeroOrMore

class ConfluenceExternalLibrary(StructuredNode):
    """Represents an external library in a method"""
    library_name = StringProperty(unique_index=True, required=True)
    library_version = StringProperty()
    library_doc_url = StringProperty()
    description = StringProperty()
    contains = RelationshipTo('.confluence_external_method.ConfluenceExternalMethod', 'CONTAINS', cardinality=ZeroOrMore)
@@ -0,0 +1,13 @@


from neomodel import StructuredNode, StringProperty, RelationshipTo, One, ZeroOrMore
from unoplat_code_confluence_commons.graph_models.base_models import CallsRelationship
from unoplat_code_confluence_commons.graph_models.confluence_method_type import MethodTypeChoices

class ConfluenceExternalMethod(StructuredNode):
    """Represents an external method in a method"""
    function_name = StringProperty(unique_index=True, required=True)
    return_type = StringProperty()
    method_type = StringProperty(choices=MethodTypeChoices.choices,default=MethodTypeChoices.EXTERNAL)
    called_by = RelationshipTo('.confluence_internal_method.ConfluenceInternalMethod', 'CALLED_BY', model=CallsRelationship, cardinality=ZeroOrMore)
    library = RelationshipTo('.confluence_external_lib.ConfluenceExternalLibrary', 'BELONGS_TO', cardinality=One)
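Review note: the two new node types above give a library -> method -> caller shape. The following is a minimal in-memory sketch of that shape, not the neomodel code from this PR. `LibrarySketch`, `ExternalCallIndex`, `record_call`, and the caller names are hypothetical, and the sketch assumes callers are identified by a qualified-name string.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class LibrarySketch:
    """Hypothetical stand-in for a library node, keyed by its unique name."""
    library_name: str
    methods: Set[str] = field(default_factory=set)


@dataclass
class ExternalCallIndex:
    """Hypothetical in-memory mirror of library -> method -> caller edges."""
    libraries: Dict[str, LibrarySketch] = field(default_factory=dict)
    called_by: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def record_call(self, caller: str, library_name: str, function_name: str) -> None:
        # Get-or-create keeps a single entry per library name.
        library = self.libraries.setdefault(library_name, LibrarySketch(library_name))
        library.methods.add(function_name)
        # Key methods by (library, function) so same-named functions stay distinct.
        self.called_by.setdefault((library_name, function_name), set()).add(caller)


index = ExternalCallIndex()
index.record_call("pkg.module.run_scan", "subprocess", "Popen")
index.record_call("pkg.module.run_other", "subprocess", "Popen")
index.record_call("pkg.module.save", "json", "dump")
```

One thing worth checking against the ingestion code, which this diff does not show: if `function_name` stores an unqualified name, the `unique_index=True` on it could conflict when two libraries expose a function with the same name. The sketch avoids that by keying on the (library, function) pair.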
@@ -1,8 +1,8 @@
from neomodel import StructuredNode, StringProperty, ArrayProperty, RelationshipFrom, ZeroOrMore
from neomodel import StructuredNode, StringProperty, ArrayProperty, Relationship, ZeroOrMore

class ConfluenceImport(StructuredNode):
    source = StringProperty(required=True)

    usage_names = ArrayProperty(StringProperty())

    imported_by = RelationshipFrom('.confluence_class.ConfluenceClass', 'IMPORTS', cardinality=ZeroOrMore)
    imported_by = Relationship('.confluence_class.ConfluenceClass', 'IMPORTS', cardinality=ZeroOrMore)
@@ -0,0 +1,21 @@
from unoplat_code_confluence_commons.graph_models.base_models import BaseNode, ContainsRelationship, CallsRelationship, AnnotatedRelationship
from neomodel import RelationshipTo, StringProperty,One,ArrayProperty,FloatProperty,ZeroOrMore,IntegerProperty,JSONProperty,Relationship

class ConfluenceInternalMethod(BaseNode):
    """Represents a method in a class"""

    function_name = StringProperty(required=True)
    return_type = StringProperty()
    implementation_summary = StringProperty(default="")
    objective = StringProperty(default="")
    function_objective_embedding = ArrayProperty(FloatProperty())
    function_implementation_summary_embedding = ArrayProperty(FloatProperty())
    content = StringProperty()
    body_hash = IntegerProperty()
    local_variables = JSONProperty()
    comments_description = StringProperty()
    # # Method relationships
    confluence_class = RelationshipTo('.confluence_class.ConfluenceClass', 'BELONGS_TO', model=ContainsRelationship, cardinality=One)
    annotations = Relationship('.confluence_annotation.ConfluenceAnnotation', 'HAS_ANNOTATION', model=AnnotatedRelationship, cardinality=ZeroOrMore)
    calls_methods = RelationshipTo('.confluence_internal_method.ConfluenceInternalMethod', 'CALLS', model=CallsRelationship, cardinality=ZeroOrMore)
    calls_external_methods = RelationshipTo('.confluence_external_method.ConfluenceExternalMethod', 'CALLS', model=CallsRelationship, cardinality=ZeroOrMore)
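Review note: `calls_methods` and `calls_external_methods` split outgoing calls into internal and external targets. The sketch below shows one way such a split could be derived. It uses Python's standard `ast` module as a stand-in for the tree-sitter based parsing mentioned in the commit list, so it is not the PR's implementation. `extract_calls` and `SAMPLE` are hypothetical, and "internal" is simplified here to "defined at the top level of the same source string".

```python
import ast
from typing import Dict, List


def extract_calls(source: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each top-level function to the internal and external names it calls."""
    tree = ast.parse(source)
    # Names of functions defined at module level count as internal targets.
    defined = {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}
    result: Dict[str, Dict[str, List[str]]] = {}
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        internal, external = set(), set()
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                # Render the callee expression, e.g. "helper" or "json.dumps".
                name = ast.unparse(node.func)
                (internal if name in defined else external).add(name)
        result[func.name] = {"internal": sorted(internal), "external": sorted(external)}
    return result


SAMPLE = '''
import json

def helper(x):
    return json.dumps(x)

def main():
    print(helper({"a": 1}))
'''

calls = extract_calls(SAMPLE)
```

Each entry in `calls` would correspond to one source method with two edge lists. A real ingestion step would also need to resolve qualified names across files, which this sketch leaves out.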

This file was deleted.

@@ -0,0 +1,13 @@
from pydantic import BaseModel
from typing import ClassVar, Dict

class MethodTypeChoices(BaseModel):
    """Defines method type choices for use in Neomodel properties."""

    EXTERNAL: ClassVar[str] = 'external'
    UTILITY: ClassVar[str] = 'utility'

    choices: ClassVar[Dict[str, str]] = {
        EXTERNAL: 'External',
        UTILITY: 'Programming Language Utility'
    }
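Review note: `MethodTypeChoices` only carries class-level constants, so a plain enum could express the same mapping without pydantic. The following is a sketch of that alternative, not what the PR does. `MethodType`, `METHOD_TYPE_LABELS`, and `validate_method_type` are hypothetical names, and the validation shown is generic membership checking rather than neomodel's own logic.

```python
from enum import Enum
from typing import Dict


class MethodType(str, Enum):
    """Hypothetical enum mirroring the two values defined in the diff."""
    EXTERNAL = "external"
    UTILITY = "utility"


# Value -> human-readable label, the same shape as the `choices` dict above.
METHOD_TYPE_LABELS: Dict[str, str] = {
    MethodType.EXTERNAL.value: "External",
    MethodType.UTILITY.value: "Programming Language Utility",
}


def validate_method_type(value: str) -> str:
    """Return the value unchanged if it is a known method type, else raise."""
    if value not in METHOD_TYPE_LABELS:
        raise ValueError(
            f"Invalid method_type {value!r}; expected one of {sorted(METHOD_TYPE_LABELS)}"
        )
    return value
```

A dict of this shape could still be passed wherever a `choices` mapping is expected, so the node definitions would not need to change.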
@@ -1,5 +1,5 @@
from .base_models import BaseNode, ContainsRelationship
from neomodel import RelationshipFrom, RelationshipTo, StringProperty,ZeroOrMore,One,ArrayProperty,VectorIndex,FloatProperty
from unoplat_code_confluence_commons.graph_models.base_models import BaseNode, ContainsRelationship
from neomodel import RelationshipTo, StringProperty,ZeroOrMore,One,ArrayProperty,VectorIndex,FloatProperty

class ConfluencePackage(BaseNode):
    """Represents a package in the codebase"""
2 changes: 1 addition & 1 deletion unoplat-code-confluence/.isort.cfg
@@ -4,4 +4,4 @@ import_heading_stdlib = Standard Library
import_heading_thirdparty = Third Party
import_heading_firstparty = First Party
import_heading_localfolder = Local
py_version = 311 # For Python 3.12
py_version = 311 # For Python 3.12
47 changes: 47 additions & 0 deletions unoplat-code-confluence/config.dev.json
@@ -0,0 +1,47 @@
{
  "repositories": [
    {
      "git_url": "https://github.com/unoplat/unoplat-code-confluence",
      "markdown_output_path": "/Users/jayghiya/Documents/unoplat",
      "codebases": [
        {
          "codebase_folder_name": "unoplat-code-confluence",
          "root_package_name": "unoplat_code_confluence",
          "programming_language_metadata": {
            "language": "python",
            "package_manager": "poetry",
            "language_version": "3.12.0"
          }
        }
      ]
    }
  ],
  "archguard": {
    "download_url": "archguard/archguard",
    "download_directory": "/Users/jayghiya/Documents/unoplat"
  },
  "databases": [
    {
      "name": "neo4j",
      "uri": "bolt://localhost:7687"
    }
  ],
  "llm_provider_config": {
    "model_provider": "openai/gpt-4o-mini",
    "model_provider_args": {
      "max_tokens": 500,
      "temperature": 0.0
    }
  },
  "logging_handlers": [
    {
      "sink": "~/Documents/unoplat/app.log",
      "format": "<green>{time:YYYY-MM-DD at HH:mm:ss}</green> | <level>{level}</level> | <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> | <magenta>{thread.name}</magenta> - <level>{message}</level>",
      "rotation": "10 MB",
      "retention": "10 days",
      "level": "DEBUG"
    }
  ],
  "json_output": false,
  "sentence_transformer_model": "jinaai/jina-embeddings-v3"
}
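Review note: the config nests codebases under repositories, which supports several codebases per repository (the monorepo case mentioned in the commit list). The sketch below flattens that nesting into one record per codebase using only the standard library. The key names come from the file above. `CodebaseTarget`, `iter_codebases`, and `SAMPLE_CONFIG` are hypothetical and do not reflect the project's actual settings classes.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CodebaseTarget:
    """Hypothetical flat record for one codebase inside one repository."""
    git_url: str
    folder: str
    root_package: str
    language: str
    package_manager: str


def iter_codebases(config: dict) -> List[CodebaseTarget]:
    """Flatten repositories -> codebases into a list of targets."""
    targets = []
    for repo in config.get("repositories", []):
        for codebase in repo.get("codebases", []):
            meta = codebase["programming_language_metadata"]
            targets.append(
                CodebaseTarget(
                    git_url=repo["git_url"],
                    folder=codebase["codebase_folder_name"],
                    root_package=codebase["root_package_name"],
                    language=meta["language"],
                    package_manager=meta["package_manager"],
                )
            )
    return targets


# Trimmed copy of the repositories section shown above.
SAMPLE_CONFIG = json.loads("""
{
  "repositories": [
    {
      "git_url": "https://github.com/unoplat/unoplat-code-confluence",
      "codebases": [
        {
          "codebase_folder_name": "unoplat-code-confluence",
          "root_package_name": "unoplat_code_confluence",
          "programming_language_metadata": {
            "language": "python",
            "package_manager": "poetry",
            "language_version": "3.12.0"
          }
        }
      ]
    }
  ]
}
""")

targets = iter_codebases(SAMPLE_CONFIG)
```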
@@ -0,0 +1,44 @@

import os
import json
from datetime import datetime

from unoplat_code_confluence.configuration.settings import ProgrammingLanguage
from unoplat_code_confluence.parser.tree_sitter.code_confluence_tree_sitter import CodeConfluenceTreeSitter


def test():
    """Test to print AST structure for analysis."""
    code = """
    def run_scan(self) -> str: # Get total number of files in run_scan self.total_files = self.file_counter.count_files() logger.info("Starting scan...") command = [ "java", "-jar", self.jar_path, "--with-function-code", f"--language={self.language}", "--output=arrow", "--output=json", f"--path={self.codebase_path}", f"--output-dir={self.output_path}" ] logger.info(f"Command: {' '.join(command)}") process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) while True: output = process.stdout.readline() logger.debug(output) if output == '' and process.poll() is not None: break if output: logger.info(output.strip()) progress_value = self.parse_progress(output, total_files=self.total_files) logger.info(f"Progress: {progress_value}%") stdout, stderr = process.communicate() if process.returncode == 0: logger.info("Scan completed successfully") chapi_metadata_path = self.modify_output_filename("0_codes.json", f"{self.codebase_name}_codes.json") else: logger.error(f"Error in scanning: {stderr}") logger.info(f"Total files scanned: {self.total_files}") return chapi_metadata_path
    """


    parser = CodeConfluenceTreeSitter(language=ProgrammingLanguage.PYTHON)
    # Parse and get AST
    tree = parser.parser.parse(bytes(code, "utf8"))

    # Debug: Save AST to JSON
    debug_dir = "debug_output"
    os.makedirs(debug_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    def node_to_dict(node):
        result = {
            "type": node.type,
            "text": node.text.decode('utf8') if node.text else None,
            "start_point": node.start_point,
            "end_point": node.end_point,
        }
        if len(node.children) > 0:
            result["children"] = [node_to_dict(child) for child in node.children]
        return result

    ast_dict = node_to_dict(tree.root_node)
    ast_file = f"{debug_dir}/function_ast_{timestamp}.json"
    with open(ast_file, "w") as f:
        json.dump(ast_dict, f, indent=2)


if __name__ == "__main__":
    test()
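Review note: the recursive `node_to_dict` helper in the script above can be exercised without a tree-sitter install by feeding it stand-in nodes. The sketch below does that. `StubNode` is hypothetical and only mimics the four attributes the helper reads, plus `children`. It is not a tree-sitter type, and the points are converted to lists so the dictionary survives a JSON round trip unchanged.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StubNode:
    """Hypothetical stand-in exposing the attributes node_to_dict reads."""
    type: str
    text: Optional[bytes]
    start_point: Tuple[int, int]
    end_point: Tuple[int, int]
    children: List["StubNode"] = field(default_factory=list)


def node_to_dict(node) -> dict:
    """Recursively convert a node and its children into plain JSON-friendly data."""
    result = {
        "type": node.type,
        "text": node.text.decode("utf8") if node.text else None,
        # Lists instead of tuples, so the JSON round trip compares equal.
        "start_point": list(node.start_point),
        "end_point": list(node.end_point),
    }
    if node.children:
        result["children"] = [node_to_dict(child) for child in node.children]
    return result


leaf = StubNode("identifier", b"run_scan", (0, 4), (0, 12))
root = StubNode("function_definition", b"def run_scan(): pass", (0, 0), (0, 20), [leaf])

ast_dict = node_to_dict(root)
round_tripped = json.loads(json.dumps(ast_dict))
```

With a stub like this, the conversion could be asserted on in a regular test instead of only being written to `debug_output` for manual inspection.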