Pipeline from config file #220

Merged: 67 commits, Dec 12, 2024

@stellasia stellasia commented Nov 27, 2024

Description

Adds the ability to instantiate a Pipeline from a config file. Used to simplify the internal implementation of SimpleKGPipeline (without any API change).

Supports the JSON and YAML file formats. Example config for SimpleKGPipeline:

    version_: 1
    template_: SimpleKGPipeline
    neo4j_config:
      params_:
        uri: bolt://
        user: neo4j
        password:
          resolver_: ENV
          var_: NEO4J_PASSWORD
    llm_config:
      class_: OpenAILLM
      params_:
        model_name: gpt-4o
        api_key:
          resolver_: ENV
          var_: OPENAI_API_KEY
        model_params:
          temperature: 0
          max_tokens: 2000
          response_format:
            type: json_object
    embedder_config:
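
As a rough illustration of how such a file could be read, here is a minimal sketch that picks a parser from the file extension and then inspects the `template_` key. The helper names `load_config_file` and `select_config_class_name` are hypothetical and not part of this PR. The fallback to `PipelineConfig` when `template_` is absent is an assumption. YAML parsing relies on the third-party PyYAML package, imported lazily.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config_file(path: str) -> Dict[str, Any]:
    """Read a JSON or YAML config file into a plain dict (hypothetical helper)."""
    file = Path(path)
    suffix = file.suffix.lower()
    text = file.read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(text)
    if suffix in (".yaml", ".yml"):
        # PyYAML is a third-party package; it is imported here so that
        # JSON-only usage does not require it to be installed.
        import yaml

        return yaml.safe_load(text)
    raise ValueError(f"Unsupported config file extension: {suffix!r}")


def select_config_class_name(config: Dict[str, Any]) -> str:
    """Name the config class that a router could pick from `template_`."""
    # Assumption: anything other than "SimpleKGPipeline" (including a
    # missing key) is treated as a fully explicit pipeline definition.
    if config.get("template_") == "SimpleKGPipeline":
        return "SimpleKGPipelineConfig"
    return "PipelineConfig"
```

Calling `select_config_class_name(load_config_file("config.json"))` on a file like the example above would return `"SimpleKGPipelineConfig"`.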

The goal of this implementation is to use the same objects whether the user writes Python code or uses a config file, which has to be parsed to instantiate objects such as the Neo4j driver and LLMs, without duplicating code.

The main additions are:

  • neo4j_graphrag.experimental.pipeline.config.runner.PipelineRunner class, able to create a pipeline from an AbstractPipelineConfig (code) object or from a config file.
  • PipelineConfigWrapper is the 'router': depending on the template_ field, it instantiates either a PipelineConfig or a SimpleKGPipelineConfig. In the former case, all components and connections need to be defined in the config. In the latter, shortcuts are possible (same interface as the SimpleKGPipeline).
  • Other things to note:
    • ParamToResolveConfig: used to define a parameter in the config file without providing its value directly. For instance, to read the Neo4j password from an environment variable, the user can write:

       "password": {
           "resolver_": "ENV",
           "var_": "NEO4J_PASSWORD"
       }
      

      Only two "resolvers" are supported for now:

      • Read from an environment variable
      • Read from another key in the config file. The latter is useful to reference a Neo4j driver (defined in the top-level key neo4j_config) further down in the config file, in a component definition. neo4j_config is a top-level key because it can be reused by multiple components.
    • Neo4jDriver-, LLM- and Embeddings-Type: wrappers around either an already instantiated object (when the config is used in code) or a *Config object that contains the fields needed for instantiation (when used from a config file). The parse method always returns an object. For instance:

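       # In code: wrap an already instantiated driver
       driver_type = Neo4jDriverType(driver)
       driver = driver_type.parse()

       # From a config file: wrap a config object holding the driver parameters
       driver_type = Neo4jDriverType(
           Neo4jDriverConfig(
               params_={
                   "uri": "bolt://",
                   "user": "",
                   "password": "",
               }
           )
       )
       driver = driver_type.parse()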
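
To make the two ideas above more concrete, here is a minimal standalone sketch of how a resolver and a `*Type` wrapper could fit together. It is not the code from this PR. `FakeDriver`, `FakeDriverConfig`, `FakeDriverType` and `resolve_param` are hypothetical stand-ins, so the example runs without a database. The `CONFIG_KEY` resolver name and its `key_` field are also assumptions, since the description above does not name the second resolver.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Union


def resolve_param(value: Any, root_config: Dict[str, Any]) -> Any:
    """Return a literal value unchanged, or resolve a {"resolver_": ...} mapping."""
    if not (isinstance(value, dict) and "resolver_" in value):
        return value
    resolver = value["resolver_"]
    if resolver == "ENV":
        # Raises KeyError if the environment variable is not set.
        return os.environ[value["var_"]]
    if resolver == "CONFIG_KEY":
        # Assumed name and shape: look up a top-level key of the config.
        return root_config[value["key_"]]
    raise ValueError(f"Unknown resolver: {resolver!r}")


@dataclass
class FakeDriver:
    """Stand-in for a real driver object."""

    uri: str
    user: str
    password: str


@dataclass
class FakeDriverConfig:
    """Holds the parameters needed to build a FakeDriver."""

    params_: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeDriverType:
    """Wraps either a ready object or a config describing how to build one."""

    root: Union[FakeDriver, FakeDriverConfig]

    def parse(self, root_config: Optional[Dict[str, Any]] = None) -> FakeDriver:
        if isinstance(self.root, FakeDriver):
            # Already instantiated (code path): hand the same object back.
            return self.root
        # Config path: resolve each parameter, then instantiate.
        params = {
            name: resolve_param(value, root_config or {})
            for name, value in self.root.params_.items()
        }
        return FakeDriver(**params)
```

With this shape, callers never need to know which path was taken: `FakeDriverType(existing_driver).parse()` returns the same object, while `FakeDriverType(FakeDriverConfig(params_={...})).parse()` builds a new one after resolving any `resolver_` entries.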

Changes

  • SimpleKGPipeline internals changed to use the new PipelineRunner.

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Complexity: High

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

@willtai willtai left a comment

Nice work!

@stellasia stellasia merged commit ff6862e into neo4j:main Dec 12, 2024
7 checks passed
@stellasia stellasia deleted the feature/config-files branch January 6, 2025 12:44