Support ollama #1326

Open · wants to merge 8 commits into base: main

Conversation


@Smpests Smpests commented Oct 26, 2024

Description

Adds support for Ollama, which serves free, locally hosted LLMs, for users who need to run against a local API.

Related Issues

None

Proposed Changes

1. Add an ollama package in graphrag.llm (a rough sketch of the idea follows below);
2. Add an ollama package in graphrag.query.llm;
3. Supporting changes to make the above work:
   - graphrag.llm.openai.utils.py moved to graphrag.llm.utils.py;
   - new types added to graphrag.config.enums.LLMType;
   ...
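For reviewers, here is a rough sketch of the idea behind the new chat wrapper. It is illustrative only (the class and method names are made up for this example, not the exact code in the PR) and assumes the official ollama Python client:

# Illustrative sketch only -- not the exact class added in this PR.
# Assumes the official ollama Python client (pip install ollama).
import ollama


class OllamaChatLLM:
    """Thin async wrapper around an Ollama chat model."""

    def __init__(self, model: str, api_base: str = "http://localhost:11434"):
        self._model = model
        self._client = ollama.AsyncClient(host=api_base)

    async def chat(self, prompt: str, history: list[dict] | None = None) -> str:
        # Ollama uses the familiar chat-message format: [{"role": ..., "content": ...}]
        messages = (history or []) + [{"role": "user", "content": prompt}]
        response = await self._client.chat(model=self._model, messages=messages)
        # The assistant reply is available under response["message"]["content"].
        return response["message"]["content"]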

Checklist

  • [x] I have tested these changes locally.
  • [x] I have reviewed the code changes.
  • [ ] I have updated the documentation (if necessary).
  • [ ] I have added appropriate unit tests (if applicable).

Additional Notes

Following https://microsoft.github.io/graphrag/get_started with a 24,720-character book.txt, both graphrag index and graphrag query passed.

Part of settings.yaml:

llm:
  type: ollama_chat # or azure_openai_chat
  model: llama3.1:8b
  model_supports_json: false # recommended if this is available for your model.
  max_tokens: 12800
  api_base: http://localhost:11434
  concurrent_requests: 2 # the number of parallel inflight requests that may be made

embeddings:
  llm:
    type: ollama_embedding # or azure_openai_embedding
    model: nomic-embed-text:latest
    api_base: http://localhost:11434
    concurrent_requests: 2 # the number of parallel inflight requests that may be made
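
For the embeddings side the idea is the same; a minimal sketch, again with illustrative names and the official ollama client rather than the exact PR code:

# Illustrative sketch only -- not the exact class added in this PR.
import ollama


class OllamaEmbeddingLLM:
    """Thin wrapper that turns texts into embedding vectors via Ollama."""

    def __init__(self, model: str = "nomic-embed-text:latest",
                 api_base: str = "http://localhost:11434"):
        self._model = model
        self._client = ollama.Client(host=api_base)

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            # embeddings() returns a dict-like response with an "embedding" key.
            result = self._client.embeddings(model=self._model, prompt=text)
            vectors.append(result["embedding"])
        return vectors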

@Smpests Smpests requested review from a team as code owners October 26, 2024 13:30
@Smpests Smpests commented Oct 26, 2024

@Smpests please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

@Smpests Smpests changed the title from "Feature/ollama support" to "Support ollama" on Oct 26, 2024
@JoedNgangmeni

I get the following error when running this fork:

'ollama_chat' is not a valid LLMType
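For reference, that message is the ValueError Python's Enum raises when a configured string has no matching member, so it usually means the graphrag install being run still has the upstream enum. A quick check (assuming the enum lives at graphrag.config.enums, as in the Enums.py snippet later in this thread):

# Quick check that the installed graphrag actually has the new enum members.
from graphrag.config.enums import LLMType

print(LLMType("ollama_chat"))   # resolves on this branch
print(LLMType("bogus_value"))   # raises ValueError: 'bogus_value' is not a valid LLMType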

@JoedNgangmeni JoedNgangmeni commented Oct 28, 2024

I've updated my yaml and llm type files but am now getting this error:
[screenshot]

How do you make sure ollama models are actually being run? I think that is the main issue.
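One way to confirm that the Ollama server is reachable and that the expected models are actually pulled and loaded is to query its REST API directly (ollama list and ollama ps on the CLI show the same information); a small sketch, assuming the default endpoint:

# Sanity-check the Ollama server: which models are pulled, and which are loaded.
import requests

api_base = "http://localhost:11434"

# GET /api/tags lists the locally available (pulled) models.
tags = requests.get(f"{api_base}/api/tags", timeout=5).json()
print("available:", [m["name"] for m in tags.get("models", [])])

# GET /api/ps lists the models currently loaded in memory.
ps = requests.get(f"{api_base}/api/ps", timeout=5).json()
print("running:", [m["name"] for m in ps.get("models", [])])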

stats.json

#####################################################
YAML: 
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: ollama_chat # or azure_openai_chat
  model: llama3.2:latest
  model_supports_json: true # recommended if this is available for your model.
  # audience: "https://cognitiveservices.azure.com/.default"
  max_tokens: 12800
  # request_timeout: 180.0
  api_base: http://localhost:11434
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 2 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    collection_name: entity_description_embeddings
    overwrite: true
  # vector_store: # configuration for AI Search
    # type: azure_ai_search
    # url: <ai_search_endpoint>
    # api_key: <api_key> # if not set, will attempt to use managed identity. Expects the `Search Index Data Contributor` RBAC role in this case.
    # audience: <optional> # if using managed identity, the audience to use for the token
    # overwrite: true # or false. Only applicable at index creation time
    # collection_name: <collection_name> # the name of the collection to use
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: ollama_embedding # or azure_openai_embedding
    model: mxbai-embed-large:latest
    api_base: http://localhost:11434
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 2 # the number of parallel inflight requests that may be made
#####################################################
From Enums.py
class LLMType(str, Enum):
    """LLMType enum class definition."""

    # Embeddings
    OpenAIEmbedding = "openai_embedding"
    AzureOpenAIEmbedding = "azure_openai_embedding"
    OllamaEmbedding = "ollama_embedding"

    # Raw Completion
    OpenAI = "openai"
    AzureOpenAI = "azure_openai"

    # Chat Completion
    OpenAIChat = "openai_chat"
    AzureOpenAIChat = "azure_openai_chat"
    OllamaChat = "ollama_chat"


    # Debug
    StaticResponse = "static_response"

    def __repr__(self):
        """Get a string representation."""
        return f'"{self.value}"'

@Smpests Smpests commented Oct 28, 2024

I changed more than the two places you mentioned; you can check the files changed in this PR, or just use this branch: https://github.com/Smpests/graphrag-ollama/tree/feature/ollama-support.
Below are my results:
graphrag index --root ./ragtest
[screenshot]
graphrag query --root ./ragtest --method local --query "Who is Scrooge and what are his main relationships?"
[screenshot]

@JoedNgangmeni

Was the main.py file removed from graphrag.index on purpose? Its removal means --init and other args no longer work.

@JoedNgangmeni

Also, did you encounter any "Error Invoking LLM" errors? I'm not sure I'm invoking it right.

@Smpests Smpests commented Oct 29, 2024

Also, did you encounter any "Error Invoking LLM" errors? I'm not sure I'm invoking it right.

I didn't. You can try setting parallelization.num_threads to a lower value in settings.yaml, according to your machine; it defaults to 50.

@Smpests Smpests commented Oct 29, 2024

Was the main.py file removed from graphrag.index on purpose? Its removal means --init and other args no longer work.

Please check this issue: #1305
The old graphrag.index entry point is now the graphrag index command.

@JoedNgangmeni

I'm unsure why it's failing.

logs.json
indexing-engine.log

I think the "Error Invoking LLM" (even though I set the request timeout in the yaml to 12800.0) leads to the model's inability to create some reports and summaries.

Below is the output from running python -m graphrag query --query "who is scrooge?" --root ./ragtest --method global:

[screenshot]

Result from python -m graphrag --query "who is scrooge?" --root ./ragtest --method global:

[screenshot]

Result from python -m graphrag query "who is scrooge?" --root ./ragtest --method global:

[screenshot]

@Smpests Smpests commented Nov 2, 2024

Change your query command to python -m graphrag query --query "who is scrooge?" --root ./ragtest --method global

I checked your indexing-engine.log. Which model did you use? (Mine was llama3.1:8b.) Maybe your model's response is not valid JSON; you should debug step by step, or log the model response and check.

I've seen a similar error where the model responds with something like: Below is my answer: {"title": "xx"...}; I then added "Answer only JSON, without any other text." to the prompt.
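A complementary workaround, if prompt changes alone don't help, is to parse tolerantly on the caller's side and extract the first JSON object from the response before handing it to json.loads; a rough sketch (not part of this PR):

# Rough sketch of tolerant JSON extraction for chatty local models (not part of this PR).
import json


def extract_json(response_text: str) -> dict:
    """Return the first JSON object embedded in a model response.

    Handles replies such as: Below is my answer: {"title": "xx", ...}
    """
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    return json.loads(response_text[start:end + 1])


print(extract_json('Below is my answer: {"title": "xx", "rating": 7.5}'))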

@drcrallen

You can get ollama working with just settings.yaml with something like this:

llm:
  type: openai_chat
  model: qwen2.5:3b-16k
  batch_max_tokens: 8191
  max_tokens: 4000
  api_base: http://127.0.0.1:11434/v1
  max_retries: 3
  model_supports_json: false
  concurrent_requests: 1

Where qwen2.5:3b-16k is created like:

FROM qwen2.5:3b-instruct-q6_K
PARAMETER num_ctx 16384

For indexing, the problem I've encountered isn't that Ollama lacks API hooks; it's that the models produce very different results from OpenAI's API, many of which are not compatible. Of the models that fit on my (paltry) 8 GB card, qwen2.5 is the only one I've gotten to produce anything reasonable. Looking at this thread, it seems llama3.2 has worked for people? But are there any other models people have found successful without doing prompt engineering (i.e., using the stock prompts)?
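For anyone wanting to verify that setup outside graphrag first: the same /v1 endpoint can be exercised with the standard openai Python client. A minimal sketch, assuming the qwen2.5:3b-16k model has already been built with something like ollama create qwen2.5:3b-16k -f Modelfile:

# Minimal check of Ollama's OpenAI-compatible endpoint using the standard openai client.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="qwen2.5:3b-16k",  # the custom model built from the Modelfile above
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)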

@JoedNgangmeni JoedNgangmeni commented Nov 5, 2024

I checked your indexing-engine.log. Which model did you use? (Mine was llama3.1:8b.) Maybe your model's response is not valid JSON; you should debug step by step, or log the model response and check.

I've seen a similar error where the model responds with something like: Below is my answer: {"title": "xx"...}; I then added "Answer only JSON, without any other text." to the prompt.

I was using llama3.2:latest. I will try running it with llama3.1:8b.

Does your response mean you changed the prompt document from their repo? I'm new to their repo and to this LLM space and didn't want to ruin anything. I ask because I think the model outputs a parquet file.

If your answer is yes, should I add --emit json to the prompt?

Also, if we want JSON outputs, how come your YAML file sets model_supports_json to false?

@Smpests Smpests commented Nov 5, 2024

I tried with llama3.2:latest (on my branch feature/ollama-support); this is my result (with 16,078 chars in input/book.txt):
indexing-engine.log
[screenshot]

Yes, I modified their prompt document (a simple guiding tip for llama3.1:8b, but I didn't change it for llama3.2:latest).
My settings.yaml sets model_supports_json to false because this parameter makes no sense for Ollama.
