Support ollama #1326
Conversation
@microsoft-github-policy-service agree
I get the following error when running this fork: 'ollama_chat' is not a valid LLMType
I changed more than the two places you mentioned; you can check the files changed in this PR, or just use this branch: https://github.com/Smpests/graphrag-ollama/tree/feature/ollama-support.
Was the main.py file removed from graphrag.index on purpose? This results in --init and other args not working.
Also, did you encounter any "Error Invoking LLM" errors? I'm not sure I'm invoking it right.
I didn't. You can try setting parallelization.num_threads to a lower value in settings.yaml according to your machine; it defaults to 50.
Please check this issue: #1305
I'm unsure why it's bugging. I think the "Error Invoking LLM" failures (even though I set the request timeout in the yaml to 12800.0) lead to the model's inability to create some of the reports and summaries.
Change your query command to … I checked your indexing-engine.log; which model did you use? (My case was llama3.1:8b.) Maybe your model's response is not valid JSON; you should debug step by step, or log the model response and check. I've seen a similar error where the model responded like: Below is my answer: {"title": "xx"...}, so I added "Answer only JSON, without any other text." to the prompt.
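A generic debugging aid for this kind of failure (not part of graphrag or this PR; the helper below is my own sketch): log the raw model response and try to recover the JSON object even when the model wraps it in prose.

```python
import json
import re


def extract_json(response_text: str) -> dict:
    """Parse a model response as JSON, tolerating prose around the JSON object."""
    try:
        return json.loads(response_text)
    except json.JSONDecodeError:
        # Some models answer like: 'Below is my answer: {"title": "xx"}'.
        match = re.search(r"\{.*\}", response_text, re.DOTALL)
        if match is None:
            raise
        return json.loads(match.group(0))


print(extract_json('Below is my answer: {"title": "xx"}'))
```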
You can get ollama working with just:

```yaml
llm:
  type: openai_chat
  model: qwen2.5:3b-16k
  batch_max_tokens: 8191
  max_tokens: 4000
  api_base: http://127.0.0.1:11434/v1
  max_retries: 3
  model_supports_json: false
  concurrent_requests: 1
```
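For context, the api_base above is Ollama's OpenAI-compatible endpoint, which is why the stock openai_chat type works. A quick sanity check that the server and model respond (a sketch assuming the openai Python package and that the model name matches one available in your local Ollama):

```python
from openai import OpenAI

# Ollama serves an OpenAI-compatible API under /v1; the API key is ignored but must be set.
client = OpenAI(base_url="http://127.0.0.1:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="qwen2.5:3b-16k",  # replace with a model you have pulled/created in Ollama
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(resp.choices[0].message.content)
```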
For indexing, the problem I've encountered isn't that ollama doesn't have API hooks; it's that the models produce very different results compared to OpenAI's API, many of which are not compatible. Of the models that fit on my (paltry) 8 GB card, I've only gotten qwen2.5 to produce anything reasonable. Looking at this thread, it seems like llama3.2 has worked for people? But are there any others that people have found successful without having to do prompt engineering (i.e., using the stock prompts)?
I was using llama3.2:latest. I will try running it with llama3.1:8b. Does your response mean you changed the prompt document from their repo? I'm new to their repo and this LLM space and didn't want to ruin anything. I ask because I think the model outputs a parquet file. If your answer is yes, should I add --emit json to the command? Also, if we want JSON outputs, how come in your YAML file you set model_supports_json to false?
I tried with llama3.2:latest (on my branch feature/ollama-support); this is my result (with 16,078 chars in input/book.txt). Yes, I modified their prompt document (a simple guiding tip for llama3.1:8b, but I didn't change it for llama3.2:latest).
Description
Supports Ollama, which provides free local LLMs, for those who need to use local APIs.
Related Issues
None
Proposed Changes
1. Add ollama package in graphrag.llm;
2. Add ollama package in graphrag.query.llm;
3. Some details to make the changes work:
   - graphrag.llm.openai.utils.py moved to graphrag.llm.utils.py;
   - add new types in graphrag.config.enum.LLMType (see the sketch after this list);
   - ...
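For illustration only, the LLMType additions might look roughly like the sketch below; the member names are inferred from the settings example in this PR (ollama_chat, ollama_embedding) and are not copied from the actual diff.

```python
from enum import Enum


class LLMType(str, Enum):
    """Subset sketch of the LLMType enum with hypothetical Ollama members."""

    # Existing members (subset).
    OpenAIChat = "openai_chat"
    AzureOpenAIChat = "azure_openai_chat"

    # Hypothetical additions for Ollama support, named after the settings example below.
    OllamaChat = "ollama_chat"
    OllamaEmbedding = "ollama_embedding"
```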
Checklist
Additional Notes
Following https://microsoft.github.io/graphrag/get_started with a 24,720-char book.txt, both graphrag index and graphrag query passed.
Part of settings.yaml
```yaml
llm:
  type: ollama_chat # or azure_openai_chat
  model: llama3.1:8b
  model_supports_json: false # recommended if this is available for your model.
  max_tokens: 12800
  api_base: http://localhost:11434
  concurrent_requests: 2 # the number of parallel inflight requests that may be made

embeddings:
  llm:
    type: ollama_embedding # or azure_openai_embedding
    model: nomic-embed-text:latest
    api_base: http://localhost:11434
    concurrent_requests: 2 # the number of parallel inflight requests that may be made
```
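Before indexing, it can help to confirm that both models actually respond through Ollama itself; a minimal sketch using the ollama Python client (assuming the package is installed and both models have been pulled):

```python
import ollama

# Chat model from the llm section above.
chat = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(chat["message"]["content"])

# Embedding model from the embeddings section above.
emb = ollama.embeddings(model="nomic-embed-text:latest", prompt="graphrag smoke test")
print(len(emb["embedding"]))
```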