
llama : refactor src/llama.cpp #10902 (Draft)

ggerganov wants to merge 19 commits into master from gg/llama-refactor-0
Conversation

ggerganov (Owner) commented on Dec 19, 2024

Attempting to split src/llama.cpp into a few separate modules. Very much a work in progress; I'm mainly opening this PR so people can keep track and suggest improvements as we move along. This part does not involve functional changes, just code reorganization and decoupling to make it easier to work with the codebase. The batch and KV cache abstractions and reimplementations will be done in follow-up PRs.

```mermaid
graph TD;
  chat;
  model   --> arch[<b>arch</b>];
  model   --> hparams[<b>hparams</b>];
  model   ----> mmap[<b>mmap</b> <br><br> llama_file <br> llama_mmap <br> llama_mlock];
  model   --> vocab;
  vocab   --> unicode;
  adapter -.-> model;
  kv_cache -.-> batch;
  kv_cache -.-> cparams;
  kv_cache -.-> model;
  context --> adapter[<b>adapter</b> <br><br> llama_adapter_cvec <br> llama_adapter_lora];
  context -.-> batch;
  context --> cparams;
  context --> kv_cache;
  context --> model;

  style adapter fill:green
  style arch fill:green
  style batch fill:green
  style chat fill:green
  style cparams fill:green
  style hparams fill:green
  style kv_cache fill:green
  style mmap fill:green
  style model fill:green
  style unicode fill:green
  style vocab fill:green
```

TODO

  • move the llama_mmaps and llama_mlocks from llama_model to llama_context? (no)
  • change _internal suffix to _impl
  • model loading
  • quantization
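
For orientation, here is a hedged sketch of how the modules in the dependency graph above could map onto C++ types: solid arrows would become owned members (the header pulls in the definition), while dashed arrows stay behind forward declarations. All names and fields below are assumptions for illustration, not the PR's actual declarations.

```cpp
// Illustrative only: one possible mapping of the dependency graph to types.

#include <vector>

struct llama_hparams { /* n_embd, n_layer, ... */ };   // hparams module
struct llama_vocab   { /* tokens, merges, ... */ };    // vocab module (built on unicode)
struct llama_cparams { /* n_ctx, n_batch, ... */ };    // cparams module
struct llama_batch;                                    // batch module: only dashed edges,
                                                       // so a forward declaration suffices

struct llama_model {            // model --> arch, hparams, mmap, vocab (solid edges)
    llama_hparams hparams;
    llama_vocab   vocab;
    // tensors, mmap/mlock handles, ...
};

struct llama_kv_cache {         // kv_cache -.-> batch, cparams, model (dashed edges)
    const llama_model * model = nullptr;
    // cells, k/v tensors, ...
};

struct llama_adapter_cvec {};   // adapter module (adapter -.-> model)
struct llama_adapter_lora {};

struct llama_context {          // context --> model, kv_cache, adapter, cparams
    llama_cparams       cparams;
    const llama_model & model;
    llama_kv_cache      kv_self;
    llama_adapter_cvec  cvec;
    std::vector<llama_adapter_lora *> loras;
};
```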

@ngxson
Copy link
Collaborator

ngxson commented Dec 19, 2024

I think the control_vector and LoRA related stuff should be regrouped into a module, maybe called adapters (if someone has a better name, feel free to comment). That's because they work in much the same way, by "adding things" on top of the original cgraph.
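
To make the suggestion concrete, below is one possible shape for such a module. The llama_adapter_cvec / llama_adapter_lora names mirror the diagram above, but the fields and comments are only a hedged sketch of how control vectors and LoRA could share a home, not this PR's actual interface.

```cpp
// Hypothetical "adapter" module grouping control vectors and LoRA, since both
// inject extra tensors on top of the base model's cgraph. Sketch only.

#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct ggml_tensor;   // from ggml; forward-declared here

// Control vector: one steering vector per layer, added to that layer's output.
struct llama_adapter_cvec {
    std::vector<ggml_tensor *> tensors;   // indexed by layer, entries may be null
    int32_t layer_start = -1;
    int32_t layer_end   = -1;
};

// LoRA: low-rank A/B pairs keyed by the name of the base weight they patch,
// applied as W*x + scale * B*(A*x) when the graph is built.
struct llama_adapter_lora {
    struct weight {
        ggml_tensor * a = nullptr;
        ggml_tensor * b = nullptr;
    };
    std::map<std::string, weight> ab_map;
    float alpha = 0.0f;
};
```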

ggerganov force-pushed the gg/llama-refactor-0 branch 8 times, most recently from 524886b to 7ab08d5 on Dec 22, 2024
The github-actions bot added the examples and devops (improvements to build systems and github actions) labels on Dec 22, 2024
ggerganov force-pushed the gg/llama-refactor-0 branch 2 times, most recently from be8f568 to dcbfda1 on Dec 22, 2024