
llama : refactor src/llama.cpp #10902 (Draft)

ggerganov wants to merge 19 commits into master from gg/llama-refactor-0
Conversation

ggerganov (Owner) commented on Dec 19, 2024

Attempting to split src/llama.cpp into a few separate modules. Very much a work in progress; I'm mainly opening this PR so people can keep track and suggest improvements as we move along. This part does not involve functional changes, just code reorganization and decoupling to make it easier to work with the codebase. The batch and KV cache abstractions and reimplementations will be done in follow-up PRs.

```mermaid
graph TD;
  chat;
  model   --> arch[<b>arch</b>];
  model   --> hparams[<b>hparams</b>];
  model   ----> mmap[<b>mmap</b> <br><br> llama_file <br> llama_mmap <br> llama_mlock];
  model   --> vocab;
  vocab   --> unicode;
  adapter -.-> model;
  kv_cache -.-> batch;
  kv_cache -.-> cparams;
  kv_cache -.-> model;
  context --> adapter[<b>adapter</b> <br><br> llama_adapter_cvec <br> llama_adapter_lora];
  context -.-> batch;
  context --> cparams;
  context --> kv_cache;
  context --> model;

  style adapter fill:green
  style arch fill:green
  style batch fill:green
  style chat fill:green
  style cparams fill:green
  style hparams fill:green
  style kv_cache fill:green
  style mmap fill:green
  style model fill:green
  style unicode fill:green
  style vocab fill:green
```

TODO

  • move the llama_mmaps and llama_mlocks from llama_model to llama_context? (no)
  • change _internal suffix to _impl
  • model loading
  • quantization
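
For orientation, here is a hedged sketch of how the modules in the dependency graph above could map onto C++ types: solid arrows would become owned members (the header pulls in the definition), while dashed arrows stay behind forward declarations. All names and fields below are assumptions for illustration, not the PR's actual declarations.

```cpp
// Illustrative only: one possible mapping of the dependency graph to types.

#include <vector>

struct llama_hparams { /* n_embd, n_layer, ... */ };   // hparams module
struct llama_vocab   { /* tokens, merges, ... */ };    // vocab module (built on unicode)
struct llama_cparams { /* n_ctx, n_batch, ... */ };    // cparams module
struct llama_batch;                                    // batch module: only dashed edges,
                                                       // so a forward declaration suffices

struct llama_model {            // model --> arch, hparams, mmap, vocab (solid edges)
    llama_hparams hparams;
    llama_vocab   vocab;
    // tensors, mmap/mlock handles, ...
};

struct llama_kv_cache {         // kv_cache -.-> batch, cparams, model (dashed edges)
    const llama_model * model = nullptr;
    // cells, k/v tensors, ...
};

struct llama_adapter_cvec {};   // adapter module (adapter -.-> model)
struct llama_adapter_lora {};

struct llama_context {          // context --> model, kv_cache, adapter, cparams
    llama_cparams       cparams;
    const llama_model & model;
    llama_kv_cache      kv_self;
    llama_adapter_cvec  cvec;
    std::vector<llama_adapter_lora *> loras;
};
```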

@ngxson
Copy link
Collaborator

ngxson commented Dec 19, 2024

I think the control_vector and LoRA related stuff should be regrouped into a module, maybe called adapters (if someone has a better name, feel free to comment). That's because they work in much the same way, by "adding things" on top of the original cgraph.
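
To make the suggestion concrete, below is one possible shape for such a module. The llama_adapter_cvec / llama_adapter_lora names mirror the diagram above, but the fields and comments are only a hedged sketch of how control vectors and LoRA could share a home, not this PR's actual interface.

```cpp
// Hypothetical "adapter" module grouping control vectors and LoRA, since both
// inject extra tensors on top of the base model's cgraph. Sketch only.

#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct ggml_tensor;   // from ggml; forward-declared here

// Control vector: one steering vector per layer, added to that layer's output.
struct llama_adapter_cvec {
    std::vector<ggml_tensor *> tensors;   // indexed by layer, entries may be null
    int32_t layer_start = -1;
    int32_t layer_end   = -1;
};

// LoRA: low-rank A/B pairs keyed by the name of the base weight they patch,
// applied as W*x + scale * B*(A*x) when the graph is built.
struct llama_adapter_lora {
    struct weight {
        ggml_tensor * a = nullptr;
        ggml_tensor * b = nullptr;
    };
    std::map<std::string, weight> ab_map;
    float alpha = 0.0f;
};
```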

ggerganov force-pushed the gg/llama-refactor-0 branch 8 times, most recently from 524886b to 7ab08d5 on Dec 22, 2024
The github-actions bot added the examples and devops (improvements to build systems and github actions) labels on Dec 22, 2024
ggerganov force-pushed the gg/llama-refactor-0 branch 2 times, most recently from be8f568 to dcbfda1 on Dec 22, 2024