It would be interesting to see the effect of each optimisation in isolation.
Some of the ideas may be easy to adopt in other inference engines while others are much harder to implement, so a per-feature breakdown of the gains would be very helpful.
Yes, we will present more results later. Actually, our vision for KTransformers is to serve as an experimental platform used to develop prototypes faster. The more mature ones can then be adopted into more popular inference engines like llama.cpp and vLLM.
I've linked this in the llama.cpp discussions: ggerganov/llama.cpp#8721
The "Arithmetic Intensity Guided Offloading" in particular is something they could likely adopt quite easily, and it should give a significant boost to MoE models.
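To make the idea concrete, here is a minimal sketch of arithmetic-intensity-guided placement: rank each operator by its arithmetic intensity (FLOPs per byte of memory traffic) and place the highest-intensity operators on the GPU until its memory budget runs out. All names and numbers below are illustrative, not taken from the KTransformers or llama.cpp code bases; the intuition is that dense attention reuses its weights heavily (high intensity) while decode-time MoE experts touch large weights for only a few tokens (low intensity), so the experts are the natural candidates to keep on CPU.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def plan_offload(ops, gpu_mem_budget: float):
    """Greedy placement by arithmetic intensity.

    ops: list of (name, flops, bytes_moved, weight_bytes) tuples.
    Returns (gpu_ops, cpu_ops) as lists of operator names.
    """
    ranked = sorted(ops, key=lambda op: arithmetic_intensity(op[1], op[2]),
                    reverse=True)
    gpu, cpu, used = [], [], 0.0
    for name, flops, bytes_moved, weight_bytes in ranked:
        if used + weight_bytes <= gpu_mem_budget:
            gpu.append(name)          # high intensity: worth GPU memory
            used += weight_bytes
        else:
            cpu.append(name)          # low intensity (or no room): stay on CPU
    return gpu, cpu

# Illustrative numbers for a single MoE transformer layer during decode.
ops = [
    ("attention",   2.0e12, 1.0e9,  4e9),   # ~2000 FLOPs/byte
    ("moe_experts", 4.0e11, 8.0e9, 20e9),   # ~50 FLOPs/byte
]
gpu, cpu = plan_offload(ops, gpu_mem_budget=8e9)
print(gpu, cpu)  # attention fits on the GPU; the experts stay on CPU
```

A real implementation would measure FLOPs and traffic per operator (and per batch size, since intensity grows with batch for weight-reusing ops), but the greedy ranking above captures the core of the heuristic.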