Ablation study? #5

Open

jukofyork opened this issue Jul 27, 2024 · 1 comment

@jukofyork

It would be interesting to see the effect of each optimisation separately.

Some of the ideas might be easy to adopt in other inference engines while others would be much harder to implement, so it would be very helpful to get an idea of the gains from each feature separately.

I've linked this to the llama.cpp discussions:

ggerganov/llama.cpp#8721

The "Arithmetic Intensity Guided Offloading" in particular looks like something they could implement quite easily, and it would likely give a significant boost to MoE models.
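
To make the idea concrete, here's a rough toy sketch of what arithmetic-intensity-guided offloading could look like (this is just my own illustration, not KTransformers' actual code; the layer names, FLOP counts, byte counts, and threshold are all made up). The intuition: each token activates only a few experts, so amortized over *all* expert weights an MoE block does very few FLOPs per byte of weights loaded, which makes the experts memory-bound and the natural candidates for CPU offload, while the compute-bound dense/attention weights stay on GPU.

```python
# Toy sketch of arithmetic-intensity-guided offloading (hypothetical
# numbers, not KTransformers' real implementation).

from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    flops: float         # FLOPs per forward pass through this layer
    weight_bytes: float  # bytes of weights that must be read per pass


def arithmetic_intensity(layer: Layer) -> float:
    """FLOPs per byte of weight traffic; low values are memory-bound."""
    return layer.flops / layer.weight_bytes


def assign_devices(layers, gpu_budget_bytes, threshold=32.0):
    """Greedily keep high-intensity layers on GPU until the budget runs out.

    `threshold` is an illustrative cutoff: layers below it are memory-bound
    and gain little from GPU compute, so they get offloaded to CPU first.
    """
    placement = {}
    used = 0.0
    # Visit compute-bound layers first so they claim GPU memory.
    for layer in sorted(layers, key=arithmetic_intensity, reverse=True):
        ai = arithmetic_intensity(layer)
        if ai >= threshold and used + layer.weight_bytes <= gpu_budget_bytes:
            placement[layer.name] = "gpu"
            used += layer.weight_bytes
        else:
            placement[layer.name] = "cpu"
    return placement


# Each token activates only a few experts, so FLOPs amortized over all
# expert weights are low -> the experts end up on CPU.
layers = [
    Layer("attention", flops=4e9, weight_bytes=50e6),    # high intensity
    Layer("shared_mlp", flops=8e9, weight_bytes=100e6),  # high intensity
    Layer("moe_experts", flops=2e9, weight_bytes=2e9),   # low intensity
]
print(assign_devices(layers, gpu_budget_bytes=512e6))
# -> {'attention': 'gpu', 'shared_mlp': 'gpu', 'moe_experts': 'cpu'}
```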

@james0zan
Member

Yes, we will present more results later. Actually, our vision for KTransformers is to serve as an experimental platform for developing prototypes faster; the more mature ones can then be adopted by more popular inference engines like llama.cpp and vLLM.
