
Performance improving chances in the future #614

Open
oldcpple opened this issue Sep 27, 2024 · 1 comment
oldcpple commented Sep 27, 2024

Hi there, I've been following this work for a few months and think running LLMs over the Internet is a really amazing idea. I'm also trying to improve Petals' inference performance in my local environment. In my view, simply wrapping the Transformers library for inference is somewhat inefficient, since recent papers and projects have introduced many optimizations for LLM serving, for example Flash Attention, PagedAttention, and continuous batching. It would be even better if Petals could integrate one or a few of these optimizations. I wonder whether the authors have any plans for this. I'm personally trying to integrate vLLM with Petals, in other words, enabling vLLM to run across different nodes over the Internet.
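
For illustration, here is a minimal single-node sketch of the kinds of optimizations mentioned above (not Petals code): enabling Flash Attention 2 via the Transformers `attn_implementation` argument, and using vLLM, which provides PagedAttention and continuous batching in its engine. The model name is only an example, and the Flash Attention path assumes the `flash-attn` package and a supported GPU are installed; the two paths are alternatives, not meant to run together.

```python
# Illustrative sketch only; not part of Petals. Model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example model, swap for your own

# Path 1: plain Transformers generation with Flash Attention 2 enabled
# (requires the flash-attn package and a compatible GPU).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
inputs = tokenizer("Distributed inference is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))

# Path 2: vLLM engine, where PagedAttention and continuous batching
# are handled automatically by the serving engine.
from vllm import LLM, SamplingParams

llm = LLM(model=model_id, dtype="bfloat16")
outputs = llm.generate(["Distributed inference is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```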

@magejosh

I'm interested in the answer to this as well. With the 1B and 3B Llama 3.2 models released recently, and quantized Llama 3.2 11B models already available on Hugging Face, I can't help but think there's still a lot this project could do for local-network users and for the whole network, which is why I'd like to know whether there are any future development plans. I'm imagining use cases for reusing old tech devices instead of sending them to landfills, for example installing Petals servers on them and giving them network access. Some users would contribute to the larger public pool, while others would prefer to run a private swarm, but if support were opened up for these newer models and the ways of using them that the OP mentioned, we could unlock a lot of value by moving in that direction.
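
As a hedged sketch of the private-swarm idea: the Petals client can be pointed at a set of bootstrap peers instead of the public swarm. The peer multiaddress below is a placeholder, the model ID is the example from the Petals README, and the exact argument names may differ across Petals versions.

```python
# Hypothetical sketch of connecting a Petals client to a private swarm.
# The initial_peers address is a placeholder; replace it with the multiaddr
# printed by your own bootstrap node.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_id = "petals-team/StableBeluga2"  # example model from the Petals README
private_peers = ["/ip4/192.168.1.10/tcp/31337/p2p/<peer-id>"]  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_id,
    initial_peers=private_peers,  # omit this argument to use the public swarm
)

inputs = tokenizer("Old hardware can still serve model blocks:", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```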
