Hi there, I've been following this work for a few months and I think it's a really amazing idea to run LLMs over the Internet. I'm also trying to improve Petals' inference performance in my local environment. My view is that simply wrapping the Transformers library for inference is somewhat inefficient, since recent papers and projects have introduced many optimizations for LLM serving, for example Flash Attention, Paged Attention, and continuous batching. It would make sense for Petals to integrate one or a few of these. I wonder if the authors have any future plans for this. I'm personally trying to integrate vLLM with Petals, or in other words, to enable vLLM to run on different nodes over the Internet.
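For context, below is a minimal sketch of the single-node vLLM API that such an integration would presumably wrap: vLLM already provides PagedAttention and continuous batching within one engine, and the open question here is how to shard that engine's layers across Petals servers. The model name is just an example, and nothing in this snippet is Petals code.

```python
# Minimal single-node vLLM example (not Petals code): the engine internally
# handles PagedAttention KV-cache management and continuous batching.
from vllm import LLM, SamplingParams

# Example model; any HF-hosted causal LM supported by vLLM would work here.
llm = LLM(model="meta-llama/Llama-3.2-1B")

sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

# vLLM schedules and batches these requests together under the hood.
outputs = llm.generate(
    ["What is distributed inference?", "Explain PagedAttention briefly."],
    sampling_params,
)

for out in outputs:
    print(out.outputs[0].text)
```

The hard part the issue is asking about is not this API, but splitting the model across untrusted, geographically distributed nodes while keeping vLLM's paged KV cache and scheduler intact on each shard.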
I'm interested in the answer to this as well. With the 1B and 3B Llama 3.2 models released recently, and quantized Llama 3.2 11B models already available on Hugging Face, I can't help but think there's still room for this project to help both local-network users and the wider network, so I'd like to know whether there are any future development plans. I'm imagining use cases for reusing old devices instead of sending them to landfills, for example installing Petals servers on them and giving them network access. Some users would contribute to the larger pool, others would prefer a private swarm, but by opening support for these other models and the approaches the OP mentioned, we could really unlock a lot of value by moving in that direction.