
Quantized op implementation: quantized_decomposed::dequantize_per_channel_group #7676

Open
sheetalarkadam opened this issue Jan 15, 2025 · 4 comments
Assignees
Labels
module: kernels Issues related to kernel libraries, e.g. portable kernels and optimized kernels module: quantization rfc Request for comment and feedback on a post, proposal, etc.

Comments

@sheetalarkadam

🚀 The feature, motivation and pitch

Implementing the op quantized_decomposed::dequantize_per_channel_group would enable using ExecuTorch's quantized CPU ops in models like Llama 3.

I came across this issue while trying to build Llama 3 without XNNPACK. More details in #6975.
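For context, here is a hedged reference sketch (not the actual ExecuTorch kernel, whose signature and dtype handling may differ) of what per-channel-group dequantization computes: each row (output channel) is split into groups of `group_size` columns, and each group has its own scale and zero point, so `x_fp[c, i] = (x_q[c, i] - zero_point[c, i // group_size]) * scale[c, i // group_size]`.

```python
# Reference sketch of per-channel-group dequantization semantics.
# Assumption: this mirrors the decomposed-op behavior; the real kernel's
# signature, layouts, and dtypes may differ.
import numpy as np

def dequantize_per_channel_group(x_q, scales, zero_points, group_size):
    """x_q: (channels, columns) integer array.
    scales, zero_points: (channels, columns // group_size)."""
    # Broadcast each group's scale/zero_point across its group_size columns.
    s = np.repeat(scales, group_size, axis=1)
    zp = np.repeat(zero_points, group_size, axis=1)
    return (x_q.astype(np.float32) - zp) * s

# Example: 1 channel, 4 columns, 2 groups of size 2
x_q = np.array([[1, 3, 2, 4]], dtype=np.int8)
scales = np.array([[0.5, 0.25]])
zero_points = np.array([[0, 0]])
out = dequantize_per_channel_group(x_q, scales, zero_points, group_size=2)
print(out)  # group 0 scaled by 0.5, group 1 by 0.25
```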

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

@mcr229 mcr229 added rfc Request for comment and feedback on a post, proposal, etc. module: kernels Issues related to kernel libraries, e.g. portable kernels and optimized kernels module: quantization labels Jan 15, 2025
@mcr229
Contributor

mcr229 commented Jan 15, 2025

@kimishpatel @manuelcandales is this something we are planning on supporting eventually?

@kimishpatel
Contributor

Can you say more about the use case? I agree in general that this is useful to add, but it is likely going to be slower without XNNPACK, so I'm just trying to get more context to figure out how to prioritize this.

@sheetalarkadam
Author

I can't think of production use cases as of now. Currently, we are trying to measure how much of a boost XNNPACK gives. If the performance boost is significant, it makes sense to migrate workflows that don't use XNNPACK over to it. But I understand if this will be a lower-priority task. My use case is mainly gathering performance stats.

@mcr229
Contributor

mcr229 commented Jan 16, 2025

@sheetalarkadam generally, performance without XNNPACK will be bad. Comparing no-XNNPACK to XNNPACK isn't really a fair comparison, because XNNPACK is the primary CPU accelerator.
