PR #99 includes the `ScatterMoE` module drop-in to enable expert parallelism for mixture-of-experts models, but the class currently only supports full fine-tuning and LoRA.
This issue is to extend it to also support quantized PEFT. Currently it is incompatible with the `accelerated-peft` plugin on several levels:

- MoE models may keep their expert weights as 3D tensors, and it is not clear whether `bitsandbytes` supports quantizing those (see the first sketch below).
- Since we perform a complete model swap without inspecting the base layers, any quantized modules are simply ignored during the swap (see the second sketch below).

This requires thinking through. The best outcome would be compatibility with `quantized_peft`.
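To illustrate the first point, here is a minimal sketch (not the actual plugin code) of why the 3D expert layout is awkward for `bitsandbytes`, which is built around 2D `nn.Linear` weights, together with one possible per-expert workaround. It assumes a CUDA device, `bitsandbytes` installed, and made-up shapes:

```python
# Hypothetical sketch: ScatterMoE-style experts live in one 3D parameter,
# so bnb's Linear4bit drop-in does not apply directly.
import torch
import bitsandbytes.functional as F

num_experts, in_features, out_features = 8, 1024, 4096

# All experts in a single 3D parameter rather than a ModuleList of nn.Linear.
expert_weights = torch.randn(
    num_experts, out_features, in_features, device="cuda", dtype=torch.float16
)

# One possible workaround: quantize each 2D expert slice independently and
# keep the per-expert quant states around for dequantization at forward time.
quantized, states = [], []
for e in range(num_experts):
    q, state = F.quantize_4bit(expert_weights[e], quant_type="nf4")
    quantized.append(q)
    states.append(state)

# Dequantize one expert slice to check the round-trip error.
w0 = F.dequantize_4bit(quantized[0], states[0])
print((w0 - expert_weights[0]).abs().max())
```

Whether per-expert quantization like this is acceptable (extra quant states, extra dequant overhead in the expert kernel) is part of what needs to be thought through.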
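For the second point, a minimal sketch of a guard the swap logic could grow so it at least fails loudly instead of silently dropping quantized weights. The helper names (`contains_bnb_layers`, `swap_moe_block`) are illustrative, not part of the plugin:

```python
# Hypothetical guard: before swapping an MoE block for the ScatterMoE drop-in,
# check whether the block already contains bitsandbytes-quantized layers.
import torch.nn as nn
import bitsandbytes as bnb

def contains_bnb_layers(module: nn.Module) -> bool:
    """Return True if any submodule is a bitsandbytes quantized linear."""
    return any(
        isinstance(m, (bnb.nn.Linear4bit, bnb.nn.Linear8bitLt))
        for m in module.modules()
    )

def swap_moe_block(parent: nn.Module, name: str, scatter_moe_block: nn.Module):
    """Replace parent.<name> with the ScatterMoE drop-in, refusing to
    silently discard quantized weights (illustrative behaviour only)."""
    old = getattr(parent, name)
    if contains_bnb_layers(old):
        raise NotImplementedError(
            "MoE block holds bnb-quantized layers; the ScatterMoE swap "
            "does not yet carry quantized weights over (this issue)."
        )
    setattr(parent, name, scatter_moe_block)
```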