RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' #15
Comments
Oh interesting, thank you. Let me take a look.
What types were you using for fine-tuning?
I used the types of
Got it. I think I'll need to try it myself to double-check (we transform weights fp16->fp32->bf16), but if the merged model produces reasonable output it should be ok.
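As a rough illustration only (not the repository's actual merge code), the fp16 -> fp32 -> bf16 transform mentioned above amounts to something like:

```python
import torch

# Illustrative sketch: upcast fp16 weights to fp32 for the merge arithmetic,
# then downcast to bf16 for storage. The real merge code may differ.
w_fp16 = torch.randn(4, 4, dtype=torch.float16)
w_fp32 = w_fp16.to(torch.float32)
w_bf16 = w_fp32.to(torch.bfloat16)
```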
Sometimes the merged model produces the expected results. But I don't know whether the unexpected results are due to
As I understand it, you are doing fine-tuning on CPU? I'm not sure there's any benefit to using fp16 if the underlying architecture doesn't support it natively.
I'm testing fine-tuning on an Apple M1 and I know that it uses the GPU during fine-tuning.
Can you still reproduce this after our fix in #16? When I tried it that time on an Apple M1 I didn't have to convert to f32 and back.
Actually, I tried it yesterday on an M2 Ultra and had the same issue; I had to do the float32 conversion and that solved it.
@Nirjhor27 Interesting! Which torch version are you using? The error essentially says that fp16 operations are not implemented for CPU. On my M1/M2 devices I can do that though:
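The snippet referenced here was not captured in the thread; a minimal sketch of the kind of check being described (an assumption, not the commenter's exact code) is:

```python
import torch

# Hypothetical check: an fp16 addmm on CPU. This is the operation that raises
# RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' on torch builds
# without CPU half-precision support.
a = torch.randn(2, 3, dtype=torch.float16)
b = torch.randn(3, 4, dtype=torch.float16)
bias = torch.zeros(4, dtype=torch.float16)
print(torch.addmm(bias, a, b))
```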
Does this snippet work for you?
I am using 2.1.2. I understood that fp16 is for GPU and not CPU, but I am also worried that doing the conversion as Aspent suggested will mess up the weights when merging. I could merge after doing the float32 conversion, and the merged model appears to be working fine, but I have the same question as Aspen: is it actually okay or not?
Interesting, maybe it has something to do with recent work in torch, e.g. pytorch/pytorch@2240018. I cannot test an older torch version right now; I'll need to downgrade Python as well. I'll make a change to detect whether the device supports fp16. Alternatively, we can run the merge_lora on the mps device as well.
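A minimal sketch of how such a device/dtype check could look (an assumption about the approach, not the repository's actual change):

```python
import torch

def cpu_supports_fp16_addmm() -> bool:
    # Probe whether the CPU backend implements half-precision addmm;
    # torch versions without it raise a RuntimeError here.
    try:
        a = torch.randn(2, 2, dtype=torch.float16)
        torch.addmm(torch.zeros(2, dtype=torch.float16), a, a)
        return True
    except RuntimeError:
        return False

def pick_merge_device() -> torch.device:
    # Prefer the Apple GPU (MPS) when available, otherwise fall back to CPU.
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```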
Thanks, will keep an eye out and update if I find an alternative to the float32 conversion.
I suspect the result might be a little different, but I'm not sure how big of a difference it will make. Btw, @Nirjhor27 - if you've used an M2 Ultra, what was the GPU utilization when you tried to fine-tune? Thank you!
I haven't checked it yet (I am using a remote client); however, I plan to check it very soon when I fine-tune again, and I will update you on that.
In order to merge a LoRA checkpoint for the Llama 2 7B model, I run
python merge_lora.py
but an error occurred.
So I modified the code like below and got the merged model file.
But I wonder whether it's okay or not.
Can you give your opinion or the right solution?
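The exact modification was not included in the thread; based on the float32 conversion discussed above, a hypothetical sketch of the workaround (placeholder names, not the actual merge_lora.py code) is:

```python
import torch

# Hypothetical sketch of the float32 workaround: upcast the fp16 tensors before
# the merge arithmetic so the CPU matmul/addmm path is supported, then cast back.
def merge_lora_weight(w: torch.Tensor, lora_a: torch.Tensor, lora_b: torch.Tensor,
                      scaling: float) -> torch.Tensor:
    orig_dtype = w.dtype
    delta = (lora_b.to(torch.float32) @ lora_a.to(torch.float32)) * scaling
    return (w.to(torch.float32) + delta).to(orig_dtype)
```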