Intern2 habana #489
Conversation
changing view to shape in split_qkv · internlm2 habana compatibility
@@ -144,20 +144,23 @@ def __init__(
         )

     def split_qkv(self, qkv: torch.Tensor):
-        seq_len = qkv.shape[0]
+        batch_size, seq_len, _ = qkv.shape
This will break all non-HPU accelerators. Can we make this more generic? E.g. pass *qkv.shape[:-1]
into reshape, so that 2D tensors will work as well?
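For reference, a minimal sketch of the kind of shape-agnostic split being suggested here (illustrative only; the names split_qkv_generic, num_kv_heads, kv_groups and head_dim are assumptions, not the actual vLLM attributes):

```python
import torch

def split_qkv_generic(qkv: torch.Tensor, num_kv_heads: int,
                      kv_groups: int, head_dim: int):
    """Split a packed QKV tensor without assuming a fixed number of
    leading dimensions, so both [tokens, hidden] and
    [batch, seq, hidden] inputs work."""
    # Keep every leading dimension untouched; only restructure the last one.
    qkv = qkv.reshape(*qkv.shape[:-1], num_kv_heads, kv_groups + 2, head_dim)
    q, k, v = torch.split(qkv, [kv_groups, 1, 1], dim=-2)
    # Flatten the per-head dimensions back into a single hidden dimension.
    q = q.reshape(*q.shape[:-3], num_kv_heads * kv_groups * head_dim)
    k = k.reshape(*k.shape[:-3], num_kv_heads * head_dim)
    v = v.reshape(*v.shape[:-3], num_kv_heads * head_dim)
    return q, k, v

# Both a flattened token tensor and a batched tensor go through the same path.
num_kv_heads, kv_groups, head_dim = 8, 4, 128
hidden = num_kv_heads * (kv_groups + 2) * head_dim
q2d, k2d, v2d = split_qkv_generic(torch.randn(32, hidden),
                                  num_kv_heads, kv_groups, head_dim)
q3d, k3d, v3d = split_qkv_generic(torch.randn(2, 16, hidden),
                                  num_kv_heads, kv_groups, head_dim)
```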
Sure, let me add that and test on non-HPU.
         if self.tp_size > 1:
             splitter = partial(split_tensor_along_last_dim,
                                num_partitions=self.tp_size)
             q = splitter(q)[self.tp_rank]
             k = splitter(k)[self.tp_rank]
             v = splitter(v)[self.tp_rank]
+
Yapf fails on this empty line, please remove
@@ -144,27 +144,33 @@ def __init__(
         )

     def split_qkv(self, qkv: torch.Tensor):
-        seq_len = qkv.shape[0]
+        # Unpack all dimensions except the last one
+        *batch_dims, last_dim = qkv.shape
last_dim seems to be unused, please replace it with _
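A quick illustration of the suggested unpacking idiom (standalone snippet, not the actual patch):

```python
import torch

qkv = torch.randn(2, 16, 4096)
# The last dimension is not needed by name, so discard it with _.
*batch_dims, _ = qkv.shape
print(batch_dims)  # [2, 16]
```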
The "Trigger Jenkins Tests" failure can be ignored now
Making sure the model runs on Habana devices. The original code did not run because of an error in split_qkv: the parameter unpacking assumed there was no batch dimension. Tested inference with these changes; InternLM2 works on Gaudi2 as expected.
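A minimal reproduction of the failure mode described above (sizes are illustrative; it assumes the HPU path hands split_qkv a batched [batch, seq, hidden] tensor while the original unpacking expected a flattened [tokens, hidden] one):

```python
import torch

num_heads, head_dim = 32, 128
qkv = torch.randn(2, 16, num_heads * head_dim)  # batched input

# Original-style unpacking: shape[0] is the batch size here, not the
# number of tokens, so a view built from it no longer matches the data.
seq_len = qkv.shape[0]
try:
    qkv.view(seq_len, num_heads, head_dim)
except RuntimeError as err:
    print(err)  # shape '[2, 32, 128]' is invalid for input of size 131072

# Shape-agnostic unpacking keeps all leading dimensions intact.
*batch_dims, _ = qkv.shape
print(qkv.reshape(*batch_dims, num_heads, head_dim).shape)  # [2, 16, 32, 128]
```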