Running Bamba natively on Pytorch #2
@ani300 it may be that the slow path is failing for SDPA; see the stack trace below:
(vllm-bamba) nmg@css-host-181 nmg$ ./fmwork/github.ibm.com/hcir/v2.0/inference/transformers/dev/driver -m $css22/nmg/models/__cos/9aeedd4bd01c49a2a4a3dcc889904f70/ibm-llm-input/flim/Avengers-Bamba-9B-HF -i 128 -o 128 -b 2 -r 3
The fast path is not available because on of `(selective_state_update, causal_conv1d_fn, causal_conv1d_update)` is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 19.50it/s]
Traceback (most recent call last):
File "/net/storage149/mnt/md0/nmg/./fmwork/github.ibm.com/hcir/v2.0/inference/transformers/dev/driver", line 49, in
dts = fmwork.loop(par.reps, model.generate, kwargs)
File "/net/storage149/mnt/md0/nmg/fmwork/github.ibm.com/hcir/v2.0/inference/transformers/dev/fmwork.py", line 71, in loop
function(**kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/transformers/generation/utils.py", line 2231, in generate
result = self._sample(
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/transformers/generation/utils.py", line 3222, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/transformers/models/bamba/modeling_bamba.py", line 1600, in forward
outputs = self.model(
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/transformers/models/bamba/modeling_bamba.py", line 1424, in forward
layer_outputs = decoder_layer(
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/transformers/models/bamba/modeling_bamba.py", line 1171, in forward
hidden_states, self_attn_weights, cache_output = self.self_attn(
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-bamba/lib/python3.10/site-packages/transformers/models/bamba/modeling_bamba.py", line 612, in forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The expanded size of the tensor (130) must match the existing size (129) at non-singleton dimension 3. Target sizes: [2, 32, 2, 130]. Tensor sizes: [2, 1, 1, 129]
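The RuntimeError suggests the 4D attention mask was built for one fewer key position than the KV cache actually holds (129 vs. 130). A minimal sketch that reproduces the same kind of broadcast failure, assuming the shapes from the error message and a hypothetical `head_dim` of 64:

```python
import torch
import torch.nn.functional as F

# shapes taken from the error message above; head_dim is a hypothetical value
batch, heads, q_len, kv_len, head_dim = 2, 32, 2, 130, 64

query = torch.randn(batch, heads, q_len, head_dim)
key = torch.randn(batch, heads, kv_len, head_dim)
value = torch.randn(batch, heads, kv_len, head_dim)

# mask covering only kv_len - 1 = 129 key positions, i.e. off by one
# relative to the keys/values coming out of the cache
bad_mask = torch.zeros(batch, 1, 1, kv_len - 1, dtype=torch.bool)

# fails to broadcast the mask against the [2, 32, 2, 130] attention scores,
# raising an "expanded size ... must match the existing size" RuntimeError
F.scaled_dot_product_attention(query, key, value, attn_mask=bad_mask)
```

If that is what is happening, the mask preparation (cache_position / attention_mask slicing) and the cached key length disagree by one token during decode, which would point at the mask construction rather than at SDPA itself.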
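Separately, the warning at the top of the log means the fused kernels are not importable, which is why the naive (slow) path is being exercised at all. A quick sanity check, assuming the usual mamba_ssm and causal-conv1d packages are what the model looks for:

```python
# check whether the fused kernels that enable the fast path are importable;
# the import paths below are assumptions based on the state-spaces/mamba and
# Dao-AILab/causal-conv1d packages linked in the warning
try:
    from mamba_ssm.ops.triton.selective_state_update import selective_state_update
except ImportError:
    selective_state_update = None

try:
    from causal_conv1d import causal_conv1d_fn, causal_conv1d_update
except ImportError:
    causal_conv1d_fn = causal_conv1d_update = None

print("selective_state_update available:", selective_state_update is not None)
print("causal_conv1d available:",
      causal_conv1d_fn is not None and causal_conv1d_update is not None)
```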
Other seqlen issues @ani300:
Impl issues: @ani300
@fabianlim @ani300 The initial FMS implementation (which uses the slow path) can be found here: https://github.com/foundation-model-stack/foundation-model-stack/tree/bamba. There is still a bug with RoPE (something to do with weight adaptation to FMS); we will let you know when it is fixed.
This issue tracks progress on running Bamba natively on Pytorch.
Success for this issue implies the following:
cc @raghukiran1224 @fabianlim @AdnanHoque