
An error occurred while processing long audio using the provided pretrained model. #48

Open
Owen1234560 opened this issue Jul 31, 2023 · 5 comments

Comments

@Owen1234560

audio duration: 23s
error:
File "CodeTalker/main/demo.py", line 187, in test
prediction = model.predict(audio_feature, template, one_hot)
File "CodeTalker/models/stage2.py", line 133, in predict
feat_out = self.transformer_decoder(vertice_input, hidden_states, tgt_mask=tgt_mask, memory_mask=memory_mask)
File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 5016, in multi_head_attention_forward
raise RuntimeError(f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}.")
RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).

@CengizhanYurdakul

I encountered the same error

@Doubiiu
Owner

Doubiiu commented Jul 31, 2023

Hi, I think the reason for this error is the pre-defined max_seq_len in models/utils.py; you may change it to a larger number. But I am not sure about the performance in this case (longer audio), so thanks for sharing your experience here 😃.
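
For reference, a minimal sketch of where that default lives, assuming models/utils.py follows FaceFormer's PeriodicPositionalEncoding (which CodeTalker builds on); the signature matches the one quoted later in this thread, and only the max_seq_len default changes:

import math
import torch
import torch.nn as nn

class PeriodicPositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, period=25, max_seq_len=3000):  # raised from 600
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        pe = torch.zeros(period, d_model)  # one period of sinusoidal encodings
        position = torch.arange(0, period, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)  # shape (1, period, d_model)
        repeat_num = (max_seq_len // period) + 1
        pe = pe.repeat(1, repeat_num, 1)  # tile one period to cover max_seq_len frames
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (batch, seq_len, d_model); seq_len must fit inside the tiled buffer
        x = x + self.pe[:, :x.size(1), :]
        return self.dropout(x)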

@Owen1234560
Author

Thanks for your reply. I'll give it a try.

@aurelianocyp

@Owen1234560 have you solved this problem? My setting is:
def __init__(self, d_model, dropout=0.1, period=25, max_seq_len=60000)
I then trained stage 1 and stage 2 with this setting. However, it can still only process audio up to about 10 seconds, and raises the error for 20 seconds and longer.

@Doubiiu
Owner

Doubiiu commented Jan 15, 2024

@aurelianocyp You may also need to modify L27 in models/stage2.py, self.biased_mask = init_biased_mask(n_head = 4, max_seq_len = 600, period=args.period), setting max_seq_len accordingly.
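
For later readers, a sketch of keeping the two length caps in sync (PeriodicPositionalEncoding and init_biased_mask are the names quoted in this thread; the import path, d_model, and the MAX_SEQ_LEN value below are illustrative assumptions):

from models.utils import PeriodicPositionalEncoding, init_biased_mask  # assumed import path

MAX_SEQ_LEN = 1800  # hypothetical value; must cover your longest clip in frames (e.g. 30 fps * 60 s)

# the positional encoding and the precomputed attention mask must agree on the limit
ppe = PeriodicPositionalEncoding(d_model=64, period=25, max_seq_len=MAX_SEQ_LEN)  # d_model illustrative
biased_mask = init_biased_mask(n_head=4, max_seq_len=MAX_SEQ_LEN, period=25)

# The traceback above ("600 ... should be (4, 601, 601)") means the attn_mask was
# sliced from a mask built with max_seq_len=600, one frame short of the input.

This also explains the earlier report: raising max_seq_len only in the positional encoding leaves the 600-frame mask in stage2.py untouched, so clips beyond roughly 600 frames still fail.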
