
An error occurred while processing long audio using the provided pretrained model. #48

Open
Owen1234560 opened this issue Jul 31, 2023 · 5 comments

Comments

@Owen1234560

audio duration: 23s
error:
File "CodeTalker/main/demo.py", line 187, in test
prediction = model.predict(audio_feature, template, one_hot)
File "CodeTalker/models/stage2.py", line 133, in predict
feat_out = self.transformer_decoder(vertice_input, hidden_states, tgt_mask=tgt_mask, memory_mask=memory_mask)
File "/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/functional.py", line 5016, in multi_head_attention_forward
raise RuntimeError(f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}.")
RuntimeError: The shape of the 3D attn_mask is torch.Size([4, 600, 600]), but should be (4, 601, 601).

@CengizhanYurdakul

I encountered the same error

@Doubiiu
Owner

Doubiiu commented Jul 31, 2023

Hi, I think the reason for this error is the pre-defined max_seq_len in models/utils.py; you may change it to a larger number. But I am not sure about the performance in this case (longer audio), so thanks for sharing your experience here 😃.
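
For reference, a minimal sketch of where that default lives, assuming models/utils.py follows FaceFormer's PeriodicPositionalEncoding (which CodeTalker builds on); the signature matches the one quoted later in this thread, and only the max_seq_len default changes:

import math
import torch
import torch.nn as nn

class PeriodicPositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, period=25, max_seq_len=3000):  # raised from 600
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        pe = torch.zeros(period, d_model)  # one period of sinusoidal encodings
        position = torch.arange(0, period, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)  # shape (1, period, d_model)
        repeat_num = (max_seq_len // period) + 1
        pe = pe.repeat(1, repeat_num, 1)  # tile one period to cover max_seq_len frames
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (batch, seq_len, d_model); seq_len must fit inside the tiled buffer
        x = x + self.pe[:, :x.size(1), :]
        return self.dropout(x)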

@Owen1234560
Author

Thanks for your reply. I'll give it a try.

@aurelianocyp

@Owen1234560 have you solved this problem? My setting is:
def __init__(self, d_model, dropout=0.1, period=25, max_seq_len=60000)
I then trained stage 1 and stage 2 with this setting. However, it can still only process audio up to about 10 seconds, and raises the error for 20 seconds and longer.

@Doubiiu
Owner

Doubiiu commented Jan 15, 2024

@aurelianocyp You may also need to modify L27 in models/stage2.py, self.biased_mask = init_biased_mask(n_head = 4, max_seq_len = 600, period=args.period), setting max_seq_len accordingly.
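
For later readers, a sketch of keeping the two length caps in sync (PeriodicPositionalEncoding and init_biased_mask are the names quoted in this thread; the import path, d_model, and the MAX_SEQ_LEN value below are illustrative assumptions):

from models.utils import PeriodicPositionalEncoding, init_biased_mask  # assumed import path

MAX_SEQ_LEN = 1800  # hypothetical value; must cover your longest clip in frames (e.g. 30 fps * 60 s)

# the positional encoding and the precomputed attention mask must agree on the limit
ppe = PeriodicPositionalEncoding(d_model=64, period=25, max_seq_len=MAX_SEQ_LEN)  # d_model illustrative
biased_mask = init_biased_mask(n_head=4, max_seq_len=MAX_SEQ_LEN, period=25)

# The traceback above ("600 ... should be (4, 601, 601)") means the attn_mask was
# sliced from a mask built with max_seq_len=600, one frame short of the input.

This also explains the earlier report: raising max_seq_len only in the positional encoding leaves the 600-frame mask in stage2.py untouched, so clips beyond roughly 600 frames still fail.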
