Using WaveformToFbankConverter with variable sample rates is impossible #341

Open
avidale opened this issue Feb 19, 2024 · 0 comments

Labels
bug Something isn't working

avidale (Contributor) commented Feb 19, 2024

Describe the bug:
The converter seems to lock onto the first sample rate that is fed into it, and refuses to convert audio with any other sample rate.

Describe how to reproduce:

import torch
from fairseq2.data.audio import WaveformToFbankConverter

# Because the two converters are initialized identically, I expect them to behave identically
converter1 = WaveformToFbankConverter()
converter2 = WaveformToFbankConverter()

# Define two equivalent audio inputs; the second is the first, downsampled by 3x.
input1 = {
    "waveform": torch.randn([2, 90_000]),
    "sample_rate": 48000,
    "format": -1,
}
input2 = {
    "waveform": input1['waveform'][:, ::3],
    "sample_rate": 16000,
    "format": -1,
}

converted1_1 = converter1(input1)
converted2_2 = converter2(input2)
# the above conversions work fine, just as expected

# expect the same output as converted2_2
converted1_2 = converter1(input2) 
# ValueError: The input waveform must have a sample rate of 48000, but has a sample rate of 16000 instead.

# expect the same output as converted1_1
converted2_1 = converter2(input1) 
# ValueError: The input waveform must have a sample rate of 16000, but has a sample rate of 48000 instead.
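One way to live with the current behavior (a purely illustrative workaround, not part of the report or of fairseq2; `PerRateConverter` and `FakeConverter` are hypothetical names) is to keep one converter instance per observed sample rate, so that each instance only ever sees the rate it was locked to:

```python
# Hypothetical workaround sketch: cache one converter per sample rate.
# FakeConverter stands in for WaveformToFbankConverter to keep the
# example self-contained; it mimics the lock-to-first-rate behavior.

class FakeConverter:
    def __init__(self):
        self._rate = None  # locked on first call, like the real converter

    def __call__(self, data):
        rate = data["sample_rate"]
        if self._rate is None:
            self._rate = rate
        elif self._rate != rate:
            raise ValueError(
                f"The input waveform must have a sample rate of {self._rate}, "
                f"but has a sample rate of {rate} instead."
            )
        return {"fbank": data["waveform"], "sample_rate": rate}


class PerRateConverter:
    """Dispatch each input to a dedicated converter for its sample rate."""

    def __init__(self, factory=FakeConverter):
        self._factory = factory
        self._converters = {}

    def __call__(self, data):
        rate = data["sample_rate"]
        if rate not in self._converters:
            self._converters[rate] = self._factory()
        return self._converters[rate](data)


convert = PerRateConverter()
out48 = convert({"waveform": [0.0] * 4, "sample_rate": 48000})
out16 = convert({"waveform": [0.0] * 2, "sample_rate": 16000})
```

With the real converter, `factory` would be a lambda constructing `WaveformToFbankConverter` with the desired options; the trade-off is one converter object per distinct rate.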

Describe the expected behavior:
This implicit dependence on the first input is unexpected; more appropriate behavior would be either to specify the desired sample rate explicitly when initializing the converter, or to support inputs with any sample rate.
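Another way around the issue (again purely illustrative, not a fairseq2 API) is to normalize every input to a single target rate before it reaches the converter. The crude stride-based decimation below mirrors the `[:, ::3]` trick from the repro and is only valid when the source rate is an integer multiple of the target rate; a real pipeline would use a proper resampler:

```python
def naive_resample(waveform, src_rate, dst_rate):
    # Crude decimation: keep every (src_rate // dst_rate)-th sample.
    # Only valid when src_rate is an integer multiple of dst_rate
    # (e.g. 48000 -> 16000); no anti-aliasing filter is applied.
    if src_rate % dst_rate != 0:
        raise ValueError("source rate must be an integer multiple of target rate")
    step = src_rate // dst_rate
    return [channel[::step] for channel in waveform]


wave = [[float(i) for i in range(12)]]  # 1 channel, 12 samples at 48 kHz
resampled = naive_resample(wave, 48000, 16000)
# every input now shares sample_rate=16000, so a single converter suffices
```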

Environment:
Python 3.8, fairseq2==0.2.0, PyTorch 2.1.1+cu118. But I believe the bug is independent of these versions.

