This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

How to get the output from one .wav file #58

Open
saikishor opened this issue Oct 11, 2017 · 9 comments

@saikishor

Is there a way to run the model directly on a wav file and get the output as a text file, without using a manifest file?

@p8778ter

I have the same question. We can run "python evaluate.py --manifest val:/path/to/manifest.csv --model_file /path/to/saved_model.prm --inference_file /path/to/outputfile_pickle file" with a manifest file that lists only one wav file. However, the manifest file requires a second data element: the transcript file.

You have to know the content of the wav file to generate the transcript file.

What should I do if I don't know the content and just want to parse the wav file?

@saikishor
Author

@p8778ter I am also waiting for that. I guess it is difficult to do it this way, because the dataloader and some of the functions are opaque to me, so I can't even try to modify them to get results.

@p8778ter

I also tried modifying evaluate.py. I used a dummy transcript file as a placeholder in the manifest file and used the pre-trained model for prediction only, with no validation. I got a result, but it is terribly wrong. I need the Intel deep speech team to give us some insight.
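
For readers who want to try the placeholder approach described above, here is a minimal Python sketch. It assumes only what this thread states: a manifest row whose first element is the audio file and whose second element is a transcript file. The helper name, the output file names, and the comma delimiter are all hypothetical, so check them against the manifest format your checkout actually expects.

```python
import csv
from pathlib import Path


def write_placeholder_manifest(audio_path, out_dir, delimiter=","):
    """Write a one-row manifest pairing an audio file with a dummy transcript.

    The two-column layout (audio path, transcript path) follows the
    description in this thread. The delimiter and the file names used
    here are assumptions, not the project's documented format.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # The transcript text is meaningless; it exists only so that the
    # manifest row has a second element.
    transcript_path = out_dir / "placeholder.txt"
    transcript_path.write_text("PLACEHOLDER\n")

    manifest_path = out_dir / "manifest.csv"
    with open(manifest_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow([str(audio_path), str(transcript_path)])
    return manifest_path
```

Any error metric computed against such a placeholder is meaningless; only the predicted text would be of interest.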

@saikishor
Author

Yes, the results are terrible, and I don't know why.

@tyler-nervana
Contributor

I've created a branch tyler/evaluate_single with a script called evaluate_single.py in it. I've tested it briefly and it seems to work as expected for files from the LibriSpeech dev-clean dataset.

To use it, first check out my branch. Then, from within the speech directory, you should be able to run: ./evaluate_single.py --model_file <model_file> <sequence of audio files>. For instance, ./evaluate_single.py --model_file /data/librispeech_16_epochs.prm /data/LibriSpeech/dev-clean/1272/128104/1272-128104-*.flac. It should print out something like:

File: /data/LibriSpeech/dev-clean/1272/128104/1272-128104-0001.flac
Transcript: NOR IS MISTER QOLTERS MANNER LESS INTEESTING THUN HIS MATTER

If this works well for you, I'll make a PR and get it in shortly.

@p8778ter

p8778ter commented Oct 20, 2017

I tried this evaluate_single.py. It works.

However, the CER is still high if the audio is encoded at a 32 kHz sample rate. A 16 kHz sample rate works better; the CER at 16 kHz is about 60%.

One sentence that should be "that the principle concerns that our central bank has had for a number of years most visibly since" was predicted as "THATTE PRINCE VO ONCERN THAT THE PAERSENTRAL BANK AS HADRN UMBER BEAR ESAMOST VISIBLY SINSAKE"
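
For reference, character error rate (CER) is commonly computed as the character-level edit distance between the reference and the prediction, divided by the reference length. The sketch below is a plain Python version of that common definition, not necessarily the exact computation this repository uses; in particular, upper-casing both strings before comparison is an assumption made here because the model prints upper-case text.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length.

    Case is normalized here (an assumption); spaces count as characters.
    """
    ref = reference.upper()
    hyp = hypothesis.upper()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Calling cer(reference, prediction) on a pair like the one above gives a per-sentence figure; a value above 1.0 is possible when the prediction is much longer than the reference.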

Another issue is that the predicted sentence is cut off and the output is very short. I gave it a 60-second sound file, but it only generated about 5 seconds' worth of text. Could the prediction output be made longer, or do we have to split the 60 seconds into 12 smaller pieces?

Two other important questions:

  1. What are the best audio encoding parameters?

  2. If I continue training librispeech_16_epochs.prm on my own data, could that improve the CER?

Thanks

@tyler-nervana
Contributor

Thanks for the feedback! I forgot to mention that the sample rate is hardcoded to 16 kHz, as it is in LibriSpeech. At the moment, Aeon (the dataloader) doesn't support variable sample rates, and so it must be provided. There are a few other encoding restrictions that you can find here: http://aeon.nervanasys.com/index.html/provider_audio.html.
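
Since the rate is fixed at 16 kHz, it can help to inspect a file's header before running the model on it. The following is a small sketch using Python's standard wave module, which reads PCM .wav files only (not .flac); the function names and the constant are hypothetical and not part of this repository.

```python
import wave

# Rate the script hardcodes, according to the comment above.
EXPECTED_RATE_HZ = 16000


def describe_wav(path):
    """Return basic header fields of a PCM .wav file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": w.getnframes() / rate,
        }


def has_expected_rate(path, expected_hz=EXPECTED_RATE_HZ):
    """True if the file's sample rate matches the expected rate."""
    return describe_wav(path)["sample_rate_hz"] == expected_hz
```

A file that fails this check would need to be resampled with an external tool first; this sketch only reports the mismatch.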

Trimming the sound should produce better results. We found that longer duration clips are generally much harder for the network.
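
One way to shorten clips is to cut a long recording into fixed-length pieces, as suggested in the question above. Here is a rough sketch using Python's standard wave module; the function name, the output naming scheme, and the 5-second default (taken from the "12 pieces of a 60-second file" in the question) are assumptions. It cuts at fixed intervals, so a cut can land in the middle of a word.

```python
import wave
from pathlib import Path


def split_wav(path, out_dir, chunk_seconds=5.0):
    """Split a PCM .wav file into consecutive chunks of chunk_seconds each.

    The last chunk may be shorter. Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    chunk_paths = []

    with wave.open(str(path), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * params.framerate)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small")

        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = out_dir / f"{stem}-{index:04d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                # Copy the source format so each chunk keeps the same
                # channel count, sample width, and sample rate.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1

    return chunk_paths
```

The resulting files could then be passed to evaluate_single.py as the sequence of audio files.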

Adding your own data for training should help things. Librispeech is a very specific style of read-speech, which can make it difficult to generalize to different types of speech.

@binglel

binglel commented May 1, 2018

Hello, where can I find evaluate_single.py? I can't find that file now.
@tyler-nervana
Thank you!

@tyler-nervana
Contributor

Hi, you can find the file in a branch here: https://github.com/NervanaSystems/deepspeech/blob/tyler/evaluate_single/speech/evaluate_single.py.
