Is there a simple way to run inference without Gradio or Jupyter? I'm looking to run this on a batch of files.
Also, there's one thing I don't understand. I'm expecting three basic parameters:
original audio (the one that will be used for cloning)
original audio transcript
target text to convert to speech
Now, in the Jupyter notebook I see that the target text has to start with something from the original audio. I'd expect to use a separate target text that doesn't contain any part of the original audio.
The quality is absolutely great, but the output always begins with a piece of the original audio. Also, generation is really slow (on a 4090), and it uses up all the VRAM.
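To make the request concrete, here is a rough sketch of the kind of batch interface I have in mind. Everything in it is an assumption for illustration: `synthesize` is a hypothetical placeholder for whatever the project's real inference call is, `run_batch` and `trim_prompt` are made-up helper names, and cutting the output by the reference clip's duration is only a guess at how the leading piece of original audio could be removed. It does not address the speed or VRAM usage.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature, NOT this project's API:
# (reference_audio_path, reference_transcript, target_text) -> audio samples
Synthesizer = Callable[[str, str, str], Sequence[float]]


def trim_prompt(samples: Sequence[float], prompt_seconds: float, sample_rate: int) -> List[float]:
    """Drop the leading samples that cover the reference clip's duration.

    Assumption: the output starts with roughly prompt_seconds of the
    original audio, so a fixed cut is good enough for a first pass.
    """
    cut = int(round(prompt_seconds * sample_rate))
    return list(samples[cut:])


def run_batch(
    synthesize: Synthesizer,
    reference_audio: str,
    reference_transcript: str,
    targets: Dict[str, str],
    prompt_seconds: float,
    sample_rate: int,
) -> Dict[str, List[float]]:
    """Generate speech for every target text using one reference voice.

    targets maps an output name to the text to speak. The result maps the
    same names to samples with the reference audio prefix removed.
    """
    results: Dict[str, List[float]] = {}
    for name, text in targets.items():
        samples = synthesize(reference_audio, reference_transcript, text)
        results[name] = trim_prompt(samples, prompt_seconds, sample_rate)
    return results
```

With something like this, the three inputs stay separate: the reference audio and its transcript are passed once, and each target text is independent of them.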