
Add standalone python scripts for local usage #95

Merged
merged 8 commits into jasonppy:master on May 4, 2024

Conversation

pgosar
Contributor

@pgosar pgosar commented Apr 17, 2024

Work in progress to create standalone Python scripts that run inference for speech editing and TTS, separate from the Jupyter notebooks.

Will handle #56

TODO

- [ ] Add command line arguments for all hardcoded options on TTS
- [ ] Complete and test TTS
- [ ] Add command line arguments for all hardcoded options on speech editing
- [ ] Complete and test speech editing
- [ ] Clean up and add running instructions

@arthurwolf

Did you see #34 ?

@pgosar
Contributor Author

pgosar commented Apr 17, 2024

This is planned to supersede that, since I'd like to avoid attempting environment setup in the script itself. I also want to provide scripts for both speech editing and TTS. I'll start after I finish my current PR.

@arthurwolf

arthurwolf commented Apr 17, 2024 via email

@jstayco

jstayco commented Apr 23, 2024

> Did you see #34 ?

Definitely feel free to use whatever you can from this to save yourself work or time! Unfortunately, I got to a spot where I couldn't dedicate more time to the script I put up. I kept getting errors about audiocraft not being found when running it, and Python isn't my forte, so I wasn't sure how to rectify that between the parent environment and the inner Conda environment.

You should still be able to reuse the environment setup parts if you want, especially the conditional Python module installation and the pip handling. I know @pgosar mentioned not doing setup in the script, but just an idea to throw out there: you could put it behind an --install-deps flag or something. Best of luck!

@pgosar
Contributor Author

pgosar commented Apr 23, 2024

The PR should be functional now; tomorrow I will take a pass through, clean up the code a little, and make sure I didn't miss any potential breakages.

Every hardcoded variable concerning inference, outputs, inputs, etc. has been turned into a command line argument. They are all optional, and the default values are whatever they were set to originally.

This should be merged before my other PR #94, because I'll need to update the speech editing script based on the changes there.
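The pattern described above (every formerly hardcoded value becomes an optional flag whose default is the old value) could look roughly like this. The flag names and defaults here are illustrative, not the PR's exact ones, apart from cut_off_sec's 3.6 s default, which is mentioned later in the review:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical sketch: each previously hardcoded setting becomes an
    # optional CLI flag whose default matches the original hardcoded value.
    parser = argparse.ArgumentParser(description="VoiceCraft TTS inference")
    parser.add_argument("--beam_size", type=int, default=50,
                        help="MFA beam size")
    parser.add_argument("--cut_off_sec", type=float, default=3.6,
                        help="seconds of the input audio to use as the prompt")
    parser.add_argument("--target_transcript", type=str, default="",
                        help="text to synthesize after the prompt")
    return parser.parse_args(argv)

args = parse_args([])                              # all defaults
override = parse_args(["--cut_off_sec", "2.5"])    # user override
```

Because every flag has a default, the script stays runnable with no arguments at all, matching the original hardcoded behavior.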

@pgosar pgosar marked this pull request as ready for review April 23, 2024 23:55
@pgosar
Contributor Author

pgosar commented Apr 24, 2024

> I kept getting errors with audiocraft not being found

I don't know if this is your exact issue, but when I wrote the Google Colabs I had to clone audiocraft into the VoiceCraft folder. Regardless, my scripts work without any special environment setup beyond what's currently in the README.

@pgosar
Contributor Author

pgosar commented Apr 24, 2024

@jasonppy Hi, I should be ready on my side

@jasonppy
Owner

> @jasonppy Hi, I should be ready on my side

Thanks, I'll test it in the next two days

```python
align_temp = f"{temp_folder}/mfa_alignments"
beam_size = args.beam_size
retry_beam_size = args.retry_beam_size
os.system("source ~/.bashrc && \
```
Owner


The forced alignment output is not really used, because the user needs to specify cut_off_sec when calling the file.

tts_demo.py Outdated

```python
# take a look at demo/temp/mfa_alignment, decide which part of the audio to use as prompt
cut_off_sec = args.cut_off_sec  # NOTE: according to forced-alignment file demo/temp/mfa_alignments/5895_34622_000026_000002.wav, the word "strength" stops at 3.561 sec, so we use the first 3.6 sec as the prompt. this should be different for different audio
target_transcript = args.target_transcript
```
Owner


Add something like

```python
cut_off_sec, cut_off_word_idx = find_closest_word_boundary(cut_off_sec, word_alignment_fn, margin)
target_transcript = " ".join(orig_transcript.split(" ")[:cut_off_word_idx]) + " " + args.target_transcript
```

where find_closest_word_boundary finds the word end boundary (in the word_alignment_fn file) that is closest to the user-specified cut_off_sec and that also has some gap (specified by margin) before the next word's start boundary, and returns word_end_boundary + margin/2 as the new cut_off_sec. margin can be a user-specified parameter.
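A minimal sketch of the suggested helper, under the assumption that the alignment file has already been parsed into (word, start, end) tuples (the real script would read them from the MFA alignment file, so the signature differs slightly from the snippet above):

```python
def find_closest_word_boundary(alignments, cut_off_sec, margin):
    """Return (new_cut_off_sec, cut_off_word_idx): the word-end boundary
    closest to cut_off_sec whose gap before the next word's start is at
    least `margin` seconds; new_cut_off_sec is that boundary + margin / 2.

    `alignments` is a hypothetical list of (word, start_sec, end_sec)
    tuples, one per word of the original transcript, in order.
    """
    best = None
    for idx, (_word, _start, end) in enumerate(alignments):
        next_start = (alignments[idx + 1][1] if idx + 1 < len(alignments)
                      else float("inf"))
        if next_start - end < margin:
            continue  # not enough silence before the next word starts
        dist = abs(end - cut_off_sec)
        if best is None or dist < best[0]:
            best = (dist, end + margin / 2, idx + 1)
    if best is None:
        raise ValueError("no word boundary satisfies the margin requirement")
    return best[1], best[2]

# hypothetical alignment data: (word, start_sec, end_sec)
words = [("but", 0.0, 0.3), ("when", 0.4, 0.7), ("i", 1.0, 1.1)]
sec, idx = find_closest_word_boundary(words, cut_off_sec=0.65, margin=0.2)
```

Here the boundary after "but" is rejected (only 0.1 s of gap), so the helper snaps to the end of "when" and returns that boundary plus half the margin, along with the number of words to keep in the prompt.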

```python
parser.add_argument("-ot", "--original_transcript", type=str,
                    default="But when I had approached so near to them The common object, which the sense deceives, Lost not by distance any of its marks,",
                    help="original transcript")
parser.add_argument("-tt", "--target_transcript", type=str,
```
Owner


Can you make target_transcript not be a concatenation of the prompt's transcript and the real target transcript? The user will not be able to specify the prompt and cut_off_sec without checking the MFA alignment output. A workaround is described in the comments below.


@jasonppy jasonppy left a comment


Thanks!
The main thing I'm concerned about is that the user needs to specify cut_off_sec (which determines the prompt we cut from the input audio) and to specify target_transcript as a concatenation of the prompt's transcript and the real target transcript. However, one cannot do that without looking at the MFA output, which is itself produced by running the script. A workaround is described in the comments.

@pgosar
Contributor Author

pgosar commented Apr 30, 2024

I'll take a look at these in a day or two

@pgosar
Contributor Author

pgosar commented May 4, 2024

Sorry for the delay; I had to complete my final projects/exams.

I implemented `find_closest_word_boundary` so that, based on the specified cut-off seconds, it outputs a new one that takes the margin into account. Combined with your suggestion about target_transcript, the user should then be able to input only the new speech they want to generate and the cut-off point of the original audio to replace.

I'm a little confused: is the behavior you want that the user specifies only a target transcript and the script then figures out the cut-off seconds? That would be quite easy to adjust my current implementation to do; all I'd need is to search for the last matching word between the original and target transcripts and use that point as the cut_off_sec and cut_off_index instead.
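If that is indeed the intended behavior, the adjustment could be sketched as below. The helper name and the alignment format (one (word, start, end) tuple per original word) are hypothetical, not the PR's actual code:

```python
def infer_cut_off(orig_transcript, target_transcript, alignments):
    """Hypothetical sketch: find the longest common word prefix between
    the original and target transcripts and cut at the end of its last
    word. `alignments` holds one (word, start_sec, end_sec) tuple per
    word of the original transcript, in order."""
    orig_words = orig_transcript.lower().split()
    target_words = target_transcript.lower().split()
    n = 0
    for o, t in zip(orig_words, target_words):
        if o != t:
            break
        n += 1
    if n == 0:
        raise ValueError("transcripts share no common word prefix")
    # end time of the last shared word, and the number of words to keep
    return alignments[n - 1][2], n

aligns = [("but", 0.0, 0.3), ("when", 0.4, 0.7), ("i", 1.0, 1.1)]
sec, idx = infer_cut_off("but when i", "but when you arrived", aligns)
```

With this approach the user passes the full desired transcript, the shared prefix "but when" is detected automatically, and the cut-off lands at the end of "when"; in practice one would likely still snap the result to a margin-respecting boundary as in the suggestion above.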

@jasonppy jasonppy merged commit 1a896d2 into jasonppy:master May 4, 2024