This project is a simplified version of SOFA (Singing-Oriented Forced Aligner) and only provides phoneme boundary segmentation. It can be used when SOFA's results are not accurate enough. After segmenting the boundaries, you can manually input the phonemes in vlabeler.
- Use `git clone` to download the repository code.
- Install conda or use venv.
- Go to the PyTorch website to install torch.
- (Optional, for faster wav file reading) Install torchaudio from the PyTorch website.
- Install the other Python libraries: `pip install -r requirements.txt`
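Taken together, a typical setup might look like the following sketch. The repository URL, the environment name `sofa-seg`, and the Python version are placeholders, and the exact torch/torchaudio install command should be copied from the PyTorch website:

```
git clone <repository_url>
cd <repository_directory>
# Create and activate an isolated environment (conda shown; venv works too)
conda create -n sofa-seg python=3.10
conda activate sofa-seg
# Install torch (and optionally torchaudio) with the command from pytorch.org
pip install -r requirements.txt
```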
- Follow the steps above to set up the environment. It is recommended to install torchaudio for faster binarization.
- Run `python convert_ds.py --data_zip_path xxx.zip --lang xx` to convert an nnsvs dataset into a diffsinger dataset. Conversion must be done separately for each language; the supported languages are listed in `convert_ds.py`. An example invocation is shown below.
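  For example, to convert a hypothetical Japanese nnsvs package (the zip name is made up here, and `jp` is only assumed to be one of the language codes listed in `convert_ds.py` — check that file for the actual codes):

  ```
  python convert_ds.py --data_zip_path nnsvs_singer_jp.zip --lang jp
  ```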
- Place the training data in the `data` folder in the following format:

  ```
  - data
    - full_label
      - singer1
        - wavs
          - audio1.wav
          - audio2.wav
          - ...
        - transcriptions.csv
      - singer2
        - wavs
          - ...
        - transcriptions.csv
  ```
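  The exact column layout of `transcriptions.csv` is whatever `convert_ds.py` emits. As a rough sketch only, a DiffSinger-style file (assumed here, not confirmed by this repository) might look like:

  ```
  name,ph_seq,ph_dur
  audio1,SP a i SP,0.10 0.32 0.28 0.12
  audio2,SP k a SP,0.08 0.22 0.41 0.15
  ```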
- Modify `binarize_config.yaml` as needed, then run `python binarize.py`.
- Modify `train_config.yaml` as needed, then run `python train.py`. If you want to resume training, use `python train.py -r`.
- For training visualization, run `tensorboard --logdir=ckpt/`.
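  By default, TensorBoard then serves the dashboard at `http://localhost:6006`.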
- Prepare the audio files to be segmented and place them in a folder (default is the `segments` folder) in the following format:

  ```
  - segments
    - singer1
      - segment1.wav
      - segment2.wav
      - ...
    - singer2
      - segment1.wav
      - ...
  ```
- Inference via command line: run `python infer.py` for inference. Parameters:
  - `--ckpt` (required): path to the model weights.
  - `--folder`: folder containing the data to be aligned (default: `segments`).

  ```
  python infer.py -c checkpoint_path -f segments_path
  ```
- Obtain the final annotations: a `.lab` file with the same name as each audio file will be generated in the folder containing the audio files.
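  For example, with the folder layout above, `segments/singer1` would look something like this after inference (a sketch based on the example file names, not actual program output):

  ```
  segments/singer1/
  ├── segment1.wav
  ├── segment1.lab
  ├── segment2.wav
  └── segment2.lab
  ```

  Each `.lab` file can then be opened alongside its wav in vlabeler so you can type in the phonemes for the segmented intervals.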