so-vits-svc fork with realtime support, improved interface and more features.

Autumn-Sisfa/so-vits-svc-fork

SoftVC VITS Singing Voice Conversion Fork

简体中文

A fork of so-vits-svc with realtime support and a greatly improved interface. It is based on branch 4.0 (v1) (or 4.1) of the original repository, and the models are compatible.

Features not available in the original repo

  • Realtime voice conversion (enhanced in v1.1.0)
  • Integrates QuickVC
  • Fixed misuse of ContentVec in the original repository. [1]
  • More accurate pitch estimation using CREPE.
  • GUI and unified CLI available
  • ~2x faster training
  • Ready to use just by installing with pip.
  • Automatically downloads pretrained models. No need to install fairseq.
  • Code completely formatted with black, isort, autoflake etc.

Installation

Option 1. One click easy installation

Download .bat

This BAT file will automatically perform the steps described below.

Option 2. Manual installation (using pipx, experimental)

1. Installing pipx

Windows (development version required due to pypa/pipx#940):

py -3 -m pip install --user git+https://github.com/pypa/pipx.git
py -3 -m pipx ensurepath

Linux/MacOS:

python -m pip install --user pipx
python -m pipx ensurepath

2. Installing so-vits-svc-fork

pipx install so-vits-svc-fork --python=3.10
pipx inject so-vits-svc-fork torch torchaudio --pip-args="--upgrade" --index-url=https://download.pytorch.org/whl/cu118 # https://download.pytorch.org/whl/nightly/cu121

Option 3. Manual installation

Creating a virtual environment

Windows:

py -3.10 -m venv venv
venv\Scripts\activate

Linux/MacOS:

python3.10 -m venv venv
source venv/bin/activate

Anaconda:

conda create -n so-vits-svc-fork python=3.10 pip
conda activate so-vits-svc-fork

Installing without creating a virtual environment may cause a PermissionError if Python is installed in Program Files, etc.

Install this via pip (or your favourite package manager that uses pip):

python -m pip install -U pip setuptools wheel
pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu118 # https://download.pytorch.org/whl/nightly/cu121
pip install -U so-vits-svc-fork

Notes

  • If no GPU is available or you are using macOS, simply skip the pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu118 step. MPS is probably supported.
  • If you are using an AMD GPU on Linux, replace --index-url https://download.pytorch.org/whl/cu118 with --index-url https://download.pytorch.org/whl/nightly/rocm5.6. AMD GPUs are not supported on Windows (#120).
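
After installation, it can be useful to confirm which backend PyTorch will actually use. The following is a minimal sketch, not part of this package: the helper name pick_device is hypothetical, and it relies only on torch.cuda.is_available() and torch.backends.mps.is_available(). ROCm builds of PyTorch also report through the CUDA check.

```python
def pick_device() -> str:
    """Return "cuda", "mps", "cpu", or "torch-not-installed"."""
    try:
        import torch  # installed by the pip commands above
    except ImportError:
        return "torch-not-installed"
    # CUDA builds (and ROCm builds on Linux) report through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch versions have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints cpu on a machine with an NVIDIA GPU, the CPU-only wheel was probably installed, and reinstalling torch with the --index-url shown above is a reasonable first step.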

Update

Please update this package regularly to get the latest features and bug fixes.

pip install -U so-vits-svc-fork
# pipx upgrade so-vits-svc-fork

Usage

Inference

GUI

Launch the GUI with the following command:

svcg

CLI

  • Realtime (from microphone)
svc vc
  • File
svc infer source.wav

Pretrained models are available on Hugging Face or CIVITAI.

Notes

  • WSL requires additional setup to handle audio, and the GUI will not work unless it can find an audio device.
  • In realtime inference, the HuBERT model also reacts to any noise in the input. In that case, consider using a realtime noise reduction application such as RTX Voice.
  • Models other than those for 4.0v1 or this repository are not supported.
  • GPU inference requires at least 4 GB of VRAM. If it does not work, try CPU inference, as it is fast enough. [2]
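
To convert several files in one go, the file command can be wrapped in a small script. This is a sketch under stated assumptions: the helper names infer_commands and run_all are hypothetical, and the only CLI form used is svc infer <file> as shown above. No other flags are assumed.

```python
import shutil
import subprocess
from pathlib import Path


def infer_commands(folder):
    """Build one `svc infer <file>` command per .wav file, sorted by name."""
    return [["svc", "infer", str(p)] for p in sorted(Path(folder).glob("*.wav"))]


def run_all(folder, dry_run=True):
    """Print the commands, or run them when dry_run is False and svc is on PATH."""
    for cmd in infer_commands(folder):
        if dry_run or shutil.which("svc") is None:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(".", dry_run=True)
```

The dry run is the default so that the commands can be reviewed before any model is loaded.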

Training

Before training

  • If your dataset contains background music (BGM), please remove it using software such as Ultimate Vocal Remover. 3_HP-Vocal-UVR.pth or UVR-MDX-NET Main is recommended. [3]
  • If your dataset is a long audio file with a single speaker, use svc pre-split to split the dataset into multiple files (using librosa).
  • If your dataset is a long audio file with multiple speakers, use svc pre-sd to split the dataset into multiple files (using pyannote.audio). Further manual classification may be necessary due to accuracy issues. If speakers speak with a variety of speech styles, set --min-speakers larger than the actual number of speakers. Due to unresolved dependencies, please install pyannote.audio manually: pip install pyannote-audio.
  • To manually classify audio files, svc pre-classify is available. Up and down arrow keys can be used to change the playback speed.
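
To decide whether a dataset needs svc pre-split, it can help to list the files that exceed the roughly 10-second per-file guideline given in the training notes. The sketch below is not part of this package. It uses only Python's standard wave module, so it reads uncompressed PCM .wav files only, and the function names are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def files_longer_than(folder, limit_seconds=10.0):
    """Return (path, duration) pairs for .wav files longer than the limit."""
    long_files = []
    for p in sorted(Path(folder).rglob("*.wav")):
        duration = wav_duration_seconds(p)
        if duration > limit_seconds:
            long_files.append((p, duration))
    return long_files


if __name__ == "__main__":
    for path, duration in files_longer_than("dataset_raw"):
        print(f"{duration:8.1f}s  {path}")
```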

Cloud

Open In Colab | Open In Paperspace | Paperspace Referral [4]

If you do not have access to a GPU with more than 10 GB of VRAM, the free plan of Google Colab is recommended for light users and the Pro/Growth plan of Paperspace is recommended for heavy users. Conversely, if you have access to a high-end GPU, the use of cloud services is not recommended.

Local

Place your dataset like dataset_raw/{speaker_id}/**/{wav_file}.{any_format} (subfolders and non-ASCII filenames are acceptable) and run:

svc pre-resample
svc pre-config
svc pre-hubert
svc train -t
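
Before running the preprocessing commands, a quick check of the dataset_raw/{speaker_id}/**/ layout can catch misplaced files. This is a minimal sketch with a hypothetical helper name. It counts every file under each speaker folder regardless of format, and ignores files placed directly in the dataset root.

```python
from pathlib import Path


def files_per_speaker(dataset_root="dataset_raw"):
    """Map each top-level speaker folder to the number of files beneath it."""
    counts = {}
    root = Path(dataset_root)
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob("*") descends into subfolders, matching the ** in the layout.
        counts[speaker_dir.name] = sum(
            1 for f in speaker_dir.rglob("*") if f.is_file()
        )
    return counts


if __name__ == "__main__":
    for speaker, n in files_per_speaker().items():
        print(f"{speaker}: {n} file(s)")
```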

Notes

  • Each audio file in the dataset should be roughly 10 seconds or shorter.
  • At least 4 GB of VRAM is required. [5]
  • Before running the train command, it is recommended to increase batch_size in config.json as far as your VRAM capacity allows. Setting batch_size to auto-{init_batch_size}-{max_n_trials} (or simply auto) automatically increases batch_size until an OOM error occurs, but this may not be useful in some cases.
  • To use CREPE, replace svc pre-hubert with svc pre-hubert -fm crepe.
  • To use ContentVec correctly, replace svc pre-config with svc pre-config -t so-vits-svc-4.0v1. Training may take slightly longer because some weights are reset due to reusing legacy initial generator weights.
  • To use MS-iSTFT Decoder, replace svc pre-config with svc pre-config -t quickvc.
  • Silence removal and volume normalization are performed automatically (as in the upstream repo), so you do not need to do them yourself.
  • If you have trained on a large, copyright-free dataset, consider releasing it as an initial model.
  • For further details (e.g. parameters), see the Wiki or Discussions.
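
The command variants in the notes above can be collected in one place. The sketch below only builds the four command lines and does not run anything. The function name training_commands is hypothetical. The flags -fm and -t are taken from the notes, with the ContentVec option read as svc pre-config -t so-vits-svc-4.0v1 by analogy with the quickvc note.

```python
def training_commands(f0_method=None, model_type=None):
    """Return the four training-pipeline commands as argument lists.

    f0_method:  e.g. "crepe" (appended to pre-hubert as -fm).
    model_type: e.g. "quickvc" or "so-vits-svc-4.0v1" (appended to pre-config as -t).
    """
    pre_config = ["svc", "pre-config"]
    if model_type is not None:
        pre_config += ["-t", model_type]
    pre_hubert = ["svc", "pre-hubert"]
    if f0_method is not None:
        pre_hubert += ["-fm", f0_method]
    return [
        ["svc", "pre-resample"],
        pre_config,
        pre_hubert,
        ["svc", "train", "-t"],
    ]


if __name__ == "__main__":
    for cmd in training_commands(f0_method="crepe", model_type="quickvc"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) so that the pipeline stops at the first failing step.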

Further help

For more details, run svc -h or svc <subcommand> -h.

> svc -h
Usage: svc [OPTIONS] COMMAND [ARGS]...

  so-vits-svc allows any folder structure for training data.
  However, the following folder structure is recommended.
      When training: dataset_raw/{speaker_name}/**/{wav_name}.{any_format}
      When inference: configs/44k/config.json, logs/44k/G_XXXX.pth
  If the folder structure is followed, you DO NOT NEED TO SPECIFY model path, config path, etc.
  (The latest model will be automatically loaded.)
  To train a model, run pre-resample, pre-config, pre-hubert, train.
  To infer a model, run infer.

Options:
  -h, --help  Show this message and exit.

Commands:
  clean          Clean up files, only useful if you are using the default file structure
  infer          Inference
  onnx           Export model to onnx (currently not working)
  pre-classify   Classify multiple audio files into multiple files
  pre-config     Preprocessing part 2: config
  pre-hubert     Preprocessing part 3: hubert If the HuBERT model is not found, it will be...
  pre-resample   Preprocessing part 1: resample
  pre-sd         Speech diarization using pyannote.audio
  pre-split      Split audio files into multiple files
  train          Train model If D_0.pth or G_0.pth not found, automatically download from hub.
  train-cluster  Train k-means clustering
  vc             Realtime inference from microphone
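
The help text says the latest model under logs/44k/G_XXXX.pth is loaded automatically. How the tool picks it is not documented here. The sketch below is an illustration only: it assumes XXXX is a numeric step count and that the highest number is the latest, and the function name is hypothetical.

```python
import re
from pathlib import Path


def latest_generator_checkpoint(log_dir="logs/44k"):
    """Return the G_<number>.pth file with the highest number, or None."""
    best_step, best_path = -1, None
    for p in Path(log_dir).glob("G_*.pth"):
        # Assumption: the part after "G_" is a plain integer step count.
        m = re.fullmatch(r"G_(\d+)\.pth", p.name)
        if m and int(m.group(1)) > best_step:
            best_step, best_path = int(m.group(1)), p
    return best_path


if __name__ == "__main__":
    print(latest_generator_checkpoint())
```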

External Links

Video Tutorial

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • 34j: 💻 🤔 📖 💡 🚇 🚧 👀 ⚠️ ✅ 📣 🐛
  • GarrettConway: 💻 🐛 📖 👀
  • BlueAmulet: 🤔 💬 💻 🚧
  • ThrowawayAccount01: 🐛
  • 緋: 📖 🐛
  • Lordmau5: 🐛 💻 🤔 🚧 💬 📓
  • DL909: 🐛
  • Satisfy256: 🐛
  • Pierluigi Zagaria: 📓
  • ruckusmattster: 🐛
  • Desuka-art: 🐛
  • heyfixit: 📖
  • Nerdy Rodent: 📹
  • 谢宇: 📖
  • ColdCawfee: 🐛
  • sbersier: 🤔 📓 🐛
  • Meldoner: 🐛 🤔 💻
  • mmodeusher: 🐛
  • AlonDan: 🐛
  • Likkkez: 🐛
  • Duct Tape Games: 🐛
  • Xianglong He: 🐛
  • 75aosu: 🐛
  • tonyco82: 🐛
  • yxlllc: 🤔 💻
  • outhipped: 🐛
  • escoolioinglesias: 🐛 📓 📹
  • Blacksingh: 🐛
  • Mgs. M. Thoyib Antarnusa: 🐛
  • Exosfeer: 🐛 💻
  • guranon: 🐛 🤔 💻
  • Alexander Koumis: 💻
  • acekagami: 🌍
  • Highupech: 🐛
  • Scorpi: 💻
  • Maximxls: 💻
  • Star3Lord: 🐛 💻
  • Forkoz: 🐛 💻
  • Zerui Chen: 💻 🤔
  • Roee Shenberg: 📓 🤔 💻
  • Justas: 🐛 💻
  • Onako2: 📖

This project follows the all-contributors specification. Contributions of any kind welcome!

Footnotes

  1. #206

  2. #469

  3. https://ytpmv.info/how-to-use-uvr/

  4. If you register with a referral code and then add a payment method, you may save about $5 on your first month's bill. Note that both referral rewards are Paperspace credits, not cash. Including the referral link was a tough decision, but debugging and training the initial model requires a large amount of computing power and the developer is a student.

  5. #456
