
English README / 👑Donate project / Discord

Music Vocal Separation Tool

This is an extremely simple tool for separating vocals from background music. It runs entirely locally through a web page and uses the 2stems/4stems/5stems models.

Drag and drop a song, or any audio/video file that contains background music, into the local web page, and the vocals and accompaniment are separated into individual wav files. You can also choose to extract the "piano," "bass," or "drums" tracks separately.

The tool automatically opens the local web page in your browser, and the models are bundled, so no internet connection is needed to download anything.

Supports video (mp4/mov/mkv/avi/mpeg) and audio (mp3/wav) formats

Only two mouse clicks are needed: one to select the audio/video file and one to start processing.

Video Demo

vocal-english.mp4


Precompiled Windows Version Usage Instructions / Linux and Mac Source Deployment

  1. Download the precompiled file from Releases on the right side.

  2. After downloading, unzip it to a location of your choice, such as E:/vocal-separate.

  3. Double-click start.exe and wait for the browser window to open automatically.

  4. Click the upload area on the page and choose the audio/video file you want to separate in the dialog that opens, or drag the file directly onto the upload area, then click "Separate Now." After a short wait, each separated track and its playback control will appear at the bottom; click to play.

  5. If the machine has an NVIDIA GPU and the CUDA environment is configured correctly, CUDA acceleration will be used automatically.

Source Code Deployment (Linux/Mac/Windows)

  1. Requires Python 3.9 to 3.11

  2. Create an empty directory, such as E:/vocal-separate. Open a cmd window in that directory (type cmd in the Explorer address bar and press Enter).

    Pull the source code into the current directory with git: git clone [email protected]:jianchang512/vocal-separate.git .

  3. Create a virtual environment: python -m venv venv

  4. Activate the environment. On Windows the command is %cd%/venv/scripts/activate; on Linux and Mac it is source ./venv/bin/activate

  5. Install dependencies: pip install -r requirements.txt

  6. On Windows, unzip ffmpeg.7z and place ffmpeg.exe and ffprobe.exe in the project root directory. On Linux and Mac, download the matching build from the official ffmpeg website, unzip it, and place the ffmpeg and ffprobe binaries in the project root directory.

  7. Download the model archive and extract it into the pretrained_models folder in the project root. After extraction, pretrained_models should contain three folders: 2stems, 4stems, and 5stems (a quick environment check sketch follows this list).

  8. Execute python start.py, and wait for the local browser window to open automatically.
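
Before running start.py for the first time, a quick sanity check can save some debugging. Below is a minimal sketch (not part of the project; the messages are illustrative) that verifies the ffmpeg binaries and the extracted model folders are where the steps above place them:

```python
import shutil
from pathlib import Path

# Illustrative check: run from the project root after steps 6 and 7.
root = Path(".")

for tool in ("ffmpeg", "ffprobe"):
    # A binary in the project root (ffmpeg/ffprobe or ffmpeg.exe/ffprobe.exe)
    # or one found on the system PATH both work.
    in_root = (root / tool).exists() or (root / f"{tool}.exe").exists()
    if not in_root and shutil.which(tool) is None:
        print(f"missing binary: {tool}")

for model in ("2stems", "4stems", "5stems"):
    if not (root / "pretrained_models" / model).is_dir():
        print(f"missing model folder: pretrained_models/{model}")

print("environment check finished")
```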

API

API URL: http://127.0.0.1:9999/api

Method: POST

Request params:

file: audio file

model: model name, one of 2stems, 4stems, 5stems

Response: JSON

code: int, 0 means success, >0 means an error

msg: str, error information

data: List[str], URLs of all separated wav files, e.g. ['http://127.0.0.1:9999/static/files/2/accompaniment.wav']

status_text: dict[str, str], a description for each wav file, e.g. {'accompaniment.wav': 'accompaniment audio', 'bass.wav': 'bass audio', 'drums.wav': 'drums audio', 'other.wav': 'other audio', 'piano.wav': 'piano audio', 'vocals.wav': 'vocals audio'}
import requests

# API endpoint of the locally running service
url = "http://127.0.0.1:9999/api"

# Open the audio/video file to separate and choose the model (2stems/4stems/5stems)
with open("C:\\Users\\c1\\Videos\\2.wav", "rb") as f:
    files = {"file": f}
    data = {"model": "2stems"}
    # Separation can take a while, so allow a generous timeout
    response = requests.post(url, timeout=600, data=data, files=files)

print(response.json())

{'code': 0, 'data': ['http://127.0.0.1:9999/static/files/2/accompaniment.wav', 'http://127.0.0.1:9999/static/files/2/vocals.wav'], 'msg': 'ok', 'status_text': {'accompaniment.wav': 'accompaniment', 'bass.wav': 'bass', 'drums.wav': 'drums', 'other.wav': 'other', 'piano.wav': 'piano', 'vocals.wav': 'vocals'}}
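
The URLs returned in data point at the local web server, so the separated tracks can be saved with ordinary HTTP requests. A minimal sketch continuing the example above (the "separated" output directory is just an illustration):

```python
import os
import requests

# Continues the example above: `response` is the result of the POST request.
result = response.json()
if result.get("code") == 0:
    os.makedirs("separated", exist_ok=True)
    for wav_url in result["data"]:
        name = wav_url.rsplit("/", 1)[-1]  # e.g. accompaniment.wav
        audio = requests.get(wav_url, timeout=600)
        with open(os.path.join("separated", name), "wb") as out:
            out.write(audio.content)
        print("saved:", name)
else:
    print("separation failed:", result.get("msg"))
```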


CUDA Acceleration Support

Install CUDA Toolkit

If your computer has an NVIDIA graphics card, upgrade the graphics card driver to the latest version, then install the matching CUDA Toolkit 11.8 and cuDNN for CUDA 11.x.

After the installation is complete, press Win + R, type cmd, and press Enter. In the window that opens, run nvcc --version and confirm that version information is displayed.

Then run nvidia-smi and confirm that it prints output that includes the CUDA version number.
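
Spleeter runs on TensorFlow, so if you deployed from source you can also confirm from Python that the CUDA installation is visible. A minimal check (assuming the project's requirements are installed in the active virtual environment):

```python
import tensorflow as tf

# An empty list means TensorFlow cannot see the GPU and separation will fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
print("visible GPUs:", gpus)
```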

Notes

  1. For Chinese music or Chinese musical instruments, the 2stems model is recommended. The other models can additionally extract separate "piano," "bass," and "drums" tracks.
  2. If the computer has no NVIDIA graphics card, or the CUDA environment is not configured, avoid the 4stems and 5stems models, especially for long audio, otherwise the process may run out of memory (see the sketch below for one way to split long audio first).
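
If you have to process a long recording without a GPU, one possible workaround (not a built-in feature, just a sketch using the bundled ffmpeg; file names are illustrative) is to cut the audio into shorter segments first and separate each piece on its own:

```python
import subprocess

# Cut a long wav into 5-minute segments without re-encoding;
# each part_XXX.wav can then be uploaded to the web page or the API separately.
subprocess.run(
    [
        "ffmpeg", "-i", "long_recording.wav",
        "-f", "segment", "-segment_time", "300",
        "-c", "copy",
        "part_%03d.wav",
    ],
    check=True,
)
```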

Acknowledgments

This project mainly relies on the following projects:

  1. https://github.com/deezer/spleeter
  2. https://github.com/pallets/flask
  3. https://ffmpeg.org/
  4. https://layui