Author: Mortadha Manai
Report link: https://github.com/MortadhaMannai/VOCAL-TRACK-EXTRACTION-USING-NEURAL-NETWORKS/blob/main/Report.pdf
Paper links:
1- Zenodo: https://zenodo.org/record/8274725
2- OpenAIRE: https://explore.openaire.eu/search/publication?pid=10.5281%2Fzenodo.8267702
This project implements four models: a Deep Clustering model, a Hybrid Deep Clustering model, a U-net model, and a UH-net model. All models are trained on the DSD100 dataset. The project is based on PyTorch.
Data preprocessing:
- Build_Dataset.ipynb: generate the dataset from DSD100
- config.py: define project-level parameters
- data_loader.py: define the torch data loader
- mel_dealer.py: convert a music file to a mel spectrogram and convert the spectrogram back (sketched below)
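Below is a minimal sketch of the mel round trip that mel_dealer.py presumably implements. It uses librosa; the function names and the STFT/mel parameters (n_fft, hop_length, n_mels) are illustrative assumptions, not the project's actual configuration.

```python
# Hedged sketch of an audio <-> mel spectrogram round trip with librosa.
# Parameter values are assumptions, not the project's actual settings.
import librosa

N_FFT, HOP, N_MELS = 2048, 512, 128  # assumed STFT/mel settings

def audio_to_mel(path, sr=22050):
    """Load an audio file and convert it to a power mel spectrogram."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=N_FFT, hop_length=HOP, n_mels=N_MELS
    )
    return mel, sr

def mel_to_audio(mel, sr):
    """Approximately invert a mel spectrogram back to a waveform."""
    return librosa.feature.inverse.mel_to_audio(
        mel, sr=sr, n_fft=N_FFT, hop_length=HOP
    )
```

Note that the inverse direction is only approximate: mel_to_audio recovers phase with Griffin-Lim, so the reconstructed waveform is not bit-identical to the original.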
Model definition:
- unet_model.py: define the U-net and UH-net models (a minimal sketch follows this list)
- cluster_model.py: define the Deep Clustering model
- hybrid_model.py: define the Hybrid Deep Clustering model
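The following is a minimal PyTorch sketch of a U-net-style spectrogram mask estimator, to show the shape of what unet_model.py defines; the depth, channel counts, and layer choices are illustrative assumptions rather than the project's actual architecture.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level encoder/decoder with a single skip connection.
    Assumes the input's frequency and time dimensions are even.
    Illustrative only; not the project's actual U-net / UH-net."""

    def __init__(self, in_ch=1, base=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU()
        )
        self.dec1 = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU()
        )
        self.out = nn.Conv2d(base * 2, in_ch, 1)  # after skip concatenation

    def forward(self, x):                 # x: (batch, 1, freq, time)
        e1 = self.enc1(x)                 # full-resolution features
        e2 = self.enc2(e1)                # downsampled by 2
        d1 = self.dec1(e2)                # upsampled back to full resolution
        d1 = torch.cat([d1, e1], dim=1)   # skip connection
        return torch.sigmoid(self.out(d1))  # soft mask in [0, 1]
```

A real U-net repeats this down/up pattern several levels deep; see the report for the actual U-net and UH-net designs.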
Model training:
- utils.py: define the loss functions
- unet_train.py: training functions for the U-net / UH-net models
- hd_train.py: training functions for the Hybrid Deep Clustering model
- dc_train.py: training functions for the Deep Clustering model
- train_dc.ipynb, train_hybrid.ipynb and train_unet.ipynb: train the models (a sketch of one training step follows this list)
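As referenced above, here is a minimal sketch of a single mask-based training step, showing how the training scripts and the loss functions in utils.py are likely wired together. The specific objective (L1 loss between the masked mixture and the vocal target) is an assumption; the project's utils.py defines the actual loss functions.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, mix_spec, vocal_spec):
    """One optimization step on a batch of (mixture, vocal) spectrograms."""
    optimizer.zero_grad()
    mask = model(mix_spec)            # predicted soft mask, same shape as input
    est_vocal = mask * mix_spec       # masked mixture = estimated vocals
    loss = F.l1_loss(est_vocal, vocal_spec)  # assumed objective
    loss.backward()
    optimizer.step()
    return loss.item()
```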
Model evaluation:
- evaluation.py: define the evaluation functions
- music_decoder.py: retrieve an audio file from the model outputs (sketched below)
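A minimal sketch of the decoding step that music_decoder.py presumably performs: apply the model's predicted mask to the mixture spectrogram, then invert the result back to a waveform. The function name and parameter values are illustrative assumptions and must match the ones used during preprocessing.

```python
import librosa

def decode_vocals(mix_mel, mask, sr=22050, n_fft=2048, hop_length=512):
    """Mask the mixture's power mel spectrogram and reconstruct audio.

    mix_mel and mask are NumPy arrays of shape (n_mels, frames);
    inversion uses Griffin-Lim phase estimation, so it is approximate.
    """
    vocal_mel = mix_mel * mask  # element-wise masking
    return librosa.feature.inverse.mel_to_audio(
        vocal_mel, sr=sr, n_fft=n_fft, hop_length=hop_length
    )
```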
Example results:
- Original music (vocal track) vs. the outputs of the Hybrid Deep Clustering, U-net, and UH-net models (audio examples)
- Masked power spectrograms (figure)
- Generated masks (figure)