CSC 486B (Deep Learning for Computer Vision) Group Project (with Quinton Yong and Jingjing Zhu): An Exploration and Implementation of “Learning to Separate Object Sounds by Watching Unlabeled Video”
We use a reduced subset of the AudioSet dataset containing 4000 samples from four instrument classes: drum, acoustic guitar, piano, and violin.
We have attached a directory containing the audio source separation results. It includes results for 3 different videos, comparing models trained with and without dropout (WithDropout vs. WithoutDropout). For each video, we provide the source-separated WAV files and the original 10-second MP4 clip. We also include the source separation results for one video obtained using hard-coded ground-truth labels.