- Can I run `musicnn_keras` on a CPU? Yes. The models are already trained, so a CPU is enough to run them.
- I'm missing a feature. How can I get it? `musicnn_keras` is fairly simple, so feel free to extend it as you wish! Let us know if you think the new functionality would be useful for the rest of us.
- Why does `musicnn_keras` contain vgg models? Because they are a nice baseline, and because people like to use computer vision models for spectrograms. Hence, in this repository you can find `musicnn`-based models (musically motivated convolutional neural networks) and vggs (a computer vision architecture applied to audio).
- Which architecture do `musicnn`-based models employ? They use a musically motivated CNN front-end, some dense layers in the mid-end, and a temporal-pooling back-end. In this Jupyter notebook we provide further details about the model.
- Which is the best `musicnn_keras` layer output to pick for transfer learning? Although we haven't run exhaustive tests, throughout our visualisations and preliminary experiments we found the `taggram` and the `max_pool` layer to be the best for this purpose: the `taggram` because it already provides high-level music information, and the `max_pool` layer because it provides a relatively sparse acoustic representation of the music audio.
- Which 50 tags does the MTT model predict? These are determined by the MagnaTagATune dataset, which is used for training the MTT models: guitar, classical, slow, techno, strings, drums, electronic, rock, fast, piano, ambient, beat, violin, vocal, synth, female, indian, opera, male, singing, vocals, no vocals, harpsichord, loud, quiet, flute, woman, male vocal, no vocal, pop, soft, sitar, solo, man, classic, choir, voice, new age, dance, male voice, female vocal, beats, harp, cello, no voice, weird, country, metal, female voice, choral.
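
To make the relation between a `taggram` and this tag list concrete, here is a minimal sketch in plain Python. It assumes (this FAQ does not confirm it) that a taggram is a frames × tags matrix of per-frame scores whose columns follow the order of the tag list. The helper `top_tags_from_taggram` and the toy numbers are hypothetical illustrations, not part of the `musicnn_keras` API.

```python
from statistics import mean


def top_tags_from_taggram(taggram, tags, topN=3):
    """Average each tag's score over time and return the topN tag names.

    Assumption: `taggram` is a list of frames, each frame holding one
    score per tag, in the same order as `tags`.
    """
    if any(len(frame) != len(tags) for frame in taggram):
        raise ValueError("every frame needs exactly one score per tag")
    # Column-wise mean: one time-averaged score per tag.
    averages = [mean(column) for column in zip(*taggram)]
    ranked = sorted(zip(tags, averages), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:topN]]


# Toy example: 3 frames, restricted to the first 4 MTT tags.
tags = ["guitar", "classical", "slow", "techno"]
taggram = [
    [0.8, 0.1, 0.3, 0.0],
    [0.6, 0.2, 0.5, 0.1],
    [0.7, 0.0, 0.4, 0.0],
]
print(top_tags_from_taggram(taggram, tags, topN=2))
```

Averaging over time is only one possible way to summarise a taggram; for transfer learning you may prefer to keep the per-frame scores as features.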
- Which 50 tags does the MSD model predict? These are determined by the Million Song Dataset, which is used for training the MSD models: rock, pop, alternative, indie, electronic, female vocalists, dance, 00s, alternative rock, jazz, beautiful, metal, chillout, male vocalists, classic rock, soul, indie rock, Mellow, electronica, 80s, folk, 90s, chill, instrumental, punk, oldies, blues, hard rock, ambient, acoustic, experimental, female vocalist, guitar, Hip-Hop, 70s, party, country, easy listening, sexy, catchy, funk, electro, heavy metal, Progressive rock, 60s, rnb, indie pop, sad, House, happy.
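
Since each model can only output tags from its own fixed list, a quick membership check tells you whether a label of interest can be predicted at all. The sketch below uses short excerpts of the two lists above rather than the full vocabularies. The helper `models_covering` is hypothetical, and the case-insensitive comparison is an assumption made here for convenience.

```python
# Excerpts only: the full vocabularies have 50 tags each (see the lists above).
MTT_TAGS_EXCERPT = ["guitar", "classical", "slow", "techno", "no vocals", "no vocal"]
MSD_TAGS_EXCERPT = ["rock", "pop", "jazz", "guitar", "Hip-Hop", "House"]


def models_covering(label, vocabularies):
    """Return the names of the vocabularies that contain `label`."""
    wanted = label.strip().lower()
    return [
        name
        for name, tags in vocabularies.items()
        if wanted in {tag.lower() for tag in tags}
    ]


vocabularies = {"MTT": MTT_TAGS_EXCERPT, "MSD": MSD_TAGS_EXCERPT}
for label in ("guitar", "hip-hop", "bass"):
    covered_by = models_covering(label, vocabularies)
    print(label, "->", covered_by if covered_by else "out of vocabulary")
```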
- What are the typical cases where the model fails? When the input audio has content that is outside the 50-tag vocabulary. Although in these cases the predictions are consistent and reasonable, the model cannot predict `bass` if this tag is not part of its vocabulary.
- Why do the MTT models predict both `no vocals` and `no vocal`? Because the vocabulary of the model is determined by the MagnaTagATune dataset and we used it as it is.
- My model is slow, even with a GPU. Can I do something? Yes! In `./musicnn_keras/configuration.py` you can set a bigger batch size. The default is `BATCH_SIZE = 1`, which can be slow but is computationally safe.
- What are these songs you include in the repository? `./audio/joram-moments_of_clarity-08-solipsism-59-88.mp3` is an electronic music song from the test set of the MagnaTagATune dataset. `./audio/TRWJAZW128F42760DD_test.mp3` is an instrumental excerpt of a Muddy Waters song from the test set of the Million Song Dataset, called Screamin' And Cryin' (Live In Warsaw 1976).
- Which audio formats does the `musicnn_keras` library support? We rely on `librosa` to read audio files. `librosa` uses soundfile and audioread for reading audio. As of v0.7, `librosa` uses soundfile by default, and falls back on audioread only when dealing with codecs unsupported by soundfile (notably, MP3, and some variants of WAV). For a list of codecs supported by soundfile, see the libsndfile documentation.
- Which sampling rate, window and hop size were used to compute the log-mel spectrograms? We compute the STFT of the signal downsampled to 16 kHz, with a Hanning window of length 512 (50% overlap). We use 96 mel bands, and we apply a logarithmic compression to the result (`np.log10(10000·x + 1)`).
- I love this library! How can I get in touch? Find me on twitter @elio.elioo and drop me a line! You may also contact Jordi Pons, who built the original musicnn.
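
Returning to the log-mel front-end described in the FAQ (16 kHz, Hanning window of 512 samples, 50% overlap, 96 mel bands), the numbers it implies can be worked out with a few lines of plain Python. This is only an arithmetic sketch, not the library's feature extractor: the constant names `SR`, `FFT_SIZE`, `OVERLAP` and `N_MELS`, and the helpers `hop_size` and `log_compress`, are names chosen here for illustration.

```python
import math

SR = 16000       # sampling rate in Hz (from the FAQ)
FFT_SIZE = 512   # window length in samples (from the FAQ)
OVERLAP = 0.5    # 50% overlap between consecutive windows (from the FAQ)
N_MELS = 96      # number of mel bands (from the FAQ)


def hop_size(fft_size=FFT_SIZE, overlap=OVERLAP):
    """Hop between consecutive windows, in samples."""
    return int(round(fft_size * (1.0 - overlap)))


def log_compress(x):
    """Scalar version of the compression log10(10000 * x + 1)."""
    return math.log10(10000.0 * x + 1.0)


hop = hop_size()
print("hop size (samples):", hop)
print("window duration (ms):", 1000 * FFT_SIZE / SR)
print("spectrogram frames per second:", SR / hop)
print("compressed value of a silent bin:", log_compress(0.0))
```

The `+ 1` inside the logarithm keeps the output at zero for silent bins instead of diverging to minus infinity, and the factor 10000 stretches small magnitudes before compression.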
If you use it for academic work, please add a footnote with a link to the musicnn_keras repository and cite the musicnn papers:
@inproceedings{pons2018atscale,
title={End-to-end learning for music audio tagging at scale},
author={Pons, Jordi and Nieto, Oriol and Prockup, Matthew and Schmidt, Erik M. and Ehmann, Andreas F. and Serra, Xavier},
booktitle={19th International Society for Music Information Retrieval Conference (ISMIR2018)},
year={2018},
}
@inproceedings{pons2019musicnn,
title={musicnn: pre-trained convolutional neural networks for music audio tagging},
author={Pons, Jordi and Serra, Xavier},
booktitle={Late-breaking/demo session in 20th International Society for Music Information Retrieval Conference (LBD-ISMIR2019)},
year={2019},
}
If you use it for other purposes, let us know too!