Supplementary materials to
Trowitzsch, I., Schymura, C., Kolossa, D., Obermayer, K. (2019), "Joining Sound Event Detection and Localization Through Spatial Segregation", submitted, IEEE Transactions on Audio, Speech, Language Processing, https://arxiv.org/abs/1904.00055.
- Models
  - Fullstream sound event detection models
  - Segregated sound event detection models
  - Spatial segregation model
- Code
  - for training fullstream and segregated sound event detection models
  - for testing fullstream and segregated sound event detection models
  - for evaluating test results and plotting graphs
- Scene parameters lists
- Feature set descriptions
The code requires the Auditory Machine Learning Training and Testing Pipeline (AMLTTP), which in turn uses other software modules of the Two!Ears Computational Framework. You will need to download:
- https://github.com/TWOEARS/Auditory-Machine-Learning-Training-and-Testing-Pipeline
- https://github.com/TWOEARS/blackboard-system
- https://github.com/TWOEARS/auditory-front-end
- https://github.com/TWOEARS/binaural-simulator
- https://github.com/TWOEARS/SOFA
- https://github.com/TWOEARS/main
- https://github.com/TWOEARS/stream-segregation-training-pipeline
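Before editing any paths, it can help to confirm that all seven repositories are actually in place. The following is a minimal sketch, assuming that all repositories were cloned under their default names into one common parent folder; `baseDir` is a hypothetical variable and not part of the supplied code.

```matlab
% Assumption: all repositories live side by side in one parent folder.
% Adjust baseDir to that folder; pwd is only a placeholder default.
baseDir = pwd;

% Repository names as listed above (default folder names after cloning).
repos = { 'Auditory-Machine-Learning-Training-and-Testing-Pipeline', ...
          'blackboard-system', ...
          'auditory-front-end', ...
          'binaural-simulator', ...
          'SOFA', ...
          'main', ...
          'stream-segregation-training-pipeline' };

% exist( ..., 'dir' ) returns 7 if the folder exists.
isPresent = cellfun( @(r) exist( fullfile( baseDir, r ), 'dir' ) == 7, repos );
missing = repos(~isPresent);

if isempty( missing )
    fprintf( 'All %d repositories found under %s\n', numel( repos ), baseDir );
else
    fprintf( 'Missing under %s:\n', baseDir );
    fprintf( '  https://github.com/TWOEARS/%s\n', missing{:} );
end
```

If your repositories are spread over different locations, this check does not apply; TwoEarsPath.xml is what actually tells Two!Ears where each module lives.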
In your Two!Ears-"main" directory, please first edit TwoEarsPath.xml to point to your respective directories.
Once Matlab is open, the Two!Ears main folder needs to be added to the Matlab path. To do so, execute the following command:
>> addpath( '<path-to-your-TwoEars-Main-directory>' )
Training and testing are performed on the sounds of the NIGENS database. Please download it from
Trowitzsch, Ivo, et al (2019), "NIGENS general sound events database", Zenodo. http://doi.org/10.5281/zenodo.2535878
First edit train_fullstream and train_segId to set nigensPath and dataCachePath (lines 5 and 6) to your respective paths. Then training can be executed:
>> train_segId()
>> train_fullstream()
Computationally, it is sensible to execute train_segId first and then train_fullstream; the other way around takes a bit longer. Either order takes very long because of the necessary preprocessing (scene rendering, segregation, feature construction, etc.) and uses a lot of disk space (a few terabytes) for the data cache.
It is fine to cancel preprocessing of the data at any time. Thanks to the caching mechanism of AMLTTP, preprocessing will continue from where it stopped the next time.
To check that the code actually works without having to process all data and use so much disk space, execute
>> train_segId( 9:10, 1:4, [1,11,21,31] )
>> train_fullstream( 9:10, 1:4, [1,11,21,31] )
This will train on only ten sound files, only four scenes, and only four classes. Of course the obtained models are not able to generalize reasonably.
You either need to first train models (see above), or unzip our trained models (fullstream_detection_models.zip and segregated_detection_models.zip; extract them to directories named like the archives).
Edit gen_fullstream_testdata, gen_segId_testdata, test_fullstream and test_on_segId to set nigensPath and dataCachePath to your respective paths. Then testing can be executed:
>> gen_segId_testdata()
>> gen_fullstream_testdata()
>> test_fullstream()
>> test_on_segId()
>>
>> % Testing also with loc-error and nsrcs-error, if wanted:
>> gen_segId_testdata( [], [], [], 5, 0 ) % 5deg sigma location error
>> test_on_segId( [], [], [], 5, 0 )
>> gen_segId_testdata( [], [], [], 10, 0 ) % 10deg sigma location error
>> test_on_segId( [], [], [], 10, 0 )
>> gen_segId_testdata( [], [], [], 20, 0 ) % 20deg sigma location error
>> test_on_segId( [], [], [], 20, 0 )
>> gen_segId_testdata( [], [], [], 45, 0 ) % 45deg sigma location error
>> test_on_segId( [], [], [], 45, 0 )
>> gen_segId_testdata( [], [], [], 1000, 0 ) % random localization
>> test_on_segId( [], [], [], 1000, 0 )
>> gen_segId_testdata( [], [], [], 0, -1 ) % source count error := -1
>> test_on_segId( [], [], [], 0, -1 )
>> gen_segId_testdata( [], [], [], 0, -2 ) % source count error := -2
>> test_on_segId( [], [], [], 0, -2 )
>> gen_segId_testdata( [], [], [], 0, +1 ) % source count error := +1
>> test_on_segId( [], [], [], 0, +1 )
>> gen_segId_testdata( [], [], [], 0, +2 ) % source count error := +2
>> test_on_segId( [], [], [], 0, +2 )
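The error-condition runs above all follow the same pattern, so they can also be driven by a loop. The following is a minimal sketch, assuming (as the comments above indicate) that the fourth argument is the location error sigma in degrees and the fifth is the source count error. The `conditions` matrix and the `dryRun` switch are illustrative names, not part of the supplied code.

```matlab
% Rows are [ locationErrorSigma, sourceCountError ], matching the fourth
% and fifth arguments of the commands listed above.
conditions = [ 5 0; 10 0; 20 0; 45 0; 1000 0; 0 -1; 0 -2; 0 1; 0 2 ];

% Illustrative switch: true only prints the calls, false executes them.
% Executing requires the supplied test scripts to be on the Matlab path.
dryRun = true;

for ii = 1 : size( conditions, 1 )
    locErr = conditions(ii,1);
    nsrcsErr = conditions(ii,2);
    if dryRun
        fprintf( 'gen_segId_testdata( [], [], [], %d, %d )\n', locErr, nsrcsErr );
        fprintf( 'test_on_segId( [], [], [], %d, %d )\n', locErr, nsrcsErr );
    else
        gen_segId_testdata( [], [], [], locErr, nsrcsErr );
        test_on_segId( [], [], [], locErr, nsrcsErr );
    end
end
```

Printing the calls first is a cheap way to confirm the list of conditions before committing to the long preprocessing runs.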
Computationally, it is sensible to execute gen_segId_testdata first and then gen_fullstream_testdata; the other way around takes a bit longer. Either order takes very long because of the necessary preprocessing (scene rendering, segregation, feature construction, etc.) and uses a lot of disk space (a few terabytes) for the data cache.
It is fine to cancel preprocessing of the data at any time. Thanks to the caching mechanism of AMLTTP, preprocessing will continue from where it stopped the next time.
To check that the code actually works without having to process all data and use so much disk space, execute
>> gen_segId_testdata( 11, 1:4, [1,11,21,31] )
>> gen_fullstream_testdata( 11, 1:4, [1,11,21,31] )
>> test_on_segId( 11, 1:4, [1,11,21,31] )
>> test_fullstream( 11, 1:4, [1,11,21,31] )
This will test on only five sound files, only four scenes, and only four classes.
To run evaluation directly on the test data produced by us, just run:
>> eval_mc7_gt()
>> eval_mc7_locError()
>> eval_mc7_nsrcsError()
To run the evaluation on test data you produced yourself (with either our trained models or your own), run the commands below. This requires test data for all test scenes, not only for the reduced functionality check from above:
>> testEval_collect( '../testdata/fullstream.test' )
>> testEval_collect( '../testdata/segId.on.segId_0-0.test' )
>> eval_mc7_gt( true )
>> % The following requires having tested also with loc-error and nsrcs-error.
>> testEval_collect( '../testdata/segId.on.segId_5-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_10-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_20-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_45-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_1000-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_0--1.test' )
>> testEval_collect( '../testdata/segId.on.segId_0--2.test' )
>> testEval_collect( '../testdata/segId.on.segId_0-1.test' )
>> testEval_collect( '../testdata/segId.on.segId_0-2.test' )
>> eval_mc7_locError( true )
>> eval_mc7_nsrcsError( true )
After the first run of eval_mc7_gt on your data, the "true" parameter can be omitted.
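The test data file names used with testEval_collect above follow a regular pattern, with the location error and the source count error joined by a hyphen (so a count error of -1 yields "0--1"). The following is a minimal sketch that builds these names and collects whichever results exist. The `conditions` matrix and the existence check are illustrative additions, and the relative path assumes the same working directory as the commands above.

```matlab
% Rows are [ locationErrorSigma, sourceCountError ] of the tested conditions.
conditions = [ 5 0; 10 0; 20 0; 45 0; 1000 0; 0 -1; 0 -2; 0 1; 0 2 ];

% Build the test data names following the pattern of the commands above.
testFiles = cell( size( conditions, 1 ), 1 );
for ii = 1 : size( conditions, 1 )
    testFiles{ii} = sprintf( '../testdata/segId.on.segId_%d-%d.test', ...
                             conditions(ii,1), conditions(ii,2) );
end

% Collect only results that are present; exist( ..., 'file' ) is nonzero
% for both files and folders.
for ii = 1 : numel( testFiles )
    if exist( testFiles{ii}, 'file' )
        testEval_collect( testFiles{ii} );
    else
        fprintf( 'Skipping %s (not found)\n', testFiles{ii} );
    end
end
```

Once all conditions have been collected, eval_mc7_locError( true ) and eval_mc7_nsrcsError( true ) can be run as shown above.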
The contained materials are published under the GNU GENERAL PUBLIC LICENSE, Version 3.
If you use any contained material for your own work, please acknowledge our work by citing as
Trowitzsch, I., Schymura, C., Kolossa, D., Obermayer, K. (2019), "Joining Sound Event Detection and Localization Through Spatial Segregation", submitted, IEEE Transactions on Audio, Speech, Language Processing, https://arxiv.org/abs/1904.00055.
Furthermore, if you change the code and use subsequent results, please additionally cite
Trowitzsch, Ivo, et al (2019). "Auditory Machine Learning Training and Testing Pipeline: AMLTTP v3.0". Zenodo. http://doi.org/10.5281/zenodo.2575086
Thank you.