This repository contains the MATLAB code corresponding to the paper "Loudspeaker beamforming to enhance speech recognition performance of voice driven applications" (accepted for ICASSP 2025; a link will be added once available).
Run `example.m`. This sets:
- the `room` object: where the loudspeakers, the listener, etc. are located;
- the `settings` object: controls the algorithm settings, such as the number of integration points in the quadrature method;
- the `par_meas` object: used for the perceptual measure. See also the corresponding paper by Van de Par et al. and my code.
After setting these objects, the loudspeaker playback signals are computed by calling the loudspeaker spotformer, as sketched below.
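A rough sketch of this flow is given below. Only the object names `room`, `settings` and `par_meas` come from this repository; the field names, the example values and the function `loudspeaker_spotformer` are hypothetical placeholders rather than the actual API, so refer to `example.m` for the real calls.

```matlab
% Minimal sketch of the example.m flow. Only the object names room, settings
% and par_meas are taken from this repository; all field and function names
% below are hypothetical placeholders.

% Room geometry: loudspeaker positions (one column per loudspeaker) and the
% listener position, in metres.
room.loudspeaker_pos = [1.0 3.0 5.0;
                        0.5 0.5 0.5;
                        1.2 1.2 1.2];
room.listener_pos    = [3.0; 2.5; 1.2];

% Algorithm settings, e.g. the number of integration points used by the
% quadrature method over the spatial regions.
settings.n_quad_points = 32;
settings.fs            = 16e3;          % sampling frequency [Hz]

% Parameters of the perceptual measure (Van de Par et al.).
par_meas = struct('fs', settings.fs);   % placeholder initialisation

% Signal to play back and the spotformer call (hypothetical function name).
[x, fs] = audioread('playback_excerpt.wav');   % placeholder file name
loudspeaker_signals = loudspeaker_spotformer(x, room, settings, par_meas);
```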
While not part of the loudspeaker spotformer itself, the performance of the spotformer is evaluated by considering:
- the audio quality in the neighbourhood of the listener, measured using ViSQOLAudio;
- the energy reduction achieved in the neighbourhood of the listener, compared to the reference playback file (a sketch of this metric follows below);
- the intelligibility at the output of the microphones. The microphone algorithm is either a single microphone (nearest neighbour), a microphone spotformer, or an MPDR/MVDR beamformer.
Separate code for the beamformers can be found here (MPDR/MVDR) and here (microphone spotformer).
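As a rough illustration of the energy-reduction metric, the sketch below compares the energy of the received signals in the evaluation region for the reference playback and for the spotformer playback, in dB. The variable names and the random placeholder signals are purely illustrative; in practice the received signals would be obtained by simulating the room, e.g. with the room-impulse response generator.

```matlab
% Hedged sketch of the energy-reduction metric. y_ref and y_proc hold the
% signals received at a set of evaluation points (one column per point) for
% the reference playback and the spotformer playback, respectively. Here they
% are random placeholders; in practice they would be simulated, e.g. by
% convolving the playback signals with room impulse responses.
y_ref  = randn(16e3, 5);          % placeholder: 1 s at 16 kHz, 5 points
y_proc = 0.5 * randn(16e3, 5);    % placeholder

energy_reduction_dB = 10 * log10(sum(abs(y_ref(:)).^2) / sum(abs(y_proc(:)).^2));
fprintf('Energy reduction: %.2f dB\n', energy_reduction_dB);
```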
Dependencies:
- CVX for MATLAB (I used version 2.2, build 1148); see the setup sketch below.
- I used MOSEK (version 9.1.9) as the solver, but I expect other solvers to work as well.
- MATLAB's Signal Processing Toolbox.
- MATLAB's Audio Toolbox.
- The room-impulse response generator by E. Habets. A version compiled for my system (Ubuntu) is provided.

The code was tested with MATLAB R2024a on Ubuntu 23.10.
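A minimal sketch of a one-time CVX/MOSEK setup is shown below; `cvx_setup`, `cvx_solver` and `cvx_save_prefs` are standard CVX commands, while the path is of course specific to your machine.

```matlab
% One-time CVX installation and solver selection (standard CVX commands;
% adjust the path to wherever CVX is unpacked on your machine).
run('/path/to/cvx/cvx_setup.m');   % register CVX on the MATLAB path
cvx_solver mosek                   % select MOSEK as the default solver
cvx_save_prefs                     % make the solver choice persistent
```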
Known issues: none at the moment :)
The examples make use of the room-impulse response generator from E. Habets (MIT license). You might need to compile this for your system; see the sketch below. The sound excerpt is taken from the movie 'Sprite Fright' by Blender Studio (Creative Commons Attribution 1.0 License). The numerical integration is performed using the Fast Clenshaw-Curtis quadrature by G. von Winckel, published under a permissive license. The voice commands are obtained from the LibriSpeech ASR corpus and published under a CC-BY-SA 4.0 license. All licenses can be found in the corresponding folder.
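If the provided binary does not work on your system, the generator can typically be recompiled with MATLAB's `mex` command; the source file name used below is an assumption, so check the RIR generator's own documentation for the exact instructions.

```matlab
% Recompile the RIR generator for your platform (source file name assumed;
% see the RIR generator's own README for the exact build instructions).
mex -setup C++          % select a C++ compiler (one-time)
mex rir_generator.cpp   % builds rir_generator.<mexext> for your platform
```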
The remainder of the code is published under an MIT license.
If you find any bugs, have questions or have other comments, please contact [email protected]