This repository contains the MATLAB code corresponding to the paper "Loudspeaker beamforming to enhance speech recognition performance of voice driven applications" (accepted for ICASSP 2025; a link will be added once available).
Run `example.m`. This sets:
- the `room` object: where the loudspeakers, the listener, etc. are located;
- the `settings` object: controls the algorithm settings, such as the number of integration points in the quadrature method;
- the `par_meas` object: used for the perceptual measure. See also the corresponding paper by Van de Par et al. and my code.
After setting these objects, the loudspeaker playback signals are computed by calling the loudspeaker spotformer, as sketched below.
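A rough sketch of this flow is given below. Only the object names `room`, `settings` and `par_meas` come from this repository; the field names, the example values and the function `loudspeaker_spotformer` are hypothetical placeholders rather than the actual API, so refer to `example.m` for the real calls.

```matlab
% Minimal sketch of the example.m flow. Only the object names room, settings
% and par_meas are taken from this repository; all field and function names
% below are hypothetical placeholders.

% Room geometry: loudspeaker positions (one column per loudspeaker) and the
% listener position, in metres.
room.loudspeaker_pos = [1.0 3.0 5.0;
                        0.5 0.5 0.5;
                        1.2 1.2 1.2];
room.listener_pos    = [3.0; 2.5; 1.2];

% Algorithm settings, e.g. the number of integration points used by the
% quadrature method over the spatial regions.
settings.n_quad_points = 32;
settings.fs            = 16e3;          % sampling frequency [Hz]

% Parameters of the perceptual measure (Van de Par et al.).
par_meas = struct('fs', settings.fs);   % placeholder initialisation

% Signal to play back and the spotformer call (hypothetical function name).
[x, fs] = audioread('playback_excerpt.wav');   % placeholder file name
loudspeaker_signals = loudspeaker_spotformer(x, room, settings, par_meas);
```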
While not part of the loudspeaker spotformer itself, the performance of the spotformer is evaluated by considering:
- the audio quality in the neighbourhood of the listener, measured using ViSQOLAudio;
- the energy reduction achieved in the neighbourhood of the listener, compared to the reference playback file (a sketch of this metric follows below);
- the intelligibility at the output of the microphones. The microphone algorithm is either a single microphone (nearest neighbour), a microphone spotformer, or an MPDR/MVDR beamformer.
Separate code for the beamformers can be found here (MPDR/MVDR) and here (microphone spotformer).
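As a rough illustration of the energy-reduction metric, the sketch below compares the energy of the received signals in the evaluation region for the reference playback and for the spotformer playback, in dB. The variable names and the random placeholder signals are purely illustrative; in practice the received signals would be obtained by simulating the room, e.g. with the room-impulse response generator.

```matlab
% Hedged sketch of the energy-reduction metric. y_ref and y_proc hold the
% signals received at a set of evaluation points (one column per point) for
% the reference playback and the spotformer playback, respectively. Here they
% are random placeholders; in practice they would be simulated, e.g. by
% convolving the playback signals with room impulse responses.
y_ref  = randn(16e3, 5);          % placeholder: 1 s at 16 kHz, 5 points
y_proc = 0.5 * randn(16e3, 5);    % placeholder

energy_reduction_dB = 10 * log10(sum(abs(y_ref(:)).^2) / sum(abs(y_proc(:)).^2));
fprintf('Energy reduction: %.2f dB\n', energy_reduction_dB);
```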
Dependencies:
- CVX for MATLAB (I used version 2.2, build 1148); see the setup sketch below.
- I used MOSEK (version 9.1.9) as the solver, but I expect other solvers to work as well.
- MATLAB's Signal Processing Toolbox.
- MATLAB's Audio Toolbox.
- The room-impulse response generator by E. Habets. A version compiled for my system (Ubuntu) is provided.

The code was tested with MATLAB R2024a on Ubuntu 23.10.
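A minimal sketch of a one-time CVX/MOSEK setup is shown below; `cvx_setup`, `cvx_solver` and `cvx_save_prefs` are standard CVX commands, while the path is of course specific to your machine.

```matlab
% One-time CVX installation and solver selection (standard CVX commands;
% adjust the path to wherever CVX is unpacked on your machine).
run('/path/to/cvx/cvx_setup.m');   % register CVX on the MATLAB path
cvx_solver mosek                   % select MOSEK as the default solver
cvx_save_prefs                     % make the solver choice persistent
```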
Known issues: none at the moment :)
The examples make use of the room-impulse response generator from E. Habets (MIT license). You might need to compile this for your system; see the sketch below. The sound excerpt is taken from the movie 'Sprite Fright' by Blender Studio (Creative Commons Attribution 1.0 License). The numerical integration is performed using the Fast Clenshaw-Curtis quadrature by G. von Winckel, published under a permissive license. The voice commands are obtained from the LibriSpeech ASR corpus and published under a CC-BY-SA 4.0 license. All licenses can be found in the corresponding folder.
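If the provided binary does not work on your system, the generator can typically be recompiled with MATLAB's `mex` command; the source file name used below is an assumption, so check the RIR generator's own documentation for the exact instructions.

```matlab
% Recompile the RIR generator for your platform (source file name assumed;
% see the RIR generator's own README for the exact build instructions).
mex -setup C++          % select a C++ compiler (one-time)
mex rir_generator.cpp   % builds rir_generator.<mexext> for your platform
```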
The remainder of the code is published under an MIT license.
If you find any bugs, have questions or have other comments, please contact [email protected]