Abstract
Audio-visual speech enhancement (SE) is the task of reducing the acoustic background noise in a degraded speech signal using both acoustic and visual information. In this work, we study how to incorporate visual information to enhance a speech signal using acoustic beamformers in hearing aids (HAs). Specifically, we first train a deep learning model to estimate a time-frequency mask from audio-visual data. Then, we apply this mask to estimate the inter-microphone power spectral densities (PSDs) of the clean speech and noise signals. Finally, we use the estimated PSDs to build acoustic beamformers. Assuming that an HA user wears an add-on device comprising a camera pointing at the target speaker, we show that our method can be beneficial for HA systems, especially at low signal-to-noise ratios (SNRs).
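The pipeline described above (mask → PSD estimation → beamformer) can be illustrated with the following minimal NumPy sketch. It is not the paper's implementation: the audio-visual mask-estimation network is not shown (`dnn_speech_mask` is a hypothetical placeholder for its output), and a Souden-style MVDR beamformer is assumed purely as one common choice of mask-based beamformer.

```python
import numpy as np

def estimate_psd(Y, mask, eps=1e-8):
    """Mask-weighted inter-microphone PSD matrix for one frequency bin.

    Y    : complex STFT frames, shape (channels, frames)
    mask : time-frequency mask for this bin, shape (frames,)
    """
    weights = mask / (mask.sum() + eps)          # normalize mask weights
    return (Y * weights) @ Y.conj().T            # (channels, channels)

def mvdr_weights(phi_s, phi_n, ref_ch=0, eps=1e-8):
    """Souden-style MVDR weights from speech/noise PSD matrices (one bin)."""
    num = np.linalg.solve(phi_n, phi_s)          # Phi_n^{-1} Phi_s
    return num[:, ref_ch] / (np.trace(num) + eps)

# Per frequency bin f (assumed variable names, for illustration only):
#   mask_s   = dnn_speech_mask[f]                # speech mask from the AV model
#   phi_s    = estimate_psd(Y[f], mask_s)        # clean-speech PSD estimate
#   phi_n    = estimate_psd(Y[f], 1.0 - mask_s)  # noise PSD estimate
#   w        = mvdr_weights(phi_s, phi_n)
#   S_hat[f] = w.conj() @ Y[f]                   # enhanced single-channel output
```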
Original language | English |
---|---|
Title of host publication | ICASSPW 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings |
Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
Publication date | 2023 |
Article number | 10193370 |
ISBN (Electronic) | 9798350302615 |
DOIs | |
Publication status | Published - 2023 |
Event | 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, ICASSPW 2023 - Rhodes Island, Greece. Duration: 4 Jun 2023 → 10 Jun 2023 |
Conference
Conference | 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, ICASSPW 2023 |
---|---|
Country/Territory | Greece |
City | Rhodes Island |
Period | 04/06/2023 → 10/06/2023 |
Sponsor | IEEE, IEEE Signal Processing Society |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- Audio-visual
- beamforming
- deep learning
- hearing aids