TY - GEN
T1 - Keyword Spotting for Hearing Assistive Devices Robust to External Speakers
AU - Lopez-Espejo, Ivan
AU - Tan, Zheng-Hua
AU - Jensen, Jesper
PY - 2019/9
Y1 - 2019/9
N2 - Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person (user or not) might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose a KWS system for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is used as a basis to build upon. Following a multi-task learning scheme, this system is extended to jointly perform KWS and own-voice/external speaker detection with a negligible increase in the number of parameters. For the experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as the capturing device. Our results show that this multi-task deep residual network achieves a relative KWS accuracy improvement of around 32% with respect to a system that does not deal with external speakers.
KW - External speaker
KW - Hearing assistive device
KW - Multi-task learning
KW - Robust keyword spotting
UR - https://www.isca-speech.org/archive/Interspeech_2019/index.html
UR - http://www.scopus.com/inward/record.url?scp=85074727330&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-2010
DO - 10.21437/Interspeech.2019-2010
M3 - Article in proceedings
T3 - Proceedings of the International Conference on Spoken Language Processing
SP - 3223
EP - 3227
BT - Interspeech 2019
PB - ISCA
T2 - Interspeech 2019
Y2 - 15 September 2019 through 19 September 2019
ER -