Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Research output: Contribution to book/anthology/report/conference proceeding > Article in proceeding > Research > peer-review

Abstract

Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person --user or not-- might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is regarded as a basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users' own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32% with respect to a system that does not deal with external speakers.
Original language: English
Title of host publication: Interspeech 2019
Number of pages: 5
Publisher: ISCA
Publication date: Sep 2019
Pages: 3223-3227
DOIs: https://doi.org/10.21437/Interspeech.2019-2010
Publication status: Published - Sep 2019
Event: Interspeech 2019 - Graz, Austria
Duration: 15 Sep 2019 to 19 Sep 2019

Conference

Conference: Interspeech 2019
Country: Austria
City: Graz
Period: 15/09/2019 to 19/09/2019
Series: Proceedings of the International Conference on Spoken Language Processing
ISSN: 1990-9772

Fingerprint

Audition
Hearing aids
Experiments

Cite this

Lopez-Espejo, I., Tan, Z-H., & Jensen, J. (2019). Keyword Spotting for Hearing Assistive Devices Robust to External Speakers. In Interspeech 2019 (pp. 3223-3227). ISCA. Proceedings of the International Conference on Spoken Language Processing. https://doi.org/10.21437/Interspeech.2019-2010

Lopez-Espejo, Ivan; Tan, Zheng-Hua; Jensen, Jesper. / Keyword Spotting for Hearing Assistive Devices Robust to External Speakers. Interspeech 2019. ISCA, 2019. pp. 3223-3227 (Proceedings of the International Conference on Spoken Language Processing).
@inproceedings{34f6af1b84d448eeb658c8a1cd120272,
title = "Keyword Spotting for Hearing Assistive Devices Robust to External Speakers",
abstract = "Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person --user or not-- might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is regarded as a basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users' own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32{\%} with respect to a system that does not deal with external speakers.",
author = "Ivan Lopez-Espejo and Zheng-Hua Tan and Jesper Jensen",
year = "2019",
month = "9",
doi = "10.21437/Interspeech.2019-2010",
language = "English",
series = "Proceedings of the International Conference on Spoken Language Processing",
publisher = "ISCA",
pages = "3223--3227",
booktitle = "Interspeech 2019",
}

Lopez-Espejo, I, Tan, Z-H & Jensen, J 2019, Keyword Spotting for Hearing Assistive Devices Robust to External Speakers. in Interspeech 2019. ISCA, Proceedings of the International Conference on Spoken Language Processing, pp. 3223-3227, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-2010

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers. / Lopez-Espejo, Ivan; Tan, Zheng-Hua; Jensen, Jesper.

Interspeech 2019. ISCA, 2019. p. 3223-3227 (Proceedings of the International Conference on Spoken Language Processing).

TY - GEN
T1 - Keyword Spotting for Hearing Assistive Devices Robust to External Speakers
AU - Lopez-Espejo, Ivan
AU - Tan, Zheng-Hua
AU - Jensen, Jesper
PY - 2019/9
Y1 - 2019/9
AB - Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person --user or not-- might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is regarded as a basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users' own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32% with respect to a system that does not deal with external speakers.

UR - https://www.isca-speech.org/archive/Interspeech_2019/index.html
U2 - 10.21437/Interspeech.2019-2010
DO - 10.21437/Interspeech.2019-2010
M3 - Article in proceeding
T3 - Proceedings of the International Conference on Spoken Language Processing
SP - 3223
EP - 3227
BT - Interspeech 2019
PB - ISCA
ER -

Lopez-Espejo I, Tan Z-H, Jensen J. Keyword Spotting for Hearing Assistive Devices Robust to External Speakers. In Interspeech 2019. ISCA. 2019. p. 3223-3227. (Proceedings of the International Conference on Spoken Language Processing). https://doi.org/10.21437/Interspeech.2019-2010