Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Ivan Lopez-Espejo; Zheng-Hua Tan; Jesper Jensen

doi:10.21437/Interspeech.2019-2010

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

6 Citations (Scopus)

39 Downloads (Pure)

Abstract

Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person --user or not-- might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is regarded as a basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users' own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32% with respect to a system that does not deal with external speakers.
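
The multi-task scheme described in the abstract can be illustrated with a minimal sketch, assuming a shared feature vector that feeds two linear softmax heads whose cross-entropy losses are summed. The function names, the `ov_weight` parameter, and all dimensions are hypothetical and are not taken from the paper; the actual system is a deep residual network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, features):
    """One output per row of `weights`: dot(row, features) + bias."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def multitask_loss(features, kws_head, ov_head, kws_target, ov_target,
                   ov_weight=1.0):
    """Joint loss: keyword task plus own-voice/external-speaker task.

    Both heads read the same shared features (hypothetical setup);
    `ov_weight` is an assumed knob balancing the two tasks.
    """
    kws_logits = linear(*kws_head, features)
    ov_logits = linear(*ov_head, features)
    return (cross_entropy(kws_logits, kws_target)
            + ov_weight * cross_entropy(ov_logits, ov_target))


def head_params(n_features, n_classes):
    """Parameter count of one linear head (weights plus biases)."""
    return n_features * n_classes + n_classes
```

In this sketch the second head adds only `head_params(n_features, 2)` parameters on top of the shared trunk, which is one way a joint model can stay close to the single-task parameter count.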

Original language	English
Title of host publication	Interspeech 2019
Number of pages	5
Publisher	ISCA
Publication date	Sep 2019
Pages	3223-3227
DOI	https://doi.org/10.21437/Interspeech.2019-2010
Status	Published - Sep 2019
Event	Interspeech 2019 - Graz, Austria; duration: 15 Sep 2019 → 19 Sep 2019

Conference

Conference	Interspeech 2019
Country/Territory	Austria
City	Graz
Period	15/09/2019 → 19/09/2019

Name	Proceedings of the International Conference on Spoken Language Processing
ISSN	1990-9772

Access to document

10.21437/Interspeech.2019-2010

Open Access article, final published version, 396 KB

Citation formats

@inproceedings{34f6af1b84d448eeb658c8a1cd120272,
title = "Keyword Spotting for Hearing Assistive Devices Robust to External Speakers",
abstract = "Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person --user or not-- might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is regarded as a basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users' own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32% with respect to a system that does not deal with external speakers.",
keywords = "External speaker, Hearing assistive device, Multi-task learning, Robust keyword spotting",
author = "Ivan Lopez-Espejo and Zheng-Hua Tan and Jesper Jensen",
year = "2019",
month = sep,
doi = "10.21437/Interspeech.2019-2010",
language = "English",
series = "Proceedings of the International Conference on Spoken Language Processing",
publisher = "ISCA",
pages = "3223--3227",
booktitle = "Interspeech 2019",
note = "Interspeech 2019 ; Conference date: 15-09-2019 Through 19-09-2019",
}

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers. / Lopez-Espejo, Ivan; Tan, Zheng-Hua; Jensen, Jesper.
Interspeech 2019. ISCA, 2019. p. 3223-3227 (Proceedings of the International Conference on Spoken Language Processing).

TY - GEN
T1 - Keyword Spotting for Hearing Assistive Devices Robust to External Speakers
AU - Lopez-Espejo, Ivan
AU - Tan, Zheng-Hua
AU - Jensen, Jesper
PY - 2019/9
Y1 - 2019/9
N2 - Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person --user or not-- might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is regarded as a basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users' own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32% with respect to a system that does not deal with external speakers.

AB - Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person --user or not-- might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is regarded as a basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users' own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32% with respect to a system that does not deal with external speakers.

KW - External speaker
KW - Hearing assistive device
KW - Multi-task learning
KW - Robust keyword spotting
UR - https://www.isca-speech.org/archive/Interspeech_2019/index.html
UR - http://www.scopus.com/inward/record.url?scp=85074727330&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-2010
DO - 10.21437/Interspeech.2019-2010
M3 - Article in proceeding
T3 - Proceedings of the International Conference on Spoken Language Processing
SP - 3223
EP - 3227
BT - Interspeech 2019
PB - ISCA
T2 - Interspeech 2019
Y2 - 15 September 2019 through 19 September 2019
ER -
