A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting

Iván López Espejo, Zheng-Hua Tan, Jesper Jensen

Publication: Contribution to journal, Journal article, Research, peer review

13 Citations (Scopus)
82 Downloads (Pure)

Abstract

Developing keyword spotting (KWS) systems that remain accurate in noisy conditions is still a challenge. Towards this goal, we propose a novel training strategy for noise-robust KWS that relies on multi-condition training. Under this strategy, we view state-of-the-art KWS models as the composition of a keyword embedding extractor and a linear classifier, which are trained successively. To train the keyword embedding extractor, we also propose a new (C_{N,2}+1)-pair loss function, which extends the concept behind related loss functions such as the triplet and N-pair losses to achieve larger inter-class and smaller intra-class variation. Experimental results on a noisy version of the Google Speech Commands Dataset show that our proposal achieves a relative improvement in KWS accuracy of around 12% over standard end-to-end multi-condition training when speech is distorted by unseen noises. This improvement is achieved without increasing the computational complexity of the KWS model.
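
For orientation, the two related losses named in the abstract can be sketched as follows. This is a minimal illustration of the standard triplet and N-pair losses in their commonly used forms, not the proposed (C_{N,2}+1)-pair loss, whose definition is not given here. The function names, the squared Euclidean distance, the dot-product similarity and the margin value are all assumptions made for this example.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances (assumed form)."""
    d_pos = np.sum((anchor - positive) ** 2)  # distance to the same-class embedding
    d_neg = np.sum((anchor - negative) ** 2)  # distance to the other-class embedding
    return max(0.0, d_pos - d_neg + margin)


def n_pair_loss(anchor, positive, negatives):
    """N-pair loss with dot-product similarity (assumed form).

    `negatives` has shape (N - 1, D): one embedding per competing class.
    """
    pos_score = anchor @ positive    # similarity to the positive example
    neg_scores = negatives @ anchor  # similarities to every negative example
    return float(np.log1p(np.sum(np.exp(neg_scores - pos_score))))
```

Both losses shrink when the anchor embedding is closer to its same-class example than to examples of other classes, which is the inter-class versus intra-class trade-off the abstract refers to. The N-pair form compares the anchor against several negative classes at once rather than a single one.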

Original language: English
Article number: 9465680
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 29
Pages (from-to): 2254-2266
Number of pages: 13
ISSN: 2329-9290
DOI
Status: Published - Jul. 2021
