A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting

Iván López Espejo; Zheng-Hua Tan; Jesper Jensen

doi:10.1109/TASLP.2021.3092567

Publication: Contribution to journal › Journal article › Research › peer-review

13 Citations (Scopus)

82 Downloads (Pure)

Abstract

The development of keyword spotting (KWS) systems that are accurate in noisy conditions remains a challenge. Towards this goal, in this paper we propose a novel training strategy relying on multi-condition training for noise-robust KWS. By this strategy, we think of the state-of-the-art KWS models as the composition of a keyword embedding extractor and a linear classifier that are successively trained. To train the keyword embedding extractor, we also propose a new (C_{N,2}+1)-pair loss function extending the concept behind related loss functions like triplet and N-pair losses to reach larger inter-class and smaller intra-class variation. Experimental results on a noisy version of the Google Speech Commands Dataset show that our proposal achieves around 12% KWS accuracy relative improvement with respect to standard end-to-end multi-condition training when speech is distorted by unseen noises. This performance improvement is achieved without increasing the computational complexity of the KWS model.
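The abstract names triplet and N-pair losses as the related loss functions that the proposed loss extends. As a rough illustration of those two baselines only, the sketch below uses their commonly cited textbook forms in plain Python. It is not the (C_{N,2}+1)-pair loss proposed in the paper, whose definition is not given on this page; the function names, the margin value, and the toy 2-D embeddings are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on squared distances: the positive should be closer to the
    anchor than the negative by at least `margin` (assumed value)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def n_pair_loss(anchor, positive, negatives):
    """log(1 + sum_i exp(a.n_i - a.p)): one positive is contrasted with
    several negatives (one per other class) at the same time."""
    sim_pos = dot(anchor, positive)
    return math.log1p(sum(math.exp(dot(anchor, n) - sim_pos) for n in negatives))


# Toy embeddings (hypothetical): the anchor and positive share a keyword
# class, each negative comes from a different class.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

print(triplet_loss(anchor, positive, negatives[0]))
print(n_pair_loss(anchor, positive, negatives))
```

Both losses shrink as same-class embeddings move together and different-class embeddings move apart, which is the "larger inter-class and smaller intra-class variation" goal the abstract refers to.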

Original language	English
Article number	9465680
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	29
Pages (from-to)	2254-2266
Number of pages	13
ISSN	2329-9290
DOI	https://doi.org/10.1109/TASLP.2021.3092567
Status	Published - Jul 2021

Access to document

10.1109/TASLP.2021.3092567

Accepted author manuscript, 3.91 MB

AUB Link

Search for the material in Aalborg University Library's search engine

Other files and links

Link to publication in Scopus

Citation formats

@article{bf7d0c4112e14c8eb5f090b2d1198b8e,

title = "A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting",

abstract = "The development of keyword spotting (KWS) systems that are accurate in noisy conditions remains a challenge. Towards this goal, in this paper we propose a novel training strategy relying on multi-condition training for noise-robust KWS. By this strategy, we think of the state-of-the-art KWS models as the composition of a keyword embedding extractor and a linear classifier that are successively trained. To train the keyword embedding extractor, we also propose a new $(C_{N,2}+1)$-pair loss function extending the concept behind related loss functions like triplet and N-pair losses to reach larger inter-class and smaller intra-class variation. Experimental results on a noisy version of the Google Speech Commands Dataset show that our proposal achieves around 12\% KWS accuracy relative improvement with respect to standard end-to-end multi-condition training when speech is distorted by unseen noises. This performance improvement is achieved without increasing the computational complexity of the KWS model.",

keywords = "Keyword spotting, deep metric learning, keyword embedding, loss function, multi-condition training, noise robustness",

author = "Espejo, {Iv{\'a}n L{\'o}pez} and Zheng-Hua Tan and Jesper Jensen",

year = "2021",

month = jul,

doi = "10.1109/TASLP.2021.3092567",

language = "English",

volume = "29",

pages = "2254--2266",

journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",

issn = "2329-9290",

publisher = "IEEE Signal Processing Society",

}

TY - JOUR

T1 - A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting

AU - Espejo, Iván López

AU - Tan, Zheng-Hua

AU - Jensen, Jesper

PY - 2021/7

Y1 - 2021/7

N2 - The development of keyword spotting (KWS) systems that are accurate in noisy conditions remains a challenge. Towards this goal, in this paper we propose a novel training strategy relying on multi-condition training for noise-robust KWS. By this strategy, we think of the state-of-the-art KWS models as the composition of a keyword embedding extractor and a linear classifier that are successively trained. To train the keyword embedding extractor, we also propose a new (C_{N,2}+1)-pair loss function extending the concept behind related loss functions like triplet and N-pair losses to reach larger inter-class and smaller intra-class variation. Experimental results on a noisy version of the Google Speech Commands Dataset show that our proposal achieves around 12% KWS accuracy relative improvement with respect to standard end-to-end multi-condition training when speech is distorted by unseen noises. This performance improvement is achieved without increasing the computational complexity of the KWS model.

AB - The development of keyword spotting (KWS) systems that are accurate in noisy conditions remains a challenge. Towards this goal, in this paper we propose a novel training strategy relying on multi-condition training for noise-robust KWS. By this strategy, we think of the state-of-the-art KWS models as the composition of a keyword embedding extractor and a linear classifier that are successively trained. To train the keyword embedding extractor, we also propose a new (C_{N,2}+1)-pair loss function extending the concept behind related loss functions like triplet and N-pair losses to reach larger inter-class and smaller intra-class variation. Experimental results on a noisy version of the Google Speech Commands Dataset show that our proposal achieves around 12% KWS accuracy relative improvement with respect to standard end-to-end multi-condition training when speech is distorted by unseen noises. This performance improvement is achieved without increasing the computational complexity of the KWS model.

KW - Keyword spotting

KW - deep metric learning

KW - keyword embedding

KW - loss function

KW - multi-condition training

KW - noise robustness

UR - http://www.scopus.com/inward/record.url?scp=85110759492&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2021.3092567

DO - 10.1109/TASLP.2021.3092567

M3 - Journal article

SN - 2329-9290

VL - 29

SP - 2254

EP - 2266

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

M1 - 9465680

ER -
