Exploring Filterbank Learning for Keyword Spotting

Iván López Espejo; Zheng-Hua Tan; Jesper Jensen

doi:10.23919/Eusipco47968.2020.9287772

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

13 Citations (Scopus)

Abstract

Despite their great performance over the years, handcrafted speech features are not necessarily optimal for any particular speech application. Consequently, with greater or lesser success, optimal filterbank learning has been studied for different speech processing tasks. In this paper, we fill in a gap by exploring filterbank learning for keyword spotting (KWS). Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank. Filterbank parameters are optimized jointly with a modern deep residual neural network-based KWS back-end. Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features. Thus, while we conclude that the latter are still a wise choice when using modern KWS back-ends, we also hypothesize that this could be a symptom of information redundancy, which opens up new research possibilities in the field of small-footprint KWS.

Original language	English
Title	28th European Signal Processing Conference (EUSIPCO)
Number of pages	5
Publisher	IEEE
Publication date	2021
Pages	331-335
Article number	9287772
ISBN (Print)	978-1-7281-5001-7
ISBN (Electronic)	978-9-0827-9705-3
DOI	https://doi.org/10.23919/Eusipco47968.2020.9287772
Status	Published - 2021
Event	2020 28th European Signal Processing Conference (EUSIPCO) - Amsterdam, Netherlands. Duration: 18 Jan 2021 → 21 Jan 2021

Conference

Conference	2020 28th European Signal Processing Conference (EUSIPCO)
Country/Territory	Netherlands
City	Amsterdam
Period	18/01/2021 → 21/01/2021

Series name	Proceedings of the European Signal Processing Conference
ISSN	2076-1465

Citation formats

@inproceedings{810fe3830bc445c1a46310d5a2cfd200,
  title = "Exploring Filterbank Learning for Keyword Spotting",
  abstract = "Despite their great performance over the years, handcrafted speech features are not necessarily optimal for any particular speech application. Consequently, with greater or lesser success, optimal filterbank learning has been studied for different speech processing tasks. In this paper, we fill in a gap by exploring filterbank learning for keyword spotting (KWS). Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank. Filterbank parameters are optimized jointly with a modern deep residual neural network-based KWS back-end. Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features. Thus, while we conclude that the latter are still a wise choice when using modern KWS back-ends, we also hypothesize that this could be a symptom of information redundancy, which opens up new research possibilities in the field of small-footprint KWS.",
  keywords = "End-to-end, Filterbank learning, Gammachirp filterbank, Gammatone filterbank, Keyword spotting",
  author = "Espejo, {Iv{\'a}n L{\'o}pez} and Zheng-Hua Tan and Jesper Jensen",
  year = "2021",
  doi = "10.23919/Eusipco47968.2020.9287772",
  language = "English",
  isbn = "978-1-7281-5001-7",
  series = "Proceedings of the European Signal Processing Conference",
  publisher = "IEEE",
  pages = "331--335",
  booktitle = "28th European Signal Processing Conference (EUSIPCO)",
  address = "United States",
  note = "2020 28th European Signal Processing Conference (EUSIPCO); Conference date: 18-01-2021 Through 21-01-2021",
}

Espejo, IL, Tan, Z-H & Jensen, J 2021, Exploring Filterbank Learning for Keyword Spotting. in 28th European Signal Processing Conference (EUSIPCO), 9287772, IEEE, Proceedings of the European Signal Processing Conference, pp. 331-335, 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, Netherlands, 18/01/2021. https://doi.org/10.23919/Eusipco47968.2020.9287772

Exploring Filterbank Learning for Keyword Spotting. / Espejo, Iván López; Tan, Zheng-Hua; Jensen, Jesper.
28th European Signal Processing Conference (EUSIPCO). IEEE, 2021. pp. 331-335, 9287772 (Proceedings of the European Signal Processing Conference).

TY  - GEN
T1  - Exploring Filterbank Learning for Keyword Spotting
AU  - Espejo, Iván López
AU  - Tan, Zheng-Hua
AU  - Jensen, Jesper
PY  - 2021
Y1  - 2021
N2  - Despite their great performance over the years, handcrafted speech features are not necessarily optimal for any particular speech application. Consequently, with greater or lesser success, optimal filterbank learning has been studied for different speech processing tasks. In this paper, we fill in a gap by exploring filterbank learning for keyword spotting (KWS). Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank. Filterbank parameters are optimized jointly with a modern deep residual neural network-based KWS back-end. Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features. Thus, while we conclude that the latter are still a wise choice when using modern KWS back-ends, we also hypothesize that this could be a symptom of information redundancy, which opens up new research possibilities in the field of small-footprint KWS.
AB  - Despite their great performance over the years, handcrafted speech features are not necessarily optimal for any particular speech application. Consequently, with greater or lesser success, optimal filterbank learning has been studied for different speech processing tasks. In this paper, we fill in a gap by exploring filterbank learning for keyword spotting (KWS). Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank. Filterbank parameters are optimized jointly with a modern deep residual neural network-based KWS back-end. Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features. Thus, while we conclude that the latter are still a wise choice when using modern KWS back-ends, we also hypothesize that this could be a symptom of information redundancy, which opens up new research possibilities in the field of small-footprint KWS.
KW  - End-to-end
KW  - Filterbank learning
KW  - Gammachirp filterbank
KW  - Gammatone filterbank
KW  - Keyword spotting
UR  - https://www.eurasip.org/Proceedings/Eusipco/Eusipco2020/HTML/session-index.html#1010
UR  - http://www.scopus.com/inward/record.url?scp=85099274946&partnerID=8YFLogxK
U2  - 10.23919/Eusipco47968.2020.9287772
DO  - 10.23919/Eusipco47968.2020.9287772
M3  - Article in proceeding
SN  - 978-1-7281-5001-7
T3  - Proceedings of the European Signal Processing Conference
SP  - 331
EP  - 335
BT  - 28th European Signal Processing Conference (EUSIPCO)
PB  - IEEE
T2  - 2020 28th European Signal Processing Conference (EUSIPCO)
Y2  - 18 January 2021 through 21 January 2021
ER  -
