TY - GEN
T1 - Exploring Filterbank Learning for Keyword Spotting
AU - López-Espejo, Iván
AU - Tan, Zheng-Hua
AU - Jensen, Jesper
PY - 2021
Y1 - 2021
N2 - Despite their great performance over the years, handcrafted speech features are not necessarily optimal for any particular speech application. Consequently, with greater or lesser success, optimal filterbank learning has been studied for different speech processing tasks. In this paper, we fill in a gap by exploring filterbank learning for keyword spotting (KWS). Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank. Filterbank parameters are optimized jointly with a modern deep residual neural network-based KWS back-end. Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features. Thus, while we conclude that the latter are still a wise choice when using modern KWS back-ends, we also hypothesize that this could be a symptom of information redundancy, which opens up new research possibilities in the field of small-footprint KWS.
KW - End-to-end
KW - Filterbank learning
KW - Gammachirp filterbank
KW - Gammatone filterbank
KW - Keyword spotting
UR - https://www.eurasip.org/Proceedings/Eusipco/Eusipco2020/HTML/session-index.html#1010
UR - http://www.scopus.com/inward/record.url?scp=85099274946&partnerID=8YFLogxK
U2 - 10.23919/Eusipco47968.2020.9287772
DO - 10.23919/Eusipco47968.2020.9287772
M3 - Article in proceeding
SN - 978-1-7281-5001-7
T3 - Proceedings of the European Signal Processing Conference
SP - 331
EP - 335
BT - 28th European Signal Processing Conference (EUSIPCO)
PB - IEEE
T2 - 2020 28th European Signal Processing Conference (EUSIPCO)
Y2 - 18 January 2021 through 21 January 2021
ER -