An Experimental Study on Light Speech Features for Small-Footprint Keyword Spotting

Ivan Lopez Espejo, Zheng-Hua Tan, Jesper Jensen

Research output: Contribution to book/anthology/report/conference proceeding > Article in proceeding > Research > peer-review


Keyword spotting (KWS) is, in many instances, intended to run on smart electronic devices characterized by limited computational resources. To meet computational constraints, a series of techniques (ranging from feature and acoustic model parameter quantization to the reduction of the number of model parameters and required multiplications) has been explored in the literature. With this same aim, in this paper, we study a straightforward alternative consisting of the reduction of the spectro/cepstro-temporal resolution of log-Mel and Mel-frequency cepstral coefficient feature matrices commonly employed in KWS. We show that the feature matrix size has a strong impact on the number of multiplications/energy consumption of a state-of-the-art KWS acoustic model based on a convolutional neural network. Experimental results demonstrate that the number of elements in commonly used speech feature matrices can be reduced by a factor of 8 while essentially maintaining KWS performance. Even more interestingly, this size reduction leads to a 9.6× reduction in the number of multiplications/energy consumption, a 4.0× reduction in training time and a 3.7× reduction in inference time.
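As a rough illustration of why the feature matrix size matters, the sketch below counts the elements of a time-by-coefficient feature matrix for two front-end configurations and the multiplications of a single stride-1, same-padded 2D convolution applied to it. All numeric settings (1 s of audio at 16 kHz, 25 ms window, 10 ms vs. 40 ms frame shift, 40 vs. 20 coefficients, a 3×3 kernel with 1 input and 16 output channels) and all function names are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its model or its reported figures.

```python
def num_frames(num_samples, win_length, hop_length):
    """Number of full analysis frames when no padding is applied."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def feature_matrix_shape(duration_s, sample_rate, win_ms, hop_ms, num_coeffs):
    """(frames, coefficients) of a log-Mel or MFCC matrix (hypothetical helper)."""
    num_samples = int(round(duration_s * sample_rate))
    win = int(round(win_ms * sample_rate / 1000))
    hop = int(round(hop_ms * sample_rate / 1000))
    return num_frames(num_samples, win, hop), num_coeffs


def conv2d_multiplications(in_shape, kernel, in_ch, out_ch):
    """Multiplications of one stride-1, same-padded 2D convolution layer."""
    t, f = in_shape
    kt, kf = kernel
    return t * f * kt * kf * in_ch * out_ch


# Assumed baseline: 10 ms frame shift, 40 coefficients per frame.
baseline = feature_matrix_shape(1.0, 16000, 25, 10, 40)
# Assumed light variant: coarser time (40 ms shift) and frequency (20 coefficients).
light = feature_matrix_shape(1.0, 16000, 25, 40, 20)

baseline_elems = baseline[0] * baseline[1]
light_elems = light[0] * light[1]

baseline_mults = conv2d_multiplications(baseline, (3, 3), 1, 16)
light_mults = conv2d_multiplications(light, (3, 3), 1, 16)

print("baseline shape:", baseline, "elements:", baseline_elems)
print("light shape:   ", light, "elements:", light_elems)
print("element ratio: ", baseline_elems / light_elems)
print("mult. ratio:   ", baseline_mults / light_mults)
```

For a layer of this kind the multiplication count is proportional to the number of input elements, so coarsening the time and frequency resolution reduces the cost of that layer by the same factor. Strided layers, pooling and fully connected layers in a real acoustic model scale differently, so the overall saving of a complete network need not match the element ratio.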
Original language: English
Title of host publication: IberSPEECH 2022
Publication date: 2022
Publication status: Published - 2022
Event: IberSPEECH 2022 - Granada, Spain
Duration: 14 Nov 2022 - 16 Nov 2022


Conference: IberSPEECH 2022

