Deep Spoken Keyword Spotting: An Overview

Ivan Lopez Espejo; Zheng-Hua Tan; John Hansen; Jesper Jensen

doi:10.1109/ACCESS.2021.3139508

Research output: Contribution to journal › Review article › peer-review

44 Citations (Scopus)

206 Downloads (Pure)

Abstract

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

Original language	English
Journal	IEEE Access
Volume	10
Pages (from-to)	4169-4199
Number of pages	31
ISSN	2169-3536
DOIs	https://doi.org/10.1109/ACCESS.2021.3139508
Publication status	Published - Jan 2022

Keywords

Keyword spotting
acoustic model
deep learning
robustness
small footprint

Access to Document

DOI: 10.1109/ACCESS.2021.3139508 (Licence: CC BY 4.0)

Open Access article: final published version, 1.85 MB (Licence: CC BY 4.0)

Cite this

@article{e86665d7f1af4ed4829352ee1a3465d7,
title = "Deep Spoken Keyword Spotting: An Overview",
abstract = "Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.",

keywords = "Keyword spotting, acoustic model, deep learning, robustness, small footprint",

author = "Espejo, {Ivan Lopez} and Zheng-Hua Tan and John Hansen and Jesper Jensen",

year = "2022",

month = jan,

doi = "10.1109/ACCESS.2021.3139508",

language = "English",

volume = "10",

pages = "4169--4199",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "IEEE",

}

TY - JOUR
T1 - Deep Spoken Keyword Spotting
T2 - An Overview
AU - Espejo, Ivan Lopez
AU - Tan, Zheng-Hua
AU - Hansen, John
AU - Jensen, Jesper
PY - 2022/1
Y1 - 2022/1
N2 - Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

AB - Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.
KW - Keyword spotting
KW - acoustic model
KW - deep learning
KW - robustness
KW - small footprint
UR - http://www.scopus.com/inward/record.url?scp=85122597739&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2021.3139508
DO - 10.1109/ACCESS.2021.3139508
M3 - Review article
SN - 2169-3536
VL - 10
SP - 4169
EP - 4199
JO - IEEE Access
JF - IEEE Access
ER -
