Deep Spoken Keyword Spotting: An Overview

Ivan Lopez Espejo; Zheng-Hua Tan; John Hansen; Jesper Jensen

doi:10.1109/ACCESS.2021.3139508

Publication: Contribution to journal › Review article › Peer-reviewed

42 citations (Scopus)

196 downloads (Pure)

Abstract

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

Original language	English
Journal	IEEE Access
Volume	10
Pages (from-to)	4169-4199
Number of pages	31
ISSN	2169-3536
DOI	https://doi.org/10.1109/ACCESS.2021.3139508
Status	Published - Jan 2022

Access to document

10.1109/ACCESS.2021.3139508 (License: CC BY 4.0)

Open Access article: publisher's published version, 1.85 MB (License: CC BY 4.0)

Citation formats

@article{e86665d7f1af4ed4829352ee1a3465d7,

title = "Deep Spoken Keyword Spotting: An Overview",

abstract = "Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.",

keywords = "Keyword spotting, acoustic model, deep learning, robustness, small footprint",

author = "Espejo, {Ivan Lopez} and Zheng-Hua Tan and John Hansen and Jesper Jensen",

year = "2022",

month = jan,

doi = "10.1109/ACCESS.2021.3139508",

language = "English",

volume = "10",

pages = "4169--4199",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "IEEE",

}

TY - JOUR

T1 - Deep Spoken Keyword Spotting

T2 - An Overview

AU - Espejo, Ivan Lopez

AU - Tan, Zheng-Hua

AU - Hansen, John

AU - Jensen, Jesper

PY - 2022/1

Y1 - 2022/1

N2 - Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

AB - Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.

KW - Keyword spotting

KW - acoustic model

KW - deep learning

KW - robustness

KW - small footprint

UR - http://www.scopus.com/inward/record.url?scp=85122597739&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2021.3139508

DO - 10.1109/ACCESS.2021.3139508

M3 - Review article

SN - 2169-3536

VL - 10

SP - 4169

EP - 4199

JO - IEEE Access

JF - IEEE Access

ER -
