Joint variable frame rate and length analysis for speech recognition under adverse conditions

Zheng-Hua Tan; Ivan Kraljevski

doi:10.1016/j.compeleceng.2014.09.002

Department of Electronic Systems

Publication: Contribution to journal › Journal article › Research › peer-review

8 Citations (Scopus)

Abstract

This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method selects frames using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of each selected frame according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to a rapidly changing and high-SNR region of a speech signal, and a lower frame rate and an increased frame length to a steady or low-SNR region. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis, in terms of noise robustness.
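The selection scheme summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the SNR weighting (the log-domain a posteriori SNR, floored at zero), the noise estimate (the mean log energy of the first few frames), the accumulate-and-threshold rule, the backward extension of a selected frame, and every name and default value below (`select_frames`, `frame_len`, `shift`, `noise_frames`, `threshold`, `len_step`, `max_len`) are assumptions made only for illustration.

```python
import math


def frame_log_energies(signal, frame_len, shift):
    """Log energy of densely spaced, fixed-length analysis frames."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        energy = sum(x * x for x in signal[start:start + frame_len])
        energies.append(math.log(energy + 1e-10))
    return energies


def select_frames(signal, frame_len=200, shift=8, noise_frames=10,
                  threshold=1.0, len_step=8, max_len=400):
    """Return (start, length) pairs, in samples, for the selected frames.

    All parameter names and defaults are hypothetical.
    """
    log_e = frame_log_energies(signal, frame_len, shift)
    if len(log_e) < 2:
        return []

    # Assumption: the first few frames contain noise only.
    n = min(noise_frames, len(log_e))
    log_noise = sum(log_e[:n]) / n

    selected = []
    accumulated = 0.0  # SNR-weighted distance gathered since the last selection
    skipped = 0        # non-selected frames since the last selection
    for t in range(1, len(log_e)):
        # Assumed weight: a posteriori SNR in the log domain, floored at zero.
        snr = max(log_e[t] - log_noise, 0.0)
        accumulated += snr * abs(log_e[t] - log_e[t - 1])
        if accumulated >= threshold:
            # Lengthen the frame in proportion to the number of skipped frames.
            length = min(frame_len + len_step * skipped, max_len)
            end = t * shift + frame_len
            # Assumption: the extra length extends backwards in time.
            start = max(0, end - length)
            selected.append((start, end - start))
            accumulated = 0.0
            skipped = 0
        else:
            skipped += 1
    return selected


if __name__ == "__main__":
    # Steady low-level segment followed by a segment with rising amplitude.
    quiet = [0.01 * (-1) ** i for i in range(1000)]
    rising = [(0.01 + 0.001 * i) * (-1) ** i for i in range(1000)]
    for start, length in select_frames(quiet + rising):
        print(start, length)
```

In this sketch a steady or low-SNR stretch accumulates distance slowly, so few frames are selected there and the next selected frame is longer; a rapidly changing, high-SNR stretch crosses the threshold often, so frames are selected densely and stay close to the normal length.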

Original language	English
Journal	Computers & Electrical Engineering
Volume	40
Issue number	7
Pages (from-to)	2139-2149
ISSN	0045-7906
DOI	https://doi.org/10.1016/j.compeleceng.2014.09.002
Status	Published - Oct 2014

Access to document

10.1016/j.compeleceng.2014.09.002

http://www.sciencedirect.com/science/article/pii/S0045790614002304

Citation formats

@article{2dd31c08263d4b31b4bf841727d9968b,
title = "Joint variable frame rate and length analysis for speech recognition under adverse conditions",
abstract = "This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method selects frames using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of each selected frame according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to a rapidly changing and high-SNR region of a speech signal, and a lower frame rate and an increased frame length to a steady or low-SNR region. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis, in terms of noise robustness.",
keywords = "Frame selection, noise-robust speech recognition, variable frame rate, variable frame length",
author = "Zheng-Hua Tan and Ivan Kraljevski",
year = "2014",
month = oct,
doi = "10.1016/j.compeleceng.2014.09.002",
language = "English",
volume = "40",
pages = "2139--2149",
journal = "Computers \& Electrical Engineering",
issn = "0045-7906",
publisher = "Pergamon Press",
number = "7",
}

TY  - JOUR
T1  - Joint variable frame rate and length analysis for speech recognition under adverse conditions
AU  - Tan, Zheng-Hua
AU  - Kraljevski, Ivan
PY  - 2014/10
Y1  - 2014/10
AB  - This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method selects frames using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of each selected frame according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to a rapidly changing and high-SNR region of a speech signal, and a lower frame rate and an increased frame length to a steady or low-SNR region. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis, in terms of noise robustness.
KW  - Frame selection
KW  - noise-robust speech recognition
KW  - variable frame rate
KW  - variable frame length
U2  - 10.1016/j.compeleceng.2014.09.002
DO  - 10.1016/j.compeleceng.2014.09.002
M3  - Journal article
SN  - 0045-7906
VL  - 40
SP  - 2139
EP  - 2149
JO  - Computers & Electrical Engineering
JF  - Computers & Electrical Engineering
IS  - 7
ER  -
