Joint variable frame rate and length analysis for speech recognition under adverse conditions

Zheng-Hua Tan, Ivan Kraljevski

Research output: Contribution to journal › Journal article › Research › peer-review

4 Citations (Scopus)

Abstract

This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method selects frames using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of each selected frame according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to a rapidly changing, high-SNR region of a speech signal, and a lower frame rate and an increased frame length to a steady or low-SNR region. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis, in terms of noise robustness.
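
The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the noise estimate taken from the first few frames, the log-domain SNR weight, the fixed threshold, and every parameter name and value (`noise_frames`, `threshold`, `base_len`, `step`, `max_len`) are assumptions made for the example.

```python
import math


def frame_energies(signal, frame_len, shift):
    """Energy of each densely spaced frame (fixed length, fixed shift)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frame = signal[start:start + frame_len]
        # The small floor avoids log(0) on silent frames.
        energies.append(sum(x * x for x in frame) + 1e-12)
    return energies


def select_frames(energies, noise_frames=5, threshold=1.0,
                  base_len=200, step=80, max_len=600):
    """Return (frame_index, frame_length_in_samples) for the selected frames.

    All defaults are hypothetical values chosen for illustration only.
    """
    # Assumption: the first few frames contain noise only.
    noise = sum(energies[:noise_frames]) / noise_frames
    selected = []
    accumulated = 0.0
    skipped = 0  # non-selected frames since the last selection
    for t in range(1, len(energies)):
        # A posteriori SNR in the log domain, floored at zero.
        snr = max(0.0, math.log(energies[t] / noise))
        # Log-energy distance between neighbouring frames, weighted by SNR.
        distance = abs(math.log(energies[t]) - math.log(energies[t - 1])) * snr
        accumulated += distance
        if accumulated >= threshold:
            # More skipped frames before a selection give a longer frame.
            length = min(base_len + step * skipped, max_len)
            selected.append((t, length))
            accumulated = 0.0
            skipped = 0
        else:
            skipped += 1
    return selected


# Steady noise-level frames followed by a rapidly changing, high-energy region.
energies = [1.0] * 8 + [100.0, 400.0, 100.0, 400.0]
print(select_frames(energies))
# → [(8, 600), (9, 200), (10, 200), (11, 200)]
```

In this toy input the steady, noise-level stretch produces no selected frames, so the first frame selected after it receives the maximum length, while the rapidly changing high-energy frames are all selected at the base length.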
Original language: English
Journal: Computers & Electrical Engineering
Volume: 40
Issue number: 7
Pages (from-to): 2139-2149
ISSN: 0045-7906
DOI: 10.1016/j.compeleceng.2014.09.002
Publication status: Published - Oct 2014

Fingerprint

Speech recognition
Signal to noise ratio

Keywords

  • Frame selection
  • noise-robust speech recognition
  • variable frame rate
  • variable frame length

Cite this

@article{2dd31c08263d4b31b4bf841727d9968b,
title = "Joint variable frame rate and length analysis for speech recognition under adverse conditions",
abstract = "This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method adopts frame selection using an a posteriori signal-to-noise (SNR) ratio weighted energy distance and increases the length of the selected frames, according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to a rapidly changing and high SNR region of a speech signal, and a lower frame rate and an increased frame length to a steady or low SNR region. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis in terms of noise-robustness.",
keywords = "Frame selection, noise-robust speech recognition, variable frame rate, variable frame length",
author = "Zheng-Hua Tan and Ivan Kraljevski",
year = "2014",
month = "10",
doi = "10.1016/j.compeleceng.2014.09.002",
language = "English",
volume = "40",
pages = "2139--2149",
journal = "Computers \& Electrical Engineering",
issn = "0045-7906",
publisher = "Pergamon Press",
number = "7",

}

Joint variable frame rate and length analysis for speech recognition under adverse conditions. / Tan, Zheng-Hua; Kraljevski, Ivan.

In: Computers & Electrical Engineering, Vol. 40, No. 7, 10.2014, p. 2139-2149.

TY  - JOUR
T1  - Joint variable frame rate and length analysis for speech recognition under adverse conditions
AU  - Tan, Zheng-Hua
AU  - Kraljevski, Ivan
PY  - 2014/10
Y1  - 2014/10
N2  - This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method adopts frame selection using an a posteriori signal-to-noise (SNR) ratio weighted energy distance and increases the length of the selected frames, according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to a rapidly changing and high SNR region of a speech signal, and a lower frame rate and an increased frame length to a steady or low SNR region. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis in terms of noise-robustness.
AB  - This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method adopts frame selection using an a posteriori signal-to-noise (SNR) ratio weighted energy distance and increases the length of the selected frames, according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to a rapidly changing and high SNR region of a speech signal, and a lower frame rate and an increased frame length to a steady or low SNR region. The speech recognition results show that the proposed variable frame rate and length method outperforms fixed frame rate and length analysis, as well as standalone variable frame rate analysis in terms of noise-robustness.
KW  - Frame selection
KW  - noise-robust speech recognition
KW  - variable frame rate
KW  - variable frame length
U2  - 10.1016/j.compeleceng.2014.09.002
DO  - 10.1016/j.compeleceng.2014.09.002
M3  - Journal article
VL  - 40
SP  - 2139
EP  - 2149
JO  - Computers & Electrical Engineering
JF  - Computers & Electrical Engineering
SN  - 0045-7906
IS  - 7
ER  - 