Exploiting temporal correlation of speech for error robust and bandwidth flexible distributed speech recognition

Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg

Research output: Contribution to journal › Journal article › Research › peer-review

17 Citations (Scopus)

Abstract

In this paper, the temporal correlation of speech is exploited in front-end feature extraction, client-based error recovery and server-based error concealment (EC) for distributed speech recognition. First, the paper investigates a half frame rate (HFR) front-end that uses double frame shifting at the client side. At the server side, each HFR feature vector is duplicated to construct a full frame rate (FFR) feature sequence. This HFR front-end gives performance comparable to that of the FFR front-end but contains only half of the FFR features. Secondly, different arrangements of the other half of the FFR features create a set of error recovery techniques encompassing multiple description coding and interleaving schemes, where interleaving has the advantage of not introducing a delay when there are no transmission errors. Thirdly, a sub-vector based EC technique is presented in which error detection and concealment are conducted at the sub-vector level, as opposed to conventional techniques in which an entire vector is replaced even when only a single bit error occurs. The sub-vector EC is further combined with weighted Viterbi decoding. Encouraging recognition results are observed for the proposed techniques. Lastly, to understand the effects of applying various EC techniques, this paper introduces three comparison approaches based on speech features, dynamic programming distance and hidden Markov model state duration.
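The HFR reconstruction and the sub-vector level concealment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (half_frame_rate, duplicate_to_full_rate, conceal_subvectors) are hypothetical, keeping every second frame is assumed to stand in for double frame shifting, and copying from the nearest error-free sub-vector is an assumed concealment rule that the abstract does not specify.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def half_frame_rate(frames: Sequence[Vector]) -> List[Vector]:
    """Client side (assumption): keep every second frame, which mimics
    extracting features with twice the frame shift."""
    return [list(f) for f in frames[::2]]


def duplicate_to_full_rate(hfr: Sequence[Vector], n_frames: int) -> List[Vector]:
    """Server side: repeat each HFR vector once to rebuild a sequence
    of n_frames full frame rate vectors."""
    full: List[Vector] = []
    for v in hfr:
        full.append(list(v))
        full.append(list(v))
    return full[:n_frames]


def conceal_subvectors(
    frames: Sequence[Vector],
    bad: Sequence[Sequence[bool]],
    bounds: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Replace only the sub-vectors flagged as erroneous.

    bad[t][k] is True when sub-vector k of frame t failed error detection.
    bounds[k] is the (start, end) index range of sub-vector k in a vector.
    Assumed rule: copy the same sub-vector from the nearest error-free
    frame, preferring the earlier frame on a tie.
    """
    out = [list(f) for f in frames]
    n = len(frames)
    for k, (lo, hi) in enumerate(bounds):
        good = [t for t in range(n) if not bad[t][k]]
        if not good:
            continue  # no error-free source; leave the data as received
        for t in range(n):
            if bad[t][k]:
                src = min(good, key=lambda g: (abs(g - t), g))
                out[t][lo:hi] = frames[src][lo:hi]
    return out


if __name__ == "__main__":
    # Four toy frames, each made of two 2-dimensional sub-vectors.
    ffr = [[1.0, 1.0, 10.0, 10.0], [2.0, 2.0, 20.0, 20.0],
           [3.0, 3.0, 30.0, 30.0], [4.0, 4.0, 40.0, 40.0]]
    print(duplicate_to_full_rate(half_frame_rate(ffr), len(ffr)))
    flags = [[False, False], [False, True], [False, False], [False, False]]
    print(conceal_subvectors(ffr, flags, [(0, 2), (2, 4)]))
```

In the second call only the flagged half of frame 1 is overwritten, while its error-free half is kept. A conventional whole-vector scheme would have discarded both halves.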
Original language: English
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 15
Issue number: 4
Pages (from-to): 1391-1403
Number of pages: 13
ISSN: 1558-7916
Publication status: Published - 2007

Fingerprint

Speech recognition
Bandwidth
Servers
Error detection
Computer system recovery
Hidden Markov models
Recovery
Dynamic programming
Decoding
Feature extraction
Pattern recognition
Coding

Keywords

  • Distributed speech recognition
  • error concealment
  • error recovery
  • low bitrate
  • split vector quantization

Cite this

@article{45b3d8b09c2c11db8ed6000ea68e967b,
title = "Exploiting temporal correlation of speech for error robust and bandwidth flexible distributed speech recognition",
keywords = "Distributed speech recognition, error concealment, error recovery, low bitrate, split vector quantization",
author = "Zheng-Hua Tan and Paul Dalsgaard and B{\o}rge Lindberg",
year = "2007",
language = "English",
volume = "15",
pages = "1391--1403",
journal = "IEEE Transactions on Audio, Speech, and Language Processing",
issn = "1558-7916",
publisher = "IEEE Signal Processing Society",
number = "4",

}

Exploiting temporal correlation of speech for error robust and bandwidth flexible distributed speech recognition. / Tan, Zheng-Hua; Dalsgaard, Paul; Lindberg, Børge.

In: IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, No. 4, 2007, p. 1391-1403.


TY - JOUR
T1 - Exploiting temporal correlation of speech for error robust and bandwidth flexible distributed speech recognition
AU - Tan, Zheng-Hua
AU - Dalsgaard, Paul
AU - Lindberg, Børge
PY - 2007
Y1 - 2007
KW - Distributed speech recognition
KW - error concealment
KW - error recovery
KW - low bitrate
KW - split vector quantization
M3 - Journal article
VL - 15
SP - 1391
EP - 1403
JO - IEEE Transactions on Audio, Speech, and Language Processing
JF - IEEE Transactions on Audio, Speech, and Language Processing
SN - 1558-7916
IS - 4
ER -