Exploiting temporal correlation of speech for error robust and bandwidth flexible distributed speech recognition

Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg

Research output: Contribution to journal, Journal article, Research, peer-review

17 Citations (Scopus)

Abstract

In this paper, the temporal correlation of speech is exploited in front-end feature extraction, client-based error recovery and server-based error concealment (EC) for distributed speech recognition. First, the paper investigates a half frame rate (HFR) front-end that uses double frame shifting at the client side. At the server side, each HFR feature vector is duplicated to construct a full frame rate (FFR) feature sequence. This HFR front-end gives comparable performance to the FFR front-end but contains only half the FFR features. Secondly, different arrangements of the other half of the FFR features create a set of error recovery techniques encompassing multiple description coding and interleaving schemes, where interleaving has the advantage of not introducing a delay when there are no transmission errors. Thirdly, a sub-vector-based EC technique is presented in which error detection and concealment are conducted at the sub-vector level, as opposed to conventional techniques in which an entire vector is replaced even if only a single bit error occurs. The sub-vector EC is further combined with weighted Viterbi decoding. Encouraging recognition results are observed for the proposed techniques. Lastly, to understand the effects of applying various EC techniques, this paper introduces three comparison approaches based on speech features, dynamic programming distance and hidden Markov model state duration.
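
The HFR idea described in the abstract can be illustrated with a minimal sketch. It assumes that doubling the frame shift with an unchanged analysis window amounts to keeping every second FFR feature vector, and that the server rebuilds the FFR sequence by plain duplication. The function names `hfr_subsample` and `reconstruct_ffr` and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence

Frame = List[float]


def hfr_subsample(ffr_frames: Sequence[Frame]) -> List[Frame]:
    """Client side (assumed): keep frames 0, 2, 4, ... of the FFR sequence."""
    return [list(frame) for frame in ffr_frames[::2]]


def reconstruct_ffr(hfr_frames: Sequence[Frame],
                    num_frames: Optional[int] = None) -> List[Frame]:
    """Server side: duplicate each HFR vector to restore the full frame rate.

    num_frames optionally trims the result when the original FFR sequence
    had an odd number of frames.
    """
    rebuilt: List[Frame] = []
    for frame in hfr_frames:
        rebuilt.append(list(frame))
        rebuilt.append(list(frame))  # duplicate stands in for the dropped frame
    return rebuilt if num_frames is None else rebuilt[:num_frames]


# Toy 2-dimensional feature vectors, five frames at full frame rate.
ffr = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
hfr = hfr_subsample(ffr)                       # three vectors are transmitted
rebuilt = reconstruct_ffr(hfr, num_frames=len(ffr))
```

In this sketch the odd-numbered frames are simply discarded. The abstract indicates that the error recovery schemes instead arrange this other half of the FFR features for transmission, which the sketch does not model.
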
Original language: English
Journal: IEEE Transactions on Audio Speech and Language Processing
Volume: 15
Issue number: 4
Pages (from-to): 1391-1403
Number of pages: 13
ISSN: 1558-7916
Publication status: Published - 2007

Keywords

  • Distributed speech recognition
  • error concealment
  • error recovery
  • low bitrate
  • split vector quantization
