Variable Frame Rate and Length Analysis for Data Compression in Distributed Speech Recognition

Ivan Kraljevski, Zheng-Hua Tan

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

This paper addresses data compression in distributed speech recognition using a variable frame rate and length analysis method. The method first selects frames using an a posteriori signal-to-noise ratio (SNR) weighted energy distance to find the right time resolution at the signal level. It then increases the length of each selected frame according to the number of non-selected preceding frames to find the right time-frequency resolution at the frame level. This yields a high frame rate and short frames in rapidly changing regions, and a low frame rate and long frames in steady regions. The method is applied to scalable source coding in distributed speech recognition, where the target bitrate is met by adjusting the frame rate. Speech recognition results show that the proposed approach outperforms other compression methods in recognition accuracy for noisy speech while achieving higher compression rates.
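
The two steps named in the abstract (frame selection, then frame lengthening) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the formulas, so the following are assumptions made only for illustration: the distance is taken as the absolute log-energy difference between consecutive frames multiplied by a log-domain a posteriori SNR, the threshold is a fixed constant, and the frame length grows by one frame shift per skipped frame up to a cap. All function and parameter names are hypothetical.

```python
import math


def frame_energies(signal, frame_len, shift):
    """Log energy of fixed-length frames taken every `shift` samples."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) + 1e-10  # floor avoids log(0)
        energies.append(math.log(energy))
    return energies


def select_frames(log_e, noise_log_e, threshold):
    """Select a frame whenever the accumulated weighted distance exceeds `threshold`.

    Assumed distance: |log E(t) - log E(t-1)| * max(log E(t) - noise_log_e, 0).
    Returns (frame_index, skipped) pairs, where `skipped` counts the
    non-selected frames that precede the selected frame.
    """
    selected = [(0, 0)]  # always keep the first frame
    accumulated = 0.0
    last = 0
    for t in range(1, len(log_e)):
        snr_post = max(log_e[t] - noise_log_e, 0.0)  # assumed log-domain SNR weight
        accumulated += abs(log_e[t] - log_e[t - 1]) * snr_post
        if accumulated > threshold:
            selected.append((t, t - last - 1))
            last = t
            accumulated = 0.0
    return selected


def frame_length(base_len, shift, skipped, max_len):
    """Assumed rule: lengthen the frame by one shift per skipped frame, capped."""
    return min(base_len + skipped * shift, max_len)
```

Under these assumptions, raising `threshold` selects fewer frames, which is the knob a scalable coder would turn to meet a target bitrate. A steady stretch followed by rapid changes, such as log energies `[1, 1, 1, 1, 5, 1, 5, 1]`, yields one long frame covering the steady stretch and then a short frame at every step.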
Original language: English
Title of host publication: Network Infrastructure and Digital Content (IC-NIDC), 2014 4th IEEE International Conference on
Number of pages: 5
Publisher: IEEE Press
Publication date: Sept 2014
Pages: 453-457
ISBN (Print): 978-1-4799-4736-2
ISBN (Electronic): 978-1-4799-5624-1, 978-1-4799-4734-8
Publication status: Published - Sept 2014
Event: The 4th IEEE International Conference on Network Infrastructure and Digital Content - Beijing, China
Duration: 19 Sept 2014 to 21 Sept 2014

Conference

Conference: The 4th IEEE International Conference on Network Infrastructure and Digital Content
Country/Territory: China
City: Beijing
Period: 19/09/2014 to 21/09/2014
Series: IEEE International Conference Network Infrastructure and Digital Content proceedings
