Speech and Audio Signal Modeling Using Perceptual Unitary ESPRIT

Description

The problem of modeling a signal segment as a sum of sinusoidal components is of interest in a wide range of fields, including speech and audio processing. Traditionally, the basis functions are selected from a dictionary of constant-amplitude, constant-frequency (CACF) sinusoids, and FFT-based estimation techniques are used to extract the perceptually most relevant components. In this research, conducted within the ARDOR project, we propose a subspace-based algorithm for representing a signal segment as a linear combination of perceptually relevant CACF sinusoids. The algorithm can be seen as a combination of the Unitary ESPRIT algorithm, a well-known scheme for estimating sinusoids in noise, and a recently developed perceptual distortion measure. In analysis-synthesis experiments with wideband audio signals, the algorithm shows improvements over state-of-the-art techniques for signals that satisfy the underlying signal model well. However, for signals that cannot be approximated well by CACF sinusoids, the proposed scheme does not improve upon state-of-the-art techniques [J. Jensen, R. Heusdens, and S.H. Jensen, 2003a]. (Søren Holdt Jensen, Jesper Jensen (Delft University of Technology), Richard Heusdens (Delft University of Technology))
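As an illustration of the subspace idea only, the following minimal sketch estimates the frequencies of constant-amplitude, constant-frequency sinusoids with standard least-squares ESPRIT. It is not the project's algorithm: it uses neither the unitary (real-valued) formulation nor the perceptual distortion measure. The function name `esprit_frequencies`, its parameters, and the default Hankel window length are assumptions made for this example.

```python
import numpy as np

def esprit_frequencies(x, num_sinusoids, window=None):
    """Estimate frequencies (radians/sample) of real sinusoids in x.

    Standard LS-ESPRIT sketch; hypothetical helper, not the perceptual
    Unitary ESPRIT scheme described in the text.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Each real sinusoid is a pair of complex exponentials.
    order = 2 * num_sinusoids
    # Number of rows of the Hankel data matrix (assumed default: n // 2).
    m = window if window is not None else n // 2
    cols = n - m + 1
    # Column j of the Hankel matrix holds the samples x[j], ..., x[j + m - 1].
    hankel = np.stack([x[j:j + m] for j in range(cols)], axis=1)
    # The leading left singular vectors span the signal subspace.
    u, _, _ = np.linalg.svd(hankel, full_matrices=False)
    us = u[:, :order]
    # Shift invariance: us[1:] is approximately us[:-1] @ phi.
    phi, *_ = np.linalg.lstsq(us[:-1], us[1:], rcond=None)
    # The eigenvalues of phi estimate the poles exp(+/- j * omega).
    poles = np.linalg.eigvals(phi)
    freqs = np.angle(poles)
    # Keep one frequency per complex-conjugate pair.
    return np.sort(freqs[freqs > 0])

# Usage: two noise-free sinusoids at 0.3 and 1.1 radians/sample.
n = np.arange(200)
x = np.cos(0.3 * n) + 0.5 * np.cos(1.1 * n + 0.4)
estimated = esprit_frequencies(x, num_sinusoids=2)
```

Once the frequencies are known, amplitudes and phases could be obtained by a linear least-squares fit of the corresponding sinusoids to the segment; a perceptually motivated variant would weight that fit and the component selection, which this sketch does not attempt.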
Status: Finished
Effective start/end date: 19/05/2010 to 31/12/2017