A Perceptual Subspace Approach for Modeling of Speech


The problem of modeling a signal segment as a sum of exponentially damped sinusoidal components arises in many application areas, including speech and audio processing. Model parameters are often estimated using subspace-based techniques, which arrange the input signal in a structured matrix and exploit the so-called shift-invariance property of certain vector spaces of that matrix. A problem with this class of estimation algorithms, when used for speech and audio processing, is that they do not take the perceptual importance of the sinusoidal components into account. In this work, carried out within the ARDOR project, we propose a solution to this problem. In particular, we show how to combine well-known subspace-based estimation techniques with a recently developed perceptual distortion measure in order to obtain an algorithm that extracts perceptually relevant model components. In analysis-synthesis experiments with wideband audio signals, objective and subjective evaluations show that the proposed algorithm improves perceived signal quality considerably over traditional subspace-based analysis methods [J. Jensen, R. Heusdens, and S.H. Jensen 2003b, 2004]. (Søren Holdt Jensen, Jesper Jensen (Delft University of Technology), Richard Heusdens (Delft University of Technology))
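To illustrate the shift-invariance idea behind the traditional subspace methods mentioned above, here is a minimal sketch of an ESPRIT/HTLS-style estimator in Python with NumPy. It assumes the signal model x(n) = Σ_k c_k z_k^n with poles z_k = exp(-α_k + jω_k), where α_k is the damping and ω_k the frequency of component k. It is the plain, non-perceptual baseline; it does not include the perceptual distortion measure proposed in this project. The function name `esprit_damped_sinusoids` and the parameters `K` (model order) and `L` (Hankel window length) are hypothetical choices for this example.

```python
import numpy as np


def esprit_damped_sinusoids(x, K, L=None):
    """Hypothetical sketch: estimate K complex poles z_k and amplitudes c_k
    of x(n) = sum_k c_k * z_k**n via the shift-invariance (ESPRIT-style) idea.
    This is NOT the perceptual method of the project; it is the plain baseline."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if L is None:
        L = N // 2
    M = N - L + 1
    # Hankel data matrix: H[i, j] = x[i + j], shape (M, L)
    H = np.array([x[i:i + L] for i in range(M)])
    # Signal subspace: the K dominant left singular vectors of H
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Uk = U[:, :K]
    # Shift invariance: Uk[:-1] @ Phi ~= Uk[1:]; solve for Phi in least squares
    Phi, *_ = np.linalg.lstsq(Uk[:-1], Uk[1:], rcond=None)
    # The eigenvalues of Phi are the signal poles z_k
    z = np.linalg.eigvals(Phi)
    # Complex amplitudes from a least-squares fit to the Vandermonde model
    n = np.arange(N)
    V = z[np.newaxis, :] ** n[:, np.newaxis]
    c, *_ = np.linalg.lstsq(V, x, rcond=None)
    return z, c
```

Damping factors and frequencies then follow as `alpha = -np.log(np.abs(z))` and `omega = np.angle(z)`. For a real-valued signal each damped sinusoid contributes a conjugate pole pair, so `K` should be twice the number of real sinusoidal components. Because the poles are chosen only by singular-value energy, loud but perceptually irrelevant components can crowd out audible ones; this is the shortcoming the abstract describes.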
Effective start/end date: 19/05/2010 to 31/12/2017