Pre-processing of Speech Signals for Robust Parameter Estimation

Alfredo Esquivel Jaramillo

Publication: PhD thesis


Abstract

This thesis concerns methods for pre-processing speech signals so that the parameters of models of these signals can be estimated robustly. There is a special focus on the situation where the desired signal is contaminated by colored noise. To estimate the speech signal, or its voiced and unvoiced components, from a noisy observation, it is important to have robust estimators that can handle colored and non-stationary noise.

Two important aspects are investigated. The first is the robust estimation of speech signal parameters, such as the fundamental frequency, which is required in many contexts. For this purpose, fast estimation methods based on a simple white Gaussian noise (WGN) assumption are often used. To keep using those methods, the noisy signal can be pre-processed with a filter. If the colored noise is modelled as an autoregressive (AR) process whose parameters are estimated from the noisy signal, the noise component can be rendered closer to white with a simple pre-processing filter (a pre-whitener). This makes it possible to estimate the fundamental frequency under the aforementioned WGN assumption. In non-stationary noise scenarios, an adaptive pre-whitening filter based on supervised noise statistics estimates yields better estimates of the noise spectral envelope, and a higher degree of spectral flatness, than one based on unsupervised noise statistics. A pre-whitening filter also improves the accuracy of a source localization method. The problem of jointly estimating the parameters of the voiced speech and of the stochastic signal parts (i.e., unvoiced speech and additive noise) is solved by first applying a cascade of a pre-whitening filter and the nonlinear least squares (NLS) fundamental frequency estimator, and then iteratively re-estimating the pre-whitening filter from the modelled residual and re-estimating the fundamental frequency. This further reduces the number of gross errors in the fundamental frequency estimates as well as the voicing detection errors.
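The pre-whitening idea described above can be illustrated with a minimal sketch. It is not the thesis's estimator: it assumes a noise-only segment is available, fits a low-order AR model to it with the autocorrelation (Yule-Walker) method via the Levinson-Durbin recursion, and then applies the resulting prediction-error filter as an FIR pre-whitener. The function names `autocorr`, `levinson_durbin`, and `prewhiten`, the AR order of 2, and the synthetic AR(1) noise are all illustrative assumptions.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the AR polynomial a = [1, a1, ..., ap].

    Convention: x[t] + a1*x[t-1] + ... + ap*x[t-p] = e[t].
    Returns the coefficients and the final prediction-error variance.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)
    return a, err


def prewhiten(x, a):
    """Apply the FIR prediction-error filter A(z); samples before t=0 are taken as zero."""
    p = len(a) - 1
    return [sum(a[i] * x[t - i] for i in range(min(p, t) + 1))
            for t in range(len(x))]


# Synthetic colored noise (assumption): AR(1) process with pole at 0.9.
random.seed(0)
noise, prev = [], 0.0
for _ in range(4000):
    prev = 0.9 * prev + random.gauss(0.0, 1.0)
    noise.append(prev)

a, _ = levinson_durbin(autocorr(noise, 2), 2)
white = prewhiten(noise, a)

r_before = autocorr(noise, 1)
r_after = autocorr(white, 1)
rho_before = r_before[1] / r_before[0]   # strongly correlated before filtering
rho_after = r_after[1] / r_after[0]      # close to zero after filtering
```

In practice the same filter would be applied to the noisy speech rather than to the noise alone, so that a WGN-based fundamental frequency estimator sees a noise component that is closer to white.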

The second aspect is as follows: once more accurate parameter estimates have been obtained, the extraction of individual speech components (i.e., voiced and unvoiced speech) from a noisy speech signal is investigated through linear filtering based on the statistics of the individual components. A Wiener filtering approach allows for a better recovery of both components than state-of-the-art decomposition methods, which assume that the additive noise is small and insignificant. Instead of using a fixed segment length for the extraction, we also propose to use time-varying segment lengths that are adapted to the signal. The optimal segmentation is obtained once the parameter estimates of a hybrid speech model have been found for all possible candidate models and segment lengths.
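As a rough illustration of why accounting for the additive noise matters, the sketch below computes textbook frequency-domain Wiener gains for the voiced and unvoiced components when the observation is the sum of voiced speech, unvoiced speech, and noise, all assumed mutually uncorrelated. This is not the thesis's filter design: the function name `wiener_gains` and the per-bin power spectral density values are made-up assumptions chosen only to show the behaviour.

```python
def wiener_gains(psd_voiced, psd_unvoiced, psd_noise):
    """Per-bin Wiener gains for the voiced and unvoiced components.

    Each gain is the component's PSD divided by the PSD of the full
    observation (voiced + unvoiced + noise), assuming uncorrelated parts.
    """
    gains_v, gains_u = [], []
    for sv, su, sn in zip(psd_voiced, psd_unvoiced, psd_noise):
        total = sv + su + sn
        gains_v.append(sv / total if total > 0 else 0.0)
        gains_u.append(su / total if total > 0 else 0.0)
    return gains_v, gains_u


# Hypothetical PSD values for three frequency bins.
psd_v = [4.0, 0.0, 1.0]
psd_u = [0.0, 1.0, 1.0]
psd_n = [1.0, 1.0, 2.0]

g_v, g_u = wiener_gains(psd_v, psd_u, psd_n)

# With noise in the denominator the two gains sum to less than one in every
# bin, so part of the observation is attributed to neither speech component.
gain_sums = [gv + gu for gv, gu in zip(g_v, g_u)]
```

If the noise term were dropped from the denominator, the two gains would sum to one in every bin, and all of the noise energy would be distributed between the voiced and unvoiced estimates.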
Original language: English
Supervisors
  • Christensen, Mads Græsbøll, Main supervisor
  • Nielsen, Jesper Kjær, Company supervisor, External person
Publisher
Electronic ISBNs: 978-87-7210-984-8
DOI
Status: Published - 2021

Bibliographical note

PhD supervisor:
Professor Mads Græsbøll Christensen, Aalborg University

Assistant PhD supervisor:
Associate Professor Jesper Kjær Nielsen, Siemens Gamesa
