Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

Publication: Contribution to journal, Journal article, Research, peer-reviewed

6 Citations (Scopus)
176 Downloads (Pure)


In this paper, single channel speech enhancement in
the time domain is considered. We address the problem of modelling
non-stationary speech by describing the voiced speech parts
by a harmonic linear chirp model instead of using the traditional
harmonic model. This means that the speech signal is not assumed
stationary; instead, the fundamental frequency can vary linearly
within each frame. The linearly constrained minimum variance
(LCMV) filter and the amplitude and phase estimation (APES) filter
are derived in this framework and compared to the harmonic
versions of the same filters. It is shown through simulations on
synthetic and speech signals that the chirp versions of the filters
perform better than their harmonic counterparts in terms of output
signal-to-noise ratio (SNR) and signal reduction factor. For
synthetic signals, the output SNR of the harmonic chirp APES-based
filter is 3 dB higher than that of the harmonic APES-based
filter at an input SNR of 10 dB, and at the same time the signal
reduction factor is decreased. For speech signals, the increase is
1.5 dB along with a decrease in the signal reduction factor of 0.7.
As an implicit part of the APES filter, a noise covariance matrix
estimate is obtained. We suggest using this estimate in combination
with other filters such as the Wiener filter. The performances
of the Wiener filter and the LCMV filter are compared using the APES
noise covariance matrix estimate and a power spectral density
(PSD) based noise covariance matrix estimate. It is shown that
the APES covariance matrix works well in combination with the
Wiener filter, and the PSD-based covariance matrix works well in
combination with the LCMV filter.
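
The harmonic linear chirp model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common real-valued form in which harmonic l has phase l(ω0·n + (k/2)·n²) + φ_l, so that the fundamental frequency within a frame is ω0 + k·n. The function names, parameter names, and the exact parameterization are assumptions made for illustration and are not taken from the paper. Setting the chirp rate k to zero gives the traditional harmonic model.

```python
import math


def harmonic_chirp(n_samples, omega0, chirp_rate, amplitudes, phases):
    """Synthesize one frame of a real-valued harmonic chirp signal.

    Assumed model (illustrative, not quoted from the paper):
        s(n) = sum_l A_l * cos(l * (omega0 * n + chirp_rate / 2 * n^2) + phi_l)
    omega0 is the fundamental frequency at n = 0 in radians per sample,
    chirp_rate is its linear rate of change in radians per sample squared.
    """
    signal = []
    for n in range(n_samples):
        # Phase of the fundamental: integral of (omega0 + chirp_rate * n).
        base_phase = omega0 * n + 0.5 * chirp_rate * n * n
        sample = sum(
            a * math.cos(l * base_phase + p)
            for l, (a, p) in enumerate(zip(amplitudes, phases), start=1)
        )
        signal.append(sample)
    return signal


def instantaneous_fundamental(n, omega0, chirp_rate):
    """Fundamental frequency at sample n under the linear chirp assumption."""
    return omega0 + chirp_rate * n


# Example frame: two harmonics, fundamental rising linearly over 200 samples.
frame = harmonic_chirp(200, omega0=0.15, chirp_rate=2e-4,
                       amplitudes=[1.0, 0.5], phases=[0.0, 0.0])
```

With `chirp_rate=0.0` the same function produces a stationary harmonic signal, which makes it straightforward to generate matched synthetic test signals for comparing chirp and harmonic filter variants.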
Journal: I E E E Transactions on Audio, Speech and Language Processing
Issue number: 4
Pages (from-to): 645-658
Status: Published - Apr. 2016
