Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

Sidsel Marie Nørholm; Jesper Rindom Jensen; Mads Græsbøll Christensen

doi:10.1109/TASLP.2016.2514492


Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

In this paper, single-channel speech enhancement in
the time domain is considered. We address the problem of modelling
non-stationary speech by describing the voiced speech parts
by a harmonic linear chirp model instead of the traditional
harmonic model. This means that the speech signal is not assumed
to be stationary; instead, the fundamental frequency can vary linearly
within each frame. The linearly constrained minimum variance
(LCMV) filter and the amplitude and phase estimation (APES) filter
are derived in this framework and compared to the harmonic
versions of the same filters. Simulations on synthetic and speech
signals show that the chirp versions of the filters
perform better than their harmonic counterparts in terms of output
signal-to-noise ratio (SNR) and signal reduction factor. For
synthetic signals, the output SNR of the harmonic chirp APES-based
filter is 3 dB higher than that of the harmonic APES-based
filter at an input SNR of 10 dB, while the signal
reduction factor is also decreased. For speech signals, the output SNR
increase is 1.5 dB, along with a decrease of 0.7 in the signal reduction factor.
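The difference between the harmonic and the harmonic chirp model can be illustrated with a small Python sketch. The sketch makes the following assumptions:

- The l-th harmonic has phase l(ω0·n + k·n²/2), so its instantaneous frequency l(ω0 + k·n) changes linearly within the frame.
- The filter is the textbook LCMV solution h = R⁻¹Z(Zᴴ R⁻¹ Z)⁻¹1, which passes every constrained harmonic without distortion.

The function names, the coloured-noise covariance R and all parameter values are hypothetical and chosen only for illustration. This is not the authors' implementation. The paper's filters may differ in details such as which covariance matrix is used and how the time index is defined.

```python
import cmath


def chirp_steering_matrix(omega0, k, num_harmonics, frame_len):
    """Return Z with Z[n][l-1] = exp(j*l*(omega0*n + 0.5*k*n**2)).

    Rows are time indices n = 0..frame_len-1, columns are harmonics l = 1..L.
    Harmonic l has instantaneous frequency l*(omega0 + k*n) rad/sample, so the
    fundamental varies linearly with n; k = 0 gives the ordinary harmonic model.
    """
    return [[cmath.exp(1j * l * (omega0 * n + 0.5 * k * n * n))
             for l in range(1, num_harmonics + 1)]
            for n in range(frame_len)]


def herm(a):
    """Conjugate transpose of a list-of-lists matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, b):
    """Solve A X = B (A: n x n, B: n x p) by Gaussian elimination with partial pivoting."""
    n, p = len(a), len(b[0])
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + p):
                m[r][c] -= f * m[col][c]
    x = [[0j] * p for _ in range(n)]
    for r in range(n - 1, -1, -1):  # back substitution
        for c in range(p):
            s = m[r][n + c] - sum(m[r][j] * x[j][c] for j in range(r + 1, n))
            x[r][c] = s / m[r][r]
    return x


def lcmv_filter(r, z):
    """Minimise h^H R h subject to Z^H h = 1, i.e. h = R^{-1} Z (Z^H R^{-1} Z)^{-1} 1."""
    r_inv_z = solve(r, z)                      # R^{-1} Z
    gram = matmul(herm(z), r_inv_z)            # Z^H R^{-1} Z (L x L)
    w = solve(gram, [[1.0 + 0j] for _ in range(len(z[0]))])
    return matmul(r_inv_z, w)                  # filter as a column vector


if __name__ == "__main__":
    M, L = 16, 3                  # filter length and number of harmonics (hypothetical)
    omega0, k = 0.5, 0.002        # rad/sample and rad/sample^2 (hypothetical)
    # Hypothetical coloured-noise covariance R[i][j] = 0.5**|i - j|.
    R = [[0.5 ** abs(i - j) for j in range(M)] for i in range(M)]
    Z_chirp = chirp_steering_matrix(omega0, k, L, M)
    h_chirp = lcmv_filter(R, Z_chirp)
    h_harm = lcmv_filter(R, chirp_steering_matrix(omega0, 0.0, L, M))
    for name, h in (("chirp LCMV", h_chirp), ("harmonic LCMV", h_harm)):
        # Largest deviation from a distortionless response on the chirping harmonics.
        err = max(abs(v[0] - 1) for v in matmul(herm(Z_chirp), h))
        print(f"{name}: max |z_l^H h - 1| = {err:.2e}")
```

Setting k = 0 yields the harmonic LCMV filter. When the harmonics actually chirp, the response of the harmonic filter departs from the distortionless constraint. The chirp version meets the constraint to numerical precision. This mismatch gives one intuitive reason why a chirp model can help for non-stationary voiced speech.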
As an implicit part of the APES filter, a noise covariance matrix
estimate is obtained. We suggest using this estimate in combination
with other filters, such as the Wiener filter. The performance
of the Wiener and LCMV filters is compared using the APES
noise covariance matrix estimate and a power spectral density
(PSD)-based noise covariance matrix estimate. It is shown that
the APES covariance matrix estimate works well in combination with the
Wiener filter, while the PSD-based estimate works well in
combination with the LCMV filter.
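Combining a noise covariance estimate with the Wiener filter can be sketched as follows. The sketch assumes the common time-domain form h_W = (I - R_y⁻¹ R_v) u, where:

- R_y is the covariance of the noisy snapshot vector y(n) = [y(n), ..., y(n-M+1)]ᵀ.
- R_v is the noise covariance estimate.
- u = [1, 0, ..., 0]ᵀ selects the current sample.

The APES and PSD-based estimators are not reproduced here. In the demo, a known white-noise covariance stands in for them, and all names and values are hypothetical. The output SNR is computed by filtering the clean and noise components separately, which is only possible in simulation.

```python
import math
import random


def solve(a, b):
    """Solve A X = B (A: n x n, B: n x p) by Gaussian elimination with partial pivoting."""
    n, p = len(a), len(b[0])
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + p):
                m[r][c] -= f * m[col][c]
    x = [[0.0] * p for _ in range(n)]
    for r in range(n - 1, -1, -1):  # back substitution
        for c in range(p):
            s = m[r][n + c] - sum(m[r][j] * x[j][c] for j in range(r + 1, n))
            x[r][c] = s / m[r][r]
    return x


def sample_covariance(y, m):
    """Average of y(n) y(n)^T over snapshots y(n) = [y(n), y(n-1), ..., y(n-m+1)]."""
    snaps = [[y[n - i] for i in range(m)] for n in range(m - 1, len(y))]
    return [[sum(s[i] * s[j] for s in snaps) / len(snaps) for j in range(m)]
            for i in range(m)]


def wiener_filter(r_y, r_v_hat):
    """Time-domain Wiener filter h = (I - R_y^{-1} R_v) u with u = [1, 0, ..., 0]^T."""
    m = len(r_y)
    r_v_u = [[r_v_hat[i][0]] for i in range(m)]  # R_v u is the first column of R_v
    z = solve(r_y, r_v_u)
    return [(1.0 if i == 0 else 0.0) - z[i][0] for i in range(m)]


def apply_filter(h, y):
    """x_hat(n) = h^T y(n) for every full snapshot of y."""
    m = len(h)
    return [sum(h[i] * y[n - i] for i in range(m)) for n in range(m - 1, len(y))]


def snr_db(signal, noise):
    """Ratio of signal energy to noise energy in dB."""
    return 10.0 * math.log10(sum(s * s for s in signal) / sum(v * v for v in noise))


if __name__ == "__main__":
    random.seed(0)
    fs, f0, sigma, M = 8000.0, 200.0, 0.5, 20   # hypothetical test values
    x = [sum(math.cos(2 * math.pi * l * f0 * n / fs) / l for l in range(1, 4))
         for n in range(2000)]
    v = [random.gauss(0.0, sigma) for _ in x]
    y = [a + b for a, b in zip(x, v)]
    r_y = sample_covariance(y, M)
    # Hypothetical stand-in for an APES or PSD-based estimate: white noise, known variance.
    r_v_hat = [[sigma ** 2 if i == j else 0.0 for j in range(M)] for i in range(M)]
    h = wiener_filter(r_y, r_v_hat)
    print("input SNR  (dB):", round(snr_db(x, v), 2))
    print("output SNR (dB):", round(snr_db(apply_filter(h, x), apply_filter(h, v)), 2))
```

Because only `r_v_hat` changes between variants, the same filter code can be fed with either covariance estimate. That makes the comparison described in the abstract straightforward to reproduce in spirit.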

Original language	English
Journal	IEEE Transactions on Audio, Speech and Language Processing
Volume	24
Issue number	4
Pages (from-to)	645-658
ISSN	1558-7916
DOI	https://doi.org/10.1109/TASLP.2016.2514492
Publication status	Published - Apr 2016

Keywords

chirp model
harmonic signal model
non-stationary speech
speech enhancement


Projects

Localization and Tracking of Speech - a Joint Audio-Visual Approach
Jensen, J. R.
01/10/2013 → 30/09/2016
Project: Research
Spatio-Temporal Filtering Methods for Enhancement and Separation of Speech Signals
Christensen, M. G., Nørholm, S. M., Karimian-Azari, S. & Jensen, J. R.
01/08/2012 → 30/06/2015
Project: Research

Cite this

@article{3f7902a9207f4a1bb5274520d83d5d0f,

title = "Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech",

abstract = "In this paper, single channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts by a harmonic linear chirp model instead of using the traditional harmonic model. This means that the speech signal is not assumed stationary, instead the fundamental frequency can vary linearly within each frame. The linearly constrained minimum variance (LCMV) filter and the amplitude and phase estimation (APES) filter are derived in this framework and compared to the harmonic versions of the same filters. It is shown through simulations on synthetic and speech signals, that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR for the harmonic chirp APES based filter is increased 3 dB compared to the harmonic APES based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise covariance matrix estimate is obtained. We suggest using this estimate in combination with other filters such as the Wiener filter. The performance of the Wiener filter and LCMV filter are compared using the APES noise covariance matrix estimate and a power spectral density (PSD) based noise covariance matrix estimate. It is shown that the APES covariance matrix works well in combination with the Wiener filter, and the PSD based covariance matrix works well in combination with the LCMV filter.",

keywords = "chirp model, harmonic signal model, non-stationary speech, speech enhancement",

author = "N{\o}rholm, {Sidsel Marie} and Jensen, {Jesper Rindom} and Christensen, {Mads Gr{\ae}sb{\o}ll}",

year = "2016",

month = apr,

doi = "10.1109/TASLP.2016.2514492",

language = "English",

volume = "24",

pages = "645--658",

journal = "IEEE Transactions on Audio, Speech and Language Processing",

issn = "1558-7916",

publisher = "IEEE Signal Processing Society",

number = "4",

}

TY  - JOUR
T1  - Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech
AU  - Nørholm, Sidsel Marie
AU  - Jensen, Jesper Rindom
AU  - Christensen, Mads Græsbøll
PY  - 2016/4
Y1  - 2016/4
N2  - In this paper, single channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts by a harmonic linear chirp model instead of using the traditional harmonic model. This means that the speech signal is not assumed stationary, instead the fundamental frequency can vary linearly within each frame. The linearly constrained minimum variance (LCMV) filter and the amplitude and phase estimation (APES) filter are derived in this framework and compared to the harmonic versions of the same filters. It is shown through simulations on synthetic and speech signals, that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR for the harmonic chirp APES based filter is increased 3 dB compared to the harmonic APES based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise covariance matrix estimate is obtained. We suggest using this estimate in combination with other filters such as the Wiener filter. The performance of the Wiener filter and LCMV filter are compared using the APES noise covariance matrix estimate and a power spectral density (PSD) based noise covariance matrix estimate. It is shown that the APES covariance matrix works well in combination with the Wiener filter, and the PSD based covariance matrix works well in combination with the LCMV filter.
AB  - In this paper, single channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts by a harmonic linear chirp model instead of using the traditional harmonic model. This means that the speech signal is not assumed stationary, instead the fundamental frequency can vary linearly within each frame. The linearly constrained minimum variance (LCMV) filter and the amplitude and phase estimation (APES) filter are derived in this framework and compared to the harmonic versions of the same filters. It is shown through simulations on synthetic and speech signals, that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR for the harmonic chirp APES based filter is increased 3 dB compared to the harmonic APES based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise covariance matrix estimate is obtained. We suggest using this estimate in combination with other filters such as the Wiener filter. The performance of the Wiener filter and LCMV filter are compared using the APES noise covariance matrix estimate and a power spectral density (PSD) based noise covariance matrix estimate. It is shown that the APES covariance matrix works well in combination with the Wiener filter, and the PSD based covariance matrix works well in combination with the LCMV filter.
KW  - chirp model
KW  - harmonic signal model
KW  - non-stationary speech
KW  - speech enhancement
U2  - 10.1109/TASLP.2016.2514492
DO  - 10.1109/TASLP.2016.2514492
M3  - Journal article
SN  - 1558-7916
VL  - 24
SP  - 645
EP  - 658
JO  - IEEE Transactions on Audio, Speech and Language Processing
JF  - IEEE Transactions on Audio, Speech and Language Processing
IS  - 4
ER  - 
