Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

Sidsel Marie Nørholm; Jesper Rindom Jensen; Mads Græsbøll Christensen

doi:10.1109/TASLP.2016.2514492

Publication: Contribution to journal › Journal article › Research › peer-reviewed

11 Citations (Scopus)

342 Downloads (Pure)

Abstract

In this paper, single-channel speech enhancement in
the time domain is considered. We address the problem of modelling
non-stationary speech by describing the voiced speech parts
with a harmonic linear chirp model instead of the traditional
harmonic model. This means that the speech signal is not assumed
stationary; instead, the fundamental frequency can vary linearly
within each frame. The linearly constrained minimum variance
(LCMV) filter and the amplitude and phase estimation (APES) filter
are derived in this framework and compared to the harmonic
versions of the same filters. Simulations on synthetic and speech
signals show that the chirp versions of the filters
perform better than their harmonic counterparts in terms of output
signal-to-noise ratio (SNR) and signal reduction factor. For
synthetic signals, the output SNR of the harmonic chirp APES-based
filter is 3 dB higher than that of the harmonic APES-based
filter at an input SNR of 10 dB, and at the same time the signal
reduction factor is decreased. For speech signals, the increase is
1.5 dB along with a decrease in the signal reduction factor of 0.7.
As an implicit part of the APES filter, a noise covariance matrix
estimate is obtained. We suggest using this estimate in combination
with other filters such as the Wiener filter. The performance
of the Wiener filter and the LCMV filter is compared using the APES
noise covariance matrix estimate and a power spectral density
(PSD) based noise covariance matrix estimate. It is shown that
the APES covariance matrix works well in combination with the
Wiener filter, and the PSD-based covariance matrix works well in
combination with the LCMV filter.
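
The abstract does not state the model equations, so the following is only a minimal sketch of what a harmonic linear chirp signal can look like. It assumes the common parametrization x(n) = sum over l of A_l cos(l(omega0 n + k n^2 / 2) + phi_l), in which harmonic l has instantaneous frequency l(omega0 + k n). The function names, the chirp-rate symbol k, and the frame values (8 kHz sampling, 30 ms frame, 200 Hz rising by 20 Hz) are illustrative assumptions, not taken from the paper.

```python
import math


def harmonic_chirp(n_samples, omega0, chirp_rate, amps, phases):
    """Real harmonic linear chirp signal (assumed parametrization).

    x(n) = sum_l amps[l-1] * cos(l * (omega0*n + 0.5*chirp_rate*n^2) + phases[l-1])
    With chirp_rate = 0 this reduces to the traditional harmonic model.
    """
    signal = []
    for n in range(n_samples):
        base_phase = omega0 * n + 0.5 * chirp_rate * n * n
        sample = sum(
            a * math.cos(l * base_phase + p)
            for l, (a, p) in enumerate(zip(amps, phases), start=1)
        )
        signal.append(sample)
    return signal


def instantaneous_frequency(n, omega0, chirp_rate, harmonic=1):
    """Instantaneous frequency (rad/sample) of one harmonic at sample n."""
    return harmonic * (omega0 + chirp_rate * n)


def snr_db(signal, noise):
    """Ratio of average signal power to average noise power, in dB."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


# Hypothetical frame: 30 ms at 8 kHz, fundamental rising from 200 Hz by 20 Hz.
fs = 8000
frame_len = 240
omega0 = 2.0 * math.pi * 200.0 / fs
chirp_rate = 2.0 * math.pi * 20.0 / fs / frame_len
frame = harmonic_chirp(frame_len, omega0, chirp_rate,
                       amps=[1.0, 0.5, 0.25], phases=[0.0, 0.0, 0.0])
```

Setting `chirp_rate` to zero gives the stationary harmonic model that the chirp filters are compared against, which makes the two signal models easy to contrast on the same frame.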

Original language	English
Journal	IEEE Transactions on Audio, Speech and Language Processing
Volume	24
Issue number	4
Pages (from-to)	645-658
ISSN	1558-7916
DOI	https://doi.org/10.1109/TASLP.2016.2514492
Status	Published - Apr 2016

Access to document

10.1109/TASLP.2016.2514492

chirp_real: Submitted manuscript, 490 KB, License: Unspecified


Localization and Tracking of Speech - a Joint Audio-Visual Approach
Jensen, J. R.
01/10/2013 → 30/09/2016
Projects: Project › Research
Spatio-Temporal Filtering Methods for Enhancement and Separation of Speech Signals
Christensen, M. G., Nørholm, S. M., Karimian-Azari, S. & Jensen, J. R.
01/08/2012 → 30/06/2015
Projects: Project › Research

Citation formats

@article{3f7902a9207f4a1bb5274520d83d5d0f,

title = "Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech",

abstract = "In this paper, single-channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts with a harmonic linear chirp model instead of the traditional harmonic model. This means that the speech signal is not assumed stationary; instead, the fundamental frequency can vary linearly within each frame. The linearly constrained minimum variance (LCMV) filter and the amplitude and phase estimation (APES) filter are derived in this framework and compared to the harmonic versions of the same filters. Simulations on synthetic and speech signals show that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR of the harmonic chirp APES-based filter is 3 dB higher than that of the harmonic APES-based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise covariance matrix estimate is obtained. We suggest using this estimate in combination with other filters such as the Wiener filter. The performance of the Wiener filter and the LCMV filter is compared using the APES noise covariance matrix estimate and a power spectral density (PSD) based noise covariance matrix estimate. It is shown that the APES covariance matrix works well in combination with the Wiener filter, and the PSD-based covariance matrix works well in combination with the LCMV filter.",

keywords = "chirp model, harmonic signal model, non-stationary speech, speech enhancement",

author = "N{\o}rholm, {Sidsel Marie} and Jensen, {Jesper Rindom} and Christensen, {Mads Gr{\ae}sb{\o}ll}",

year = "2016",

month = apr,

doi = "10.1109/TASLP.2016.2514492",

language = "English",

volume = "24",

pages = "645--658",

journal = "IEEE Transactions on Audio, Speech and Language Processing",

issn = "1558-7916",

publisher = "IEEE Signal Processing Society",

number = "4",

}

TY - JOUR

T1 - Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech

AU - Nørholm, Sidsel Marie

AU - Jensen, Jesper Rindom

AU - Christensen, Mads Græsbøll

PY - 2016/4

Y1 - 2016/4

N2 - In this paper, single-channel speech enhancement in the time domain is considered. We address the problem of modelling non-stationary speech by describing the voiced speech parts with a harmonic linear chirp model instead of the traditional harmonic model. This means that the speech signal is not assumed stationary; instead, the fundamental frequency can vary linearly within each frame. The linearly constrained minimum variance (LCMV) filter and the amplitude and phase estimation (APES) filter are derived in this framework and compared to the harmonic versions of the same filters. Simulations on synthetic and speech signals show that the chirp versions of the filters perform better than their harmonic counterparts in terms of output signal-to-noise ratio (SNR) and signal reduction factor. For synthetic signals, the output SNR of the harmonic chirp APES-based filter is 3 dB higher than that of the harmonic APES-based filter at an input SNR of 10 dB, and at the same time the signal reduction factor is decreased. For speech signals, the increase is 1.5 dB along with a decrease in the signal reduction factor of 0.7. As an implicit part of the APES filter, a noise covariance matrix estimate is obtained. We suggest using this estimate in combination with other filters such as the Wiener filter. The performance of the Wiener filter and the LCMV filter is compared using the APES noise covariance matrix estimate and a power spectral density (PSD) based noise covariance matrix estimate. It is shown that the APES covariance matrix works well in combination with the Wiener filter, and the PSD-based covariance matrix works well in combination with the LCMV filter.

KW - chirp model

KW - harmonic signal model

KW - non-stationary speech

KW - speech enhancement

U2 - 10.1109/TASLP.2016.2514492

DO - 10.1109/TASLP.2016.2514492

M3 - Journal article

SN - 1558-7916

VL - 24

SP - 645

EP - 658

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

IS - 4

ER -
