New Results on Single-Channel Speech Separation Using Sinusoidal Modeling

Pejman Mowlaee; Mads Græsbøll Christensen; Søren Holdt Jensen

doi:10.1109/TASL.2010.2089520

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mixture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at finding sinusoidal parameters in the form of codevectors from vector quantization (VQ) codebooks pre-trained for speakers that, when combined, best fit the observed mixed signal. The selected codevectors are then used to reconstruct the recovered signals for the speakers in the mixture. Compared to the log-max mixture estimator used in binary masks and the Wiener filtering approach, it is observed that the proposed method achieves an acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational complexity of the separation by replacing the short-time Fourier transform (STFT) feature vectors of high dimensionality with sinusoidal feature vectors. We report separation results for the proposed method and compare them with respect to other benchmark methods. The improvements made by applying the proposed method over other methods are confirmed by employing perceptual evaluation of speech quality (PESQ) as an objective measure and a MUSHRA listening test as a subjective evaluation for both speaker-dependent and gender-dependent scenarios.
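
The codebook search described in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes, purely for illustration, that each codevector is a short list of amplitude values, that the two speakers' codevectors combine by simple addition, and that fit is measured by squared error. The function name `best_codevector_pair` and the toy codebooks are hypothetical.

```python
from itertools import product


def best_codevector_pair(mixture, codebook_a, codebook_b):
    """Exhaustively search two speaker codebooks for the pair of
    codevectors whose element-wise sum is closest to the mixture.

    Assumption (not from the paper): additive combination and a
    squared-error fit criterion.
    Returns (index_a, index_b, squared_error).
    """
    best = None
    for (i, ca), (j, cb) in product(enumerate(codebook_a), enumerate(codebook_b)):
        # Squared error between the observed mixture and the combined pair.
        err = sum((m - (a + b)) ** 2 for m, a, b in zip(mixture, ca, cb))
        if best is None or err < best[2]:
            best = (i, j, err)
    return best


# Toy codebooks with two codevectors per speaker (hypothetical values).
codebook_a = [[1.0, 0.0], [0.0, 1.0]]
codebook_b = [[0.5, 0.5], [2.0, 0.0]]
mixture = [2.0, 1.0]

i, j, err = best_codevector_pair(mixture, codebook_a, codebook_b)
# The selected codevectors stand in for the recovered per-speaker signals.
recovered_a, recovered_b = codebook_a[i], codebook_b[j]
```

The exhaustive search costs one comparison per codevector pair, which is why a low-dimensional feature vector matters for the complexity claim in the abstract.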

Original language	English
Journal	IEEE Transactions on Audio, Speech and Language Processing
Volume	19
Issue number	5
Pages (from-to)	1265-1277
Number of pages	13
ISSN	1558-7916
DOIs	https://doi.org/10.1109/TASL.2010.2089520
Publication status	Published - 2011

Keywords

Mixture estimation
single-channel speech separation
mask methods
speaker codebook
sinusoidal modeling

Access to Document

10.1109/TASL.2010.2089520

Asl2011pmb (accepted author manuscript, 848 KB)

http://ieeexplore.ieee.org.zorac.aub.aau.dk/search/srchabstract.jsp?tp=&arnumber=5608497&queryText=New+Results+on+Single-Channel+Speech+Separation+Using+Sinusoidal+Modeling&openedRefinements=*&searchField=Search+All&tag=1

Cite this

@article{c3929d66a3d746d2ab350427dfa93863,

title = "New Results on Single-Channel Speech Separation Using Sinusoidal Modeling",

abstract = "We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mixture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at finding sinusoidal parameters in the form of codevectors from vector quantization (VQ) codebooks pre-trained for speakers that, when combined, best fit the observed mixed signal. The selected codevectors are then used to reconstruct the recovered signals for the speakers in the mixture. Compared to the log-max mixture estimator used in binary masks and the Wiener filtering approach, it is observed that the proposed method achieves an acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational complexity of the separation by replacing the short-time Fourier transform (STFT) feature vectors of high dimensionality with sinusoidal feature vectors. We report separation results for the proposed method and compare them with respect to other benchmark methods. The improvements made by applying the proposed method over other methods are confirmed by employing perceptual evaluation of speech quality (PESQ) as an objective measure and a MUSHRA listening test as a subjective evaluation for both speaker-dependent and gender-dependent scenarios.",

keywords = "Mixture estimation, single-channel speech separation, mask methods, speaker codebook, sinusoidal modeling",

author = "Pejman Mowlaee and Christensen, {Mads Gr{\ae}sb{\o}ll} and Jensen, {S{\o}ren Holdt}",

year = "2011",

doi = "10.1109/TASL.2010.2089520",

language = "English",

volume = "19",

pages = "1265--1277",

journal = "IEEE Transactions on Audio, Speech and Language Processing",

issn = "1558-7916",

publisher = "IEEE Signal Processing Society",

number = "5",

}

TY - JOUR

T1 - New Results on Single-Channel Speech Separation Using Sinusoidal Modeling

AU - Mowlaee, Pejman

AU - Christensen, Mads Græsbøll

AU - Jensen, Søren Holdt

PY - 2011

Y1 - 2011

N2 - We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mixture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at finding sinusoidal parameters in the form of codevectors from vector quantization (VQ) codebooks pre-trained for speakers that, when combined, best fit the observed mixed signal. The selected codevectors are then used to reconstruct the recovered signals for the speakers in the mixture. Compared to the log-max mixture estimator used in binary masks and the Wiener filtering approach, it is observed that the proposed method achieves an acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational complexity of the separation by replacing the short-time Fourier transform (STFT) feature vectors of high dimensionality with sinusoidal feature vectors. We report separation results for the proposed method and compare them with respect to other benchmark methods. The improvements made by applying the proposed method over other methods are confirmed by employing perceptual evaluation of speech quality (PESQ) as an objective measure and a MUSHRA listening test as a subjective evaluation for both speaker-dependent and gender-dependent scenarios.

AB - We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mixture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at finding sinusoidal parameters in the form of codevectors from vector quantization (VQ) codebooks pre-trained for speakers that, when combined, best fit the observed mixed signal. The selected codevectors are then used to reconstruct the recovered signals for the speakers in the mixture. Compared to the log-max mixture estimator used in binary masks and the Wiener filtering approach, it is observed that the proposed method achieves an acceptable perceptual speech quality with less cross-talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational complexity of the separation by replacing the short-time Fourier transform (STFT) feature vectors of high dimensionality with sinusoidal feature vectors. We report separation results for the proposed method and compare them with respect to other benchmark methods. The improvements made by applying the proposed method over other methods are confirmed by employing perceptual evaluation of speech quality (PESQ) as an objective measure and a MUSHRA listening test as a subjective evaluation for both speaker-dependent and gender-dependent scenarios.

KW - Mixture estimation

KW - single-channel speech separation

KW - mask methods

KW - speaker codebook

KW - sinusoidal modeling

U2 - 10.1109/TASL.2010.2089520

DO - 10.1109/TASL.2010.2089520

M3 - Journal article

SN - 1558-7916

VL - 19

SP - 1265

EP - 1277

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

IS - 5

ER -
