Audio Source Separation in Reverberant Environments Using β-Divergence-Based Nonnegative Factorization

Mahmoud Fakhry; Piergiorgio Svaizer; Maurizio Omologo

doi:10.1109/TASLP.2017.2695718

Research output: Contribution to journal › Journal article › Research › peer-review

8 Citations (Scopus)

Abstract

In Gaussian model-based multichannel audio source separation, the likelihood of observed mixtures of source signals is parametrized by source spectral variances and by associated spatial covariance matrices. These parameters are estimated by maximizing the likelihood through an expectation-maximization algorithm and used to separate the signals by means of multichannel Wiener filtering. We propose to estimate these parameters by applying nonnegative factorization based on prior information on source variances. In the nonnegative factorization, spectral basis matrices can be defined as the prior information. The matrices can be either extracted or indirectly made available through a redundant library that is trained in advance. In a separate step, applying nonnegative tensor factorization, two algorithms are proposed in order to either extract or detect the basis matrices that best represent the power spectra of the source signals in the observed mixtures. The factorization is achieved by minimizing the β-divergence through multiplicative update rules. The sparsity of factorization can be controlled by tuning the value of β. Experiments show that sparsity, rather than the value assigned to β in the training, is crucial in order to increase the separation performance. The proposed method was evaluated in several mixing conditions. It provides better separation quality with respect to other comparable algorithms.
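The abstract's statement that the factorization is "achieved by minimizing the β-divergence through multiplicative update rules" can be illustrated with a minimal sketch. It assumes the plain matrix case V ≈ WH with the widely used heuristic multiplicative updates; it is not the paper's tensor factorization, and it does not reproduce the two proposed algorithms or the trained library. The names `beta_divergence`, `nmf_beta`, `matmul`, `transpose`, `_weights`, `_scale`, and `EPS` are hypothetical and chosen only for this illustration.

```python
import math
import random

EPS = 1e-12  # floor that keeps divisions and logarithms finite


def beta_divergence(V, V_hat, beta):
    """Sum of element-wise beta-divergences between V and its model V_hat."""
    total = 0.0
    for row_v, row_y in zip(V, V_hat):
        for x, y in zip(row_v, row_y):
            x, y = max(x, EPS), max(y, EPS)
            if beta == 0:    # Itakura-Saito divergence
                total += x / y - math.log(x / y) - 1.0
            elif beta == 1:  # generalized Kullback-Leibler divergence
                total += x * math.log(x / y) - x + y
            else:            # general case; beta = 2 is half the squared error
                total += (x ** beta + (beta - 1) * y ** beta
                          - beta * x * y ** (beta - 1)) / (beta * (beta - 1))
    return total


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def _weights(V, W, H, beta):
    """Element-wise terms V * V_hat^(beta-2) and V_hat^(beta-1)."""
    V_hat = matmul(W, H)
    num = [[x * max(y, EPS) ** (beta - 2) for x, y in zip(rv, ry)]
           for rv, ry in zip(V, V_hat)]
    den = [[max(y, EPS) ** (beta - 1) for y in ry] for ry in V_hat]
    return num, den


def _scale(M, num, den):
    """Multiply M element-wise by num / den; keeps entries nonnegative."""
    return [[m * n / max(d, EPS) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]


def nmf_beta(V, rank, beta=1.0, n_iter=100, seed=0):
    """Factorize nonnegative V (F x N) as W (F x rank) times H (rank x N)."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(rank)]
    costs = [beta_divergence(V, matmul(W, H), beta)]
    for _ in range(n_iter):
        # Update the activations H with W fixed.
        num, den = _weights(V, W, H, beta)
        Wt = transpose(W)
        H = _scale(H, matmul(Wt, num), matmul(Wt, den))
        # Update the spectral basis W with the new H fixed.
        num, den = _weights(V, W, H, beta)
        Ht = transpose(H)
        W = _scale(W, matmul(num, Ht), matmul(den, Ht))
        costs.append(beta_divergence(V, matmul(W, H), beta))
    return W, H, costs
```

In this sketch, a call such as `nmf_beta(V, rank=2, beta=1.0)` treats `V` as a small nonnegative power spectrogram given as a list of rows and returns the basis `W`, the activations `H`, and the cost after each iteration. Setting `beta` to 2, 1, or 0 selects the squared Euclidean, generalized Kullback-Leibler, or Itakura-Saito cost respectively.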

Original language	English
Journal	IEEE Transactions on Audio, Speech and Language Processing
Volume	25
Issue number	7
Pages (from-to)	1462 - 1476
Number of pages	15
ISSN	1558-7916
DOIs	https://doi.org/10.1109/TASLP.2017.2695718
Publication status	Published - 2017

Cite this

@article{063237daa5594d63b47c367b2164e822,

title = "Audio Source Separation in Reverberant Environments Using β-Divergence-Based Nonnegative Factorization",

abstract = "In Gaussian model-based multichannel audio source separation, the likelihood of observed mixtures of source signals is parametrized by source spectral variances and by associated spatial covariance matrices. These parameters are estimated by maximizing the likelihood through an expectation-maximization algorithm and used to separate the signals by means of multichannel Wiener filtering. We propose to estimate these parameters by applying nonnegative factorization based on prior information on source variances. In the nonnegative factorization, spectral basis matrices can be defined as the prior information. The matrices can be either extracted or indirectly made available through a redundant library that is trained in advance. In a separate step, applying nonnegative tensor factorization, two algorithms are proposed in order to either extract or detect the basis matrices that best represent the power spectra of the source signals in the observed mixtures. The factorization is achieved by minimizing the β-divergence through multiplicative update rules. The sparsity of factorization can be controlled by tuning the value of β. Experiments show that sparsity, rather than the value assigned to β in the training, is crucial in order to increase the separation performance. The proposed method was evaluated in several mixing conditions. It provides better separation quality with respect to other comparable algorithms.",

author = "Mahmoud Fakhry and Piergiorgio Svaizer and Maurizio Omologo",

year = "2017",

doi = "10.1109/TASLP.2017.2695718",

language = "English",

volume = "25",

pages = "1462--1476",

journal = "IEEE Transactions on Audio, Speech and Language Processing",

issn = "1558-7916",

publisher = "IEEE Signal Processing Society",

number = "7",

}

TY - JOUR

T1 - Audio Source Separation in Reverberant Environments Using β-Divergence-Based Nonnegative Factorization

AU - Fakhry, Mahmoud

AU - Svaizer, Piergiorgio

AU - Omologo, Maurizio

PY - 2017

Y1 - 2017

N2 - In Gaussian model-based multichannel audio source separation, the likelihood of observed mixtures of source signals is parametrized by source spectral variances and by associated spatial covariance matrices. These parameters are estimated by maximizing the likelihood through an expectation-maximization algorithm and used to separate the signals by means of multichannel Wiener filtering. We propose to estimate these parameters by applying nonnegative factorization based on prior information on source variances. In the nonnegative factorization, spectral basis matrices can be defined as the prior information. The matrices can be either extracted or indirectly made available through a redundant library that is trained in advance. In a separate step, applying nonnegative tensor factorization, two algorithms are proposed in order to either extract or detect the basis matrices that best represent the power spectra of the source signals in the observed mixtures. The factorization is achieved by minimizing the β-divergence through multiplicative update rules. The sparsity of factorization can be controlled by tuning the value of β. Experiments show that sparsity, rather than the value assigned to β in the training, is crucial in order to increase the separation performance. The proposed method was evaluated in several mixing conditions. It provides better separation quality with respect to other comparable algorithms.

U2 - 10.1109/TASLP.2017.2695718

DO - 10.1109/TASLP.2017.2695718

M3 - Journal article

SN - 1558-7916

VL - 25

SP - 1462

EP - 1476

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

IS - 7

ER -
