A MAP Criterion for Detecting the Number of Speakers at frame level in Model-based Single-Channel Speech Separation

Publication: Contribution to journal, Conference article in journal, Research, peer-reviewed

Abstract

The problem of detecting the number of speakers for a particular segment occurs in many different
speech applications. In single-channel speech separation, for example, this information is
often used to simplify the separation process, as the signal has to be treated differently depending
on the number of speakers. Inspired by the asymptotic maximum a posteriori rule proposed for
model selection, we pose the problem as a model selection problem. More specifically, we derive
a multiple hypotheses test for determining the number of speakers at a frame level in an observed
signal based on underlying parametric speaker models, trained a priori. The experimental results
indicate that the suggested method improves the quality of the separated signals in a single-channel
speech separation scenario at different signal-to-signal ratio levels.
Original language: English
Journal: Asilomar Conference on Signals, Systems and Computers. Conference Record
Pages (from-to): 538-541
ISSN: 1058-6393
DOI: 10.1109/ACSSC.2010.5757617
Status: Published - 2010
Event: 44th Asilomar Conference on Signals, Systems and Computers - Pacific Grove, USA
Duration: 7 Nov 2010 to 10 Nov 2010

Conference

Conference: 44th Asilomar Conference on Signals, Systems and Computers
Country: USA
City: Pacific Grove
Period: 07/11/2010 to 10/11/2010

Cite this

@inproceedings{a3ad4a12ce8349fc940df6e8e7dd68a3,
title = "A MAP Criterion for Detecting the Number of Speakers at frame level in Model-based Single-Channel Speech Separation",
abstract = "The problem of detecting the number of speakers for a particular segment occurs in many different speech applications. In single-channel speech separation, for example, this information is often used to simplify the separation process, as the signal has to be treated differently depending on the number of speakers. Inspired by the asymptotic maximum a posteriori rule proposed for model selection, we pose the problem as a model selection problem. More specifically, we derive a multiple hypotheses test for determining the number of speakers at a frame level in an observed signal based on underlying parametric speaker models, trained a priori. The experimental results indicate that the suggested method improves the quality of the separated signals in a single-channel speech separation scenario at different signal-to-signal ratio levels.",
author = "Pejman Mowlaee and Christensen, {Mads Gr{\ae}sb{\o}ll} and Zheng-Hua Tan and Jensen, {S{\o}ren Holdt}",
year = "2010",
doi = "10.1109/ACSSC.2010.5757617",
language = "English",
pages = "538--541",
booktitle = "Asilomar Conference on Signals, Systems and Computers. Conference Record",
issn = "1058-6393",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - A MAP Criterion for Detecting the Number of Speakers at frame level in Model-based Single-Channel Speech Separation

AU - Mowlaee, Pejman

AU - Christensen, Mads Græsbøll

AU - Tan, Zheng-Hua

AU - Jensen, Søren Holdt

PY - 2010

Y1 - 2010

N2 - The problem of detecting the number of speakers for a particular segment occurs in many different speech applications. In single-channel speech separation, for example, this information is often used to simplify the separation process, as the signal has to be treated differently depending on the number of speakers. Inspired by the asymptotic maximum a posteriori rule proposed for model selection, we pose the problem as a model selection problem. More specifically, we derive a multiple hypotheses test for determining the number of speakers at a frame level in an observed signal based on underlying parametric speaker models, trained a priori. The experimental results indicate that the suggested method improves the quality of the separated signals in a single-channel speech separation scenario at different signal-to-signal ratio levels.

AB - The problem of detecting the number of speakers for a particular segment occurs in many different speech applications. In single-channel speech separation, for example, this information is often used to simplify the separation process, as the signal has to be treated differently depending on the number of speakers. Inspired by the asymptotic maximum a posteriori rule proposed for model selection, we pose the problem as a model selection problem. More specifically, we derive a multiple hypotheses test for determining the number of speakers at a frame level in an observed signal based on underlying parametric speaker models, trained a priori. The experimental results indicate that the suggested method improves the quality of the separated signals in a single-channel speech separation scenario at different signal-to-signal ratio levels.

U2 - 10.1109/ACSSC.2010.5757617

DO - 10.1109/ACSSC.2010.5757617

M3 - Conference article in Journal

SP - 538

EP - 541

JO - Asilomar Conference on Signals, Systems and Computers. Conference Record

JF - Asilomar Conference on Signals, Systems and Computers. Conference Record

SN - 1058-6393

ER -