Improving Monaural Speaker Identification by Double-Talk Detection

Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen, Pasi Fränti

Research output: Contribution to journal › Conference article in Journal › Research › peer-review

Abstract

This paper describes a novel approach to improve monaural
speaker identification where two speakers are present in a
single-microphone recording. The goal is to identify both of
the underlying speakers in the given mixture. The proposed
approach is composed of a double-talk detector (DTD) as a preprocessor
and a speaker identification back-end. We demonstrate
that including the double-talk detector improves the speaker
identification accuracy. Experiments on the GRID corpus show that
including the DTD improves average recognition accuracy from
96.53% to 97.43%.

Original language: English
Journal: Proceedings of the International Conference on Spoken Language Processing
Pages (from-to): 1069-1072
ISSN: 1990-9772
Publication status: Published - 26 Sep 2010
Event: Interspeech 2010 - Makuhari, Japan
Duration: 26 Sep 2010 to 30 Sep 2010

Conference

Conference: Interspeech 2010
Country: Japan
City: Makuhari
Period: 26/09/2010 to 30/09/2010

Fingerprint

Detectors
Microphones
Experiments

Cite this

@inproceedings{9d30e627a4d84026a58386e5c3046d47,
title = "Improving Monaural Speaker Identification by Double-Talk Detection",
abstract = "This paper describes a novel approach to improve monaural speaker identification where two speakers are present in a single-microphone recording. The goal is to identify both of the underlying speakers in the given mixture. The proposed approach is composed of a double-talk detector (DTD) as a preprocessor and a speaker identification back-end. We demonstrate that including the double-talk detector improves the speaker identification accuracy. Experiments on the GRID corpus show that including the DTD improves average recognition accuracy from 96.53{\%} to 97.43{\%}.",
author = "Rahim Saeidi and Pejman Mowlaee and Tomi Kinnunen and Zheng-Hua Tan and Christensen, {Mads Gr{\ae}sb{\o}ll} and Jensen, {S{\o}ren Holdt} and Pasi Fr{\"a}nti",
year = "2010",
month = "9",
day = "26",
language = "English",
pages = "1069--1072",
journal = "Proceedings of the International Conference on Spoken Language Processing",
issn = "1990-9772",
publisher = "International Speech Communication Association",

}

Improving Monaural Speaker Identification by Double-Talk Detection. / Saeidi, Rahim; Mowlaee, Pejman; Kinnunen, Tomi; Tan, Zheng-Hua; Christensen, Mads Græsbøll; Jensen, Søren Holdt; Fränti, Pasi.

In: Proceedings of the International Conference on Spoken Language Processing, 26.09.2010, p. 1069-1072.

TY  - GEN
T1  - Improving Monaural Speaker Identification by Double-Talk Detection
AU  - Saeidi, Rahim
AU  - Mowlaee, Pejman
AU  - Kinnunen, Tomi
AU  - Tan, Zheng-Hua
AU  - Christensen, Mads Græsbøll
AU  - Jensen, Søren Holdt
AU  - Fränti, Pasi
PY  - 2010/9/26
Y1  - 2010/9/26
N2  - This paper describes a novel approach to improve monaural speaker identification where two speakers are present in a single-microphone recording. The goal is to identify both of the underlying speakers in the given mixture. The proposed approach is composed of a double-talk detector (DTD) as a preprocessor and a speaker identification back-end. We demonstrate that including the double-talk detector improves the speaker identification accuracy. Experiments on the GRID corpus show that including the DTD improves average recognition accuracy from 96.53% to 97.43%.
AB  - This paper describes a novel approach to improve monaural speaker identification where two speakers are present in a single-microphone recording. The goal is to identify both of the underlying speakers in the given mixture. The proposed approach is composed of a double-talk detector (DTD) as a preprocessor and a speaker identification back-end. We demonstrate that including the double-talk detector improves the speaker identification accuracy. Experiments on the GRID corpus show that including the DTD improves average recognition accuracy from 96.53% to 97.43%.
M3  - Conference article in Journal
SP  - 1069
EP  - 1072
JO  - Proceedings of the International Conference on Spoken Language Processing
JF  - Proceedings of the International Conference on Spoken Language Processing
SN  - 1990-9772
ER  -