HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors

Tomi Kinnunen; Alexey Sholokhov; Elie Khoury; Dennis Alexander Lehmann Thomsen; Md Sahidullah; Zheng-Hua Tan

doi:10.21437/Interspeech.2016-1281

Department of Electronic Systems

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

6 Citations (Scopus)

Abstract

Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort for OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.
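The quantities named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The DCF weights (0.75 for misses, 0.25 for false alarms) are an assumption about the OpenSAD cost definition, the equal-weight score averaging is a stand-in for a fusion rule the abstract does not specify, and the function names `dcf`, `relative_decrease`, and `fuse` are hypothetical.

```python
import numpy as np


def dcf(labels, decisions, w_miss=0.75, w_fa=0.25):
    """Frame-level detection cost; the default weights are an assumption."""
    labels = np.asarray(labels, dtype=bool)        # True = speech frame
    decisions = np.asarray(decisions, dtype=bool)  # True = detected as speech
    p_miss = np.mean(~decisions[labels])   # speech frames marked non-speech
    p_fa = np.mean(decisions[~labels])     # non-speech frames marked speech
    return w_miss * p_miss + w_fa * p_fa


def relative_decrease(baseline, improved):
    """Relative decrease in percent, as in '9.3 % in DCF over this SAD'."""
    return 100.0 * (baseline - improved) / baseline


def fuse(scores, weights=None):
    """Weighted average of per-detector frame scores.

    scores has shape (n_detectors, n_frames). Equal weights are used when
    weights is None. This is an illustrative fusion rule only.
    """
    scores = np.asarray(scores, dtype=float)
    return np.average(scores, axis=0, weights=weights)


if __name__ == "__main__":
    # Toy data, not results from the paper.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    decisions = [1, 1, 1, 0, 1, 0, 0, 0]
    print(dcf(labels, decisions))
    print(relative_decrease(0.2, 0.1))
    print(fuse([[0.0, 1.0], [1.0, 1.0]]))
```

Both inputs to `dcf` must contain at least one speech and one non-speech frame, since each error rate is a mean over the corresponding subset of frames.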

Original language	English
Title	Interspeech 2016: September 8–12, 2016, San Francisco, USA
Number of pages	5
Publisher	ISCA
Publication date	Sep 2016
Pages	2992-2996
DOI	https://doi.org/10.21437/Interspeech.2016-1281
Status	Published - Sep 2016
Event	Interspeech 2016 - San Francisco, CA, USA Duration: 8 Sep 2016 → 12 Sep 2016 http://www.interspeech2016.org/

Conference

Conference	Interspeech 2016
Country/Territory	USA
City	San Francisco, CA
Period	08/09/2016 → 12/09/2016
Internet address	http://www.interspeech2016.org/

Access to document

10.21437/Interspeech.2016-1281

Citation formats

@inproceedings{5bbe2ece49964c4c8f1ad6c760a0359b,

title = "HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors",

abstract = "Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort for OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.",

keywords = "NIST OpenSAD, speech activity detection",

author = "Tomi Kinnunen and Alexey Sholokhov and Elie Khoury and Thomsen, {Dennis Alexander Lehmann} and Md Sahidullah and Zheng-Hua Tan",

year = "2016",

month = sep,

doi = "10.21437/Interspeech.2016-1281",

language = "English",

pages = "2992--2996",

booktitle = "Interspeech 2016",

publisher = "ISCA",

note = "Interspeech 2016 ; Conference date: 08-09-2016 Through 12-09-2016",

url = "http://www.interspeech2016.org/",

}

Kinnunen, T, Sholokhov, A, Khoury, E, Thomsen, DAL, Sahidullah, M & Tan, Z-H 2016, HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors. in Interspeech 2016: September 8–12, 2016, San Francisco, USA. ISCA, pp. 2992-2996, Interspeech 2016, San Francisco, CA, USA, 08/09/2016. https://doi.org/10.21437/Interspeech.2016-1281

HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors. / Kinnunen, Tomi; Sholokhov, Alexey; Khoury, Elie et al.
Interspeech 2016: September 8–12, 2016, San Francisco, USA. ISCA, 2016. pp. 2992-2996.

TY - GEN

T1 - HAPPY Team Entry to NIST OpenSAD Challenge

T2 - Interspeech 2016

AU - Kinnunen, Tomi

AU - Sholokhov, Alexey

AU - Khoury, Elie

AU - Thomsen, Dennis Alexander Lehmann

AU - Sahidullah, Md

AU - Tan, Zheng-Hua

PY - 2016/9

Y1 - 2016/9

N2 - Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort for OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.

AB - Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort for OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.

KW - NIST OpenSAD

KW - speech activity detection

U2 - 10.21437/Interspeech.2016-1281

DO - 10.21437/Interspeech.2016-1281

M3 - Article in proceeding

SP - 2992

EP - 2996

BT - Interspeech 2016

PB - ISCA

Y2 - 8 September 2016 through 12 September 2016

ER -
