HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors

Tomi Kinnunen; Alexey Sholokhov; Elie Khoury; Dennis Alexander Lehmann Thomsen; Md Sahidullah; Zheng-Hua Tan

doi:10.21437/Interspeech.2016-1281

Department of Electronic Systems

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

6 Citations (Scopus)

Abstract

Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort in OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.
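
The abstract reports its gains as relative decreases in DCF after fusing several detectors. The sketch below illustrates that arithmetic only. It is not the authors' system: the 0.75/0.25 miss/false-alarm weighting, the linear score-level fusion, and all function names and numbers are assumptions introduced for illustration, since the abstract states neither the cost weights nor the fusion method.

```python
def dcf(p_miss, p_fa, w_miss=0.75):
    """Weighted detection cost from miss and false-alarm rates.

    w_miss=0.75 is an assumed weighting, not a value given in the abstract.
    """
    return w_miss * p_miss + (1.0 - w_miss) * p_fa


def relative_decrease(baseline, improved):
    """Relative decrease of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def fuse_scores(score_lists, weights):
    """Hypothetical linear fusion: weighted sum of per-frame scores.

    score_lists holds one list of frame scores per base detector,
    all of equal length; weights holds one weight per detector.
    """
    return [
        sum(w * s for w, s in zip(weights, frame))
        for frame in zip(*score_lists)
    ]


# Illustrative numbers only, not results from the paper.
best_single = dcf(p_miss=0.04, p_fa=0.02)
fused = dcf(p_miss=0.03, p_fa=0.02)
print(round(relative_decrease(best_single, fused), 1))
```

A relative decrease is always measured against a named baseline, which is why the abstract specifies that the 9.3 % figure is relative to the individually best SAD.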

Original language	English
Title of host publication	Interspeech 2016 : September 8–12, 2016, San Francisco, USA
Number of pages	5
Publisher	ISCA
Publication date	Sept 2016
Pages	2992-2996
DOIs	https://doi.org/10.21437/Interspeech.2016-1281
Publication status	Published - Sept 2016
Event	Interspeech 2016 - San Francisco, CA, United States Duration: 8 Sept 2016 → 12 Sept 2016 http://www.interspeech2016.org/

Conference

Conference	Interspeech 2016
Country/Territory	United States
City	San Francisco, CA
Period	08/09/2016 → 12/09/2016
Internet address	http://www.interspeech2016.org/

Keywords

NIST OpenSAD
speech activity detection


Cite this

@inproceedings{5bbe2ece49964c4c8f1ad6c760a0359b,
title = "HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors",
abstract = "Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort in OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.",
keywords = "NIST OpenSAD, speech activity detection",
author = "Tomi Kinnunen and Alexey Sholokhov and Elie Khoury and Thomsen, {Dennis Alexander Lehmann} and Md Sahidullah and Zheng-Hua Tan",
year = "2016",
month = sep,
doi = "10.21437/Interspeech.2016-1281",
language = "English",
pages = "2992--2996",
booktitle = "Interspeech 2016",
publisher = "ISCA",
note = "Interspeech 2016 ; Conference date: 08-09-2016 Through 12-09-2016",
url = "http://www.interspeech2016.org/",
}

Kinnunen, T, Sholokhov, A, Khoury, E, Thomsen, DAL, Sahidullah, M & Tan, Z-H 2016, HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors. in Interspeech 2016: September 8–12, 2016, San Francisco, USA. ISCA, pp. 2992-2996, Interspeech 2016, San Francisco, CA, United States, 08/09/2016. https://doi.org/10.21437/Interspeech.2016-1281

HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors. / Kinnunen, Tomi; Sholokhov, Alexey; Khoury, Elie et al.
Interspeech 2016: September 8–12, 2016, San Francisco, USA. ISCA, 2016. p. 2992-2996.

TY - GEN
T1 - HAPPY Team Entry to NIST OpenSAD Challenge
T2 - Interspeech 2016
AU - Kinnunen, Tomi
AU - Sholokhov, Alexey
AU - Khoury, Elie
AU - Thomsen, Dennis Alexander Lehmann
AU - Sahidullah, Md
AU - Tan, Zheng-Hua
PY - 2016/9
Y1 - 2016/9
N2 - Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort in OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.
AB - Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort in OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.
KW - NIST OpenSAD
KW - speech activity detection
U2 - 10.21437/Interspeech.2016-1281
DO - 10.21437/Interspeech.2016-1281
M3 - Article in proceeding
SP - 2992
EP - 2996
BT - Interspeech 2016
PB - ISCA
Y2 - 8 September 2016 through 12 September 2016
ER -
