HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors

Tomi Kinnunen, Alexey Sholokhov, Elie Khoury, Dennis Alexander Lehmann Thomsen, Md Sahidullah, Zheng-Hua Tan

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

6 Citations (Scopus)

Abstract

Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort in OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (2 supervised and 4 unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. Further, a relative decrease of 17.4 % is obtained by incorporating channel detection side information.
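
To make the quantities in the abstract concrete, the following is a minimal sketch of a frame-level DCF, a score-level fusion of several base SADs, and the relative decrease between two DCF values. It is not the authors' implementation. The function names, the equal-weight linear fusion, the 0.5 decision threshold, and the default cost weights (0.75 for misses, 0.25 for false alarms) are assumptions chosen for illustration; the abstract specifies none of them.

```python
import numpy as np


def dcf(reference, decisions, w_miss=0.75, w_fa=0.25):
    """Weighted detection cost from frame labels (1 = speech, 0 = non-speech).

    The default weights are an assumption, not taken from the abstract.
    Both classes must be present in `reference`.
    """
    reference = np.asarray(reference, dtype=bool)
    decisions = np.asarray(decisions, dtype=bool)
    p_miss = np.mean(~decisions[reference])  # speech frames marked non-speech
    p_fa = np.mean(decisions[~reference])    # non-speech frames marked speech
    return w_miss * p_miss + w_fa * p_fa


def fuse_scores(scores, weights=None):
    """Hypothetical linear fusion: weighted sum over base detectors (rows)."""
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        # Equal weights when none are given (an illustrative choice).
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.asarray(weights, dtype=float) @ scores


def relative_decrease(baseline, improved):
    """Relative decrease of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Toy data: three base detectors scoring six frames (values are made up).
reference = [1, 1, 1, 0, 0, 0]
scores = [
    [0.9, 0.4, 0.8, 0.3, 0.6, 0.1],
    [0.7, 0.6, 0.9, 0.2, 0.3, 0.4],
    [0.8, 0.7, 0.3, 0.1, 0.4, 0.2],
]
fused = fuse_scores(scores)
fused_decisions = fused > 0.5  # assumed decision threshold
fused_dcf = dcf(reference, fused_decisions)
```

Under these definitions, a fused system improves on a single detector when its DCF is lower, and `relative_decrease(single_dcf, fused_dcf)` expresses that gain in the same form as the percentages quoted in the abstract.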
Original language: English
Title of host publication: Interspeech 2016 : September 8–12, 2016, San Francisco, USA
Number of pages: 5
Publisher: ISCA
Publication date: Sept 2016
Pages: 2992-2996
Publication status: Published - Sept 2016
Event: Interspeech 2016 - San Francisco, CA, United States
Duration: 8 Sept 2016 – 12 Sept 2016
http://www.interspeech2016.org/

Conference

Conference: Interspeech 2016
Country/Territory: United States
City: San Francisco, CA
Period: 08/09/2016 – 12/09/2016
Keywords

  • NIST OpenSAD
  • speech activity detection
