Abstract
Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's contribution to OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (two supervised and four unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. A further relative decrease of 17.4 % is obtained by incorporating channel detection side information.
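The abstract reports results as relative decreases in DCF. As a minimal sketch of how such figures can be computed, the Python below evaluates a frame-level detection cost as a weighted sum of miss and false-alarm rates and then the relative decrease between two systems. The function names, the frame-level binary labels, and the default weights of 0.75 and 0.25 are assumptions made for illustration; the abstract does not state the weights or the scoring granularity used in OpenSAD.

```python
def detection_cost(reference, hypothesis, miss_weight=0.75, fa_weight=0.25):
    """Weighted sum of miss rate and false-alarm rate over frames.

    reference, hypothesis: equal-length sequences of 0/1 labels (1 = speech).
    The default weights are illustrative assumptions, not values from the abstract.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    speech = sum(1 for r in reference if r)
    nonspeech = len(reference) - speech

    # A miss is a speech frame labeled as non-speech.
    misses = sum(1 for r, h in zip(reference, hypothesis) if r and not h)
    # A false alarm is a non-speech frame labeled as speech.
    false_alarms = sum(1 for r, h in zip(reference, hypothesis) if not r and h)

    p_miss = misses / speech if speech else 0.0
    p_fa = false_alarms / nonspeech if nonspeech else 0.0
    return miss_weight * p_miss + fa_weight * p_fa


def relative_decrease(baseline_cost, improved_cost):
    """Relative decrease in percent of improved_cost with respect to baseline_cost."""
    return 100.0 * (baseline_cost - improved_cost) / baseline_cost


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    ref = [1, 1, 1, 1, 0, 0, 0, 0]
    hyp = [1, 1, 1, 0, 1, 0, 0, 0]
    cost = detection_cost(ref, hyp)
    print(f"DCF = {cost:.4f}")
    print(f"relative decrease vs. 0.30 = {relative_decrease(0.30, cost):.1f} %")
```

Under this reading, a relative decrease of 9.3 % means the fused system's DCF is 0.907 times the DCF of the best individual SAD.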
Original language | English |
---|---|
Title | Interspeech 2016 : September 8–12, 2016, San Francisco, USA |
Number of pages | 5 |
Publisher | ISCA |
Publication date | Sep. 2016 |
Pages | 2992-2996 |
DOI | |
Status | Published - Sep. 2016 |
Event | Interspeech 2016 - San Francisco, CA, USA Duration: 8 Sep. 2016 → 12 Sep. 2016 http://www.interspeech2016.org/ |
Conference
Conference | Interspeech 2016 |
---|---|
Country/Territory | USA |
City | San Francisco, CA |
Period | 08/09/2016 → 12/09/2016 |
Internet address |