Effectiveness of Single-Channel BLSTM Enhancement for Language Identification

Peter Sibbern Frederiksen, Jesus Villalba, Shinji Watanabe, Zheng-Hua Tan, Najim Dehak

Publication: Contribution to book/anthology/report/conference proceeding; Conference article in proceeding; Research; peer-reviewed


Abstract

This paper proposes to apply deep neural network (DNN)-based single-channel speech enhancement (SE) to language identification. The 2017 language recognition evaluation (LRE17) introduced noisy audio from videos, in addition to the telephone conversations of past challenges. Because of that, adapting models from telephone speech to noisy speech from the video domain was required to obtain optimum performance. However, such adaptation requires knowledge of the audio domain and the availability of in-domain data. Instead of adaptation, we propose to use a speech enhancement step to clean up the noisy audio as preprocessing for language identification. We used a bi-directional long short-term memory (BLSTM) neural network, which, given noisy log-Mel features, predicts a spectral mask indicating how clean each time-frequency bin is. The noisy spectrogram is multiplied by this predicted mask to obtain the enhanced magnitude spectrogram, which is transformed back into the time domain using the unaltered noisy speech phase. The experiments show significant improvement in language identification of noisy speech, for systems with and without domain adaptation, while preserving the identification performance in the telephone audio domain. In the best adapted state-of-the-art bottleneck i-vector system, the relative improvement is 11.3% for noisy speech.
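
The masking step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `apply_mask`, the array shapes, and the random values standing in for the BLSTM-predicted mask are all hypothetical, and the forward and inverse short-time Fourier transforms are omitted.

```python
import numpy as np


def apply_mask(noisy_stft: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scale the noisy magnitude by the mask and reuse the noisy phase.

    noisy_stft: complex array (frequency bins x frames), STFT of the noisy signal.
    mask:       real array of the same shape; assumed here to lie in [0, 1],
                with values near 1 marking clean time-frequency bins.
    """
    if noisy_stft.shape != mask.shape:
        raise ValueError("mask and spectrogram must have the same shape")
    magnitude = np.abs(noisy_stft)         # noisy magnitude spectrogram
    phase = np.angle(noisy_stft)           # noisy phase, left unaltered
    enhanced_magnitude = mask * magnitude  # element-wise masking
    return enhanced_magnitude * np.exp(1j * phase)


# Toy data: random values stand in for a real STFT and a predicted mask.
rng = np.random.default_rng(0)
noisy_stft = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
mask = rng.uniform(0.0, 1.0, size=(4, 5))
enhanced_stft = apply_mask(noisy_stft, mask)
```

An inverse STFT of `enhanced_stft` would then give the enhanced time-domain signal. Because only the magnitude is modified, any residual distortion in the noisy phase carries over to the output.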

Original language: English
Title: Interspeech 2018
Number of pages: 5
Volume: 2018-September
Publisher: ISCA
Publication date: Sep. 2018
Pages: 1823-1827
Status: Published - Sep. 2018
Event: Interspeech 2018 - Hyderabad, India
Duration: 2 Sep. 2018 to 6 Dec. 2018
https://www.isca-speech.org/archive/Interspeech_2018/index.html

Conference

Conference: Interspeech 2018
Country/Territory: India
City: Hyderabad
Period: 02/09/2018 to 06/12/2018
Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN: 2308-457X
