Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

2 Citations (Scopus)

Abstract

In this paper, we propose the use of self-supervised pretraining on a large unlabelled data set to improve the performance of a personalized voice activity detection (VAD) model in adverse conditions. We pretrain a long short-term memory (LSTM) encoder using the autoregressive predictive coding (APC) framework and fine-tune it for personalized VAD. We also propose a denoising variant of APC, with the goal of improving the robustness of personalized VAD. The trained models are systematically evaluated on both clean speech and speech contaminated by various types of noise at different signal-to-noise ratio (SNR) levels, and are compared to a purely supervised model. Our experiments show that self-supervised pretraining not only improves performance in clean conditions, but also yields models that are more robust to adverse conditions than purely supervised learning.
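
The two ingredients named in the abstract, noise contamination at a chosen SNR and an APC-style training objective, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mix_at_snr` and `apc_l1_loss` are hypothetical, the L1 loss and the fixed frame shift follow the general APC formulation rather than anything stated here, and the denoising variant is assumed to mean predicting clean future frames from noisy input frames.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required to reach the requested SNR in decibels.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


def apc_l1_loss(predictions, targets, shift):
    """Mean absolute error where predictions[t] is compared to targets[t + shift].

    predictions and targets are (frames, features) arrays of equal length.
    For plain APC, targets are the same features the model received as input.
    For the assumed denoising variant, the model receives noisy features and
    targets are the corresponding clean features.
    """
    if shift < 1:
        raise ValueError("shift must be at least 1 frame")
    return np.mean(np.abs(predictions[:-shift] - targets[shift:]))
```

In this sketch the encoder itself is left out: `predictions` stands for the output of any sequence model, such as an LSTM, that only sees frames up to time t when producing its estimate of frame t + shift.
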

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Number of pages: 5
Publisher: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 2024
Article number: 10447653
ISBN (Print): 979-8-3503-4486-8
ISBN (Electronic): 979-8-3503-4485-1
Publication status: Published - 2024
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Location: Seoul, Korea, Republic of
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/2024 to 19/04/2024
Sponsor: The Institute of Electrical and Electronics Engineers Signal Processing Society
Series: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
ISSN: 1520-6149
