Compressed, Real-Time Voice Activity Detection with Open Source Implementation for Small Devices.

Lasse R. Andersen, Lukas J. Jacobsen, David Campos

Publication: Contribution to book/anthology/report/conference proceeding; Conference article in proceeding; Research; Peer-reviewed

Abstract

This paper proposes a real-time voice activity detection (VAD) system based on a compressed convolutional neural network (CNN) model. On general-purpose computers, the system accurately classifies the presence of speech in audio with low latency. On small devices, however, the system shows higher latency, which presumably indicates computationally heavy preprocessing steps. The evaluation results indicate that the proposed VAD system improves on existing solutions by reducing the model size and increasing accuracy across several evaluation metrics. Furthermore, the proposed VAD system extends its applicability by training the CNN model on a different and more diverse data set. Moreover, the proposed architecture can be compressed to approximately one-eleventh of its size, facilitating eventual deployment on small devices. In contrast to existing closed VAD solutions, the entire pipeline of the proposed VAD system is developed in Python and made available as open source, ensuring the verifiability and accessibility of the work.
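
The pipeline shape the abstract describes (preprocessing, classification, per-frame latency) can be illustrated with a minimal sketch. This is not the authors' implementation: the log-energy feature stands in for the paper's preprocessing, a fixed threshold stands in for the CNN, and the function names, the 30 ms frame length, and the -3.0 threshold are all assumptions chosen for illustration.

```python
import math
import time

def frame_signal(samples, frame_len, hop_len):
    """Split a list of samples into fixed-length frames (hypothetical framing step)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]

def log_energy(frame):
    """Stand-in for preprocessing: log10 of the mean squared amplitude."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return math.log10(mean_sq + 1e-10)

def classify(feature, threshold=-3.0):
    """Stand-in for the CNN: a fixed threshold on log energy (assumption)."""
    return feature > threshold

def run_vad(samples, sample_rate=16000, frame_ms=30, hop_ms=30):
    """Return per-frame speech decisions and per-frame processing times in seconds."""
    frame_len = sample_rate * frame_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    decisions, latencies = [], []
    for frame in frame_signal(samples, frame_len, hop_len):
        start = time.perf_counter()
        decisions.append(classify(log_energy(frame)))
        latencies.append(time.perf_counter() - start)
    return decisions, latencies

# Synthetic input: 0.3 s of silence followed by 0.3 s of a 440 Hz tone.
sr = 16000
n = 3 * sr // 10
silence = [0.0] * n
tone = [0.5 * math.sin(2 * math.pi * 440 * k / sr) for k in range(n)]
decisions, latencies = run_vad(silence + tone, sample_rate=sr)
```

In a loop of this shape, processing keeps up with incoming audio only while the per-frame latency stays below the hop duration (30 ms in this sketch), which is why heavy preprocessing can dominate on small devices.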

Original language: English
Title: iWOAR 2023: 8th International Workshop on Sensor-based Activity Recognition and Artificial Intelligence, Proceedings
Editors: Denys J.C. Matthies, Marcin Grzegorzek, Arjan Kuijper, Heike Leutheuser
Number of pages: 10
Publication date: 21 Sep. 2023
Article number: 1
ISBN (Electronic): 979-8-4007-0816-9
DOI
Status: Published - 21 Sep. 2023
Event: iWOAR 2023: 8th International Workshop on Sensor-Based Activity Recognition and Artificial Intelligence - Lübeck, Germany
Duration: 21 Sep. 2023 to 22 Sep. 2023

Conference

Conference: iWOAR 2023
Country/Territory: Germany
City: Lübeck
Period: 21/09/2023 to 22/09/2023
