Abstract
This paper proposes a real-time voice activity detection (VAD) system built on a compressed convolutional neural network (CNN) model. On general-purpose computers, the system accurately classifies the presence of speech in audio with low latency. On small devices, however, it shows higher latency, presumably because of computationally heavy preprocessing steps. The evaluation indicates that the proposed VAD system improves on existing solutions, reducing the model size while improving accuracy across several evaluation metrics. Furthermore, training the CNN model on a different and more diverse data set extends the system's applicability. The proposed architecture can also be compressed to approximately one-eleventh of its original size, facilitating eventual deployment on small devices. In contrast to existing closed VAD solutions, the entire pipeline of the proposed VAD system is developed in Python and made available as open source, ensuring the verifiability and accessibility of the work.
Original language | English |
---|---|
Title of host publication | iWOAR 2023 : 8th International Workshop on Sensor-based Activity Recognition and Artificial Intelligence, Proceedings |
Editors | Denys J.C. Matthies, Marcin Grzegorzek, Arjan Kuijper, Heike Leutheuser |
Number of pages | 10 |
Publication date | 21 Sept 2023 |
Article number | 1 |
ISBN (Electronic) | 979-8-4007-0816-9 |
Publication status | Published - 21 Sept 2023 |
Event | iWOAR 2023: 8th international Workshop on Sensor-Based Activity Recognition and Artificial Intelligence, Lübeck, Germany. Duration: 21 Sept 2023 → 22 Sept 2023 |
Conference
Conference | iWOAR 2023 |
---|---|
Country/Territory | Germany |
City | Lübeck |
Period | 21/09/2023 → 22/09/2023 |
Bibliographical note
DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
Keywords
- convolutional neural network
- model compression
- open source VAD
- real-time VAD
- voice activity detection