Compressed, Real-Time Voice Activity Detection with Open Source Implementation for Small Devices

Lasse R. Andersen, Lukas J. Jacobsen, David Campos

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

Abstract

This paper proposes a real-time voice activity detection (VAD) system built on a compressed convolutional neural network (CNN) model. On general-purpose computers, the system accurately classifies the presence of speech in audio with low latency. On small devices, however, it shows higher latency, presumably because of computationally heavy preprocessing steps. The evaluation indicates that the proposed VAD system improves on existing solutions by reducing the model size and increasing accuracy across several evaluation metrics. The system also broadens its applicability by training the CNN model on a different and more diverse data set. Moreover, the proposed architecture can be compressed to approximately one-eleventh of its original size, which facilitates eventual deployment on small devices. In contrast to existing closed VAD solutions, the entire pipeline of the proposed VAD system is developed in Python and released as open source, ensuring the verifiability and accessibility of the work.

Original language: English
Title of host publication: iWOAR 2023: 8th International Workshop on Sensor-based Activity Recognition and Artificial Intelligence, Proceedings
Editors: Denys J.C. Matthies, Marcin Grzegorzek, Arjan Kuijper, Heike Leutheuser
Number of pages: 10
Publication date: 21 Sept 2023
Article number: 1
ISBN (Electronic): 979-8-4007-0816-9
Publication status: Published - 21 Sept 2023
Event: iWOAR 2023: 8th International Workshop on Sensor-Based Activity Recognition and Artificial Intelligence - Lübeck, Germany
Duration: 21 Sept 2023 to 22 Sept 2023

Conference

Conference: iWOAR 2023
Country/Territory: Germany
City: Lübeck
Period: 21/09/2023 to 22/09/2023

Bibliographical note

DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

Keywords

  • convolutional neural network
  • model compression
  • open source VAD
  • real-time VAD
  • voice activity detection
