Voice activity detection using audio-visual information

Theodore Petsatodis, Aristodemos Pnevmatikakis, Christos Boukis

Publication: Contribution to book/anthology/report/conference proceedings › Conference article in proceedings › Research › peer-reviewed

13 Citations (Scopus)

Abstract

An audio-visual voice activity detector that uses sensors positioned at a distance from the speaker is presented. Its constituent unimodal detectors model the temporal variation of audio and visual features using Hidden Markov Models, and their outputs are fused using a post-decision scheme. The chosen audio and visual features are the Mel-Frequency Cepstral Coefficients and the vertical mouth opening, respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings of four different speakers under various levels of additive white Gaussian noise, and it achieves performance superior to that of either unimodal component alone.
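As a rough illustration of the pipeline the abstract outlines (static features plus first-order derivatives, an HMM per modality, decision-level fusion), here is a minimal Python sketch. It is not the authors' implementation, and every specific is an assumption: each unimodal detector is a two-state HMM (0 = non-speech, 1 = speech) with one diagonal Gaussian per state, a fixed self-transition probability of 0.95 and a uniform initial prior, decoded with Viterbi; derivatives use the common regression formula with a window of n = 2; the hypothetical `fuse` function offers simple AND/OR rules as stand-ins for the paper's post-decision scheme. MFCC extraction and mouth tracking are out of scope, and the model parameters in the usage lines are made up.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[float]
Model = Tuple[Frame, Frame]  # (mean, variance) of a diagonal Gaussian (assumed emission model)


def deltas(frames: Sequence[Frame], n: int = 2) -> List[Frame]:
    """First-order derivatives via the regression formula
    d_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..n,
    replicating the first/last frame at the sequence edges."""
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for k in range(1, n + 1):
            nxt, prv = frames[min(t + k, T - 1)], frames[max(t - k, 0)]
            for i in range(len(d)):
                d[i] += k * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out


def augment(frames: Sequence[Frame], n: int = 2) -> List[Frame]:
    """Append the first-order derivatives to each static feature vector."""
    return [list(f) + d for f, d in zip(frames, deltas(frames, n))]


def diag_gauss_loglik(x: Frame, model: Model) -> float:
    """Log-density of x under a diagonal-covariance Gaussian."""
    mean, var = model
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def viterbi(loglik: Sequence[Tuple[float, float]], p_stay: float = 0.95) -> List[int]:
    """Most likely state path of a 2-state HMM (0 = non-speech, 1 = speech)."""
    log_a = [[math.log(p_stay), math.log(1 - p_stay)],
             [math.log(1 - p_stay), math.log(p_stay)]]
    score = [math.log(0.5) + loglik[0][s] for s in range(2)]  # uniform initial prior
    back = []  # back[t-1][s] = best predecessor of state s at time t
    for t in range(1, len(loglik)):
        ptr, new = [], []
        for s in range(2):
            best = max(range(2), key=lambda r: score[r] + log_a[r][s])
            ptr.append(best)
            new.append(score[best] + log_a[best][s] + loglik[t][s])
        back.append(ptr)
        score = new
    state = max(range(2), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # backtrack from the last frame to the first
        state = ptr[state]
        path.append(state)
    return path[::-1]


def detect(frames: Sequence[Frame], silence: Model, speech: Model) -> List[int]:
    """Unimodal VAD: static+delta features scored per state, then Viterbi smoothing."""
    feats = augment(frames)
    ll = [(diag_gauss_loglik(x, silence), diag_gauss_loglik(x, speech)) for x in feats]
    return viterbi(ll)


def fuse(audio: List[int], video: List[int], rule: str = "and") -> List[int]:
    """Hypothetical post-decision fusion of two frame-level 0/1 decision streams."""
    if rule == "and":
        return [a & v for a, v in zip(audio, video)]
    if rule == "or":
        return [a | v for a, v in zip(audio, video)]
    raise ValueError(f"unknown rule: {rule}")


# Toy usage with hypothetical 1-D features and hand-picked (mean, variance) models.
audio = [[0.0]] * 10 + [[5.0]] * 10 + [[0.0]] * 10  # stand-in for an MFCC stream
mouth = [[0.0]] * 12 + [[3.0]] * 8 + [[0.0]] * 10   # stand-in for vertical mouth opening
a = detect(audio, ([0.0, 0.0], [1.0, 1.0]), ([5.0, 0.0], [1.0, 1.0]))
v = detect(mouth, ([0.0, 0.0], [1.0, 1.0]), ([3.0, 0.0], [1.0, 1.0]))
decision = fuse(a, v, "and")
```

In this toy run the audio detector marks frames 10-19 as speech and the visual detector marks frames 12-19, so the AND rule keeps only frames 12-19 while the OR rule would keep frames 10-19. AND suppresses frames where only one modality fires; OR keeps any frame either detector accepts. Which trade-off the paper's fusion scheme makes is not stated in the abstract.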
Original language: English
Title of host publication: DSP 2009: 16th International Conference on Digital Signal Processing, Proceedings
Number of pages: 5
Publisher: IEEE Press
Publication date: 2009
Pages: 1-5
ISBN (Print): 978-142443298-1, 978-1-4244-3297-4
DOI
Status: Published - 2009
Event: DSP 2009: 16th International Conference on Digital Signal Processing - Santorini, Greece
Duration: 5 Jul 2009 → 7 Jul 2009

Conference

Conference: DSP 2009: 16th International Conference on Digital Signal Processing
Country/Territory: Greece
City: Santorini
Period: 05/07/2009 → 07/07/2009

Bibliographic note

Article number 5201171
