Voice activity detection using audio-visual information

Theodore Petsatodis; Aristodemos Pnevmatikakis; Christos Boukis

doi:10.1109/ICDSP.2009.5201171

Department of Electronic Systems

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

13 Citations (Scopus)

Abstract

An audio-visual voice activity detector that uses sensors positioned at a distance from the speaker is presented. Its constituent unimodal detectors model the temporal variation of audio and visual features using Hidden Markov Models; their outputs are fused using a post-decision scheme. The Mel-Frequency Cepstral Coefficients and the vertical mouth opening are the chosen audio and visual features, respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings from four different speakers under various levels of additive white Gaussian noise, and achieves performance superior to that of either unimodal component alone.
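
Two steps named in the abstract can be illustrated with a minimal Python sketch: corrupting a recording with additive white Gaussian noise at a chosen signal-to-noise ratio, and augmenting a feature sequence with its first-order derivatives. This is not the authors' implementation. The function names `add_awgn` and `append_deltas`, the use of NumPy, and the central-difference derivative are all assumptions made for illustration; the paper may compute its derivatives and noise levels differently.

```python
import numpy as np


def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise at the requested SNR (dB).

    Hypothetical helper: the noise variance is derived from the mean power
    of the input signal, which is an assumption of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def append_deltas(features):
    """Append first-order derivatives to a (frames, dims) feature array.

    Hypothetical helper: derivatives are finite differences along the
    frame axis, so the input needs at least two frames.
    """
    features = np.asarray(features, dtype=float)
    deltas = np.gradient(features, axis=0)
    return np.hstack([features, deltas])
```

Applied to a feature matrix of shape (frames, dims), `append_deltas` returns shape (frames, 2 * dims). The same helper would serve both the audio features and a single-column visual feature such as the vertical mouth opening.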

Original language	English
Title	DSP 2009: 16th International Conference on Digital Signal Processing, Proceedings
Number of pages	5
Publisher	IEEE Press
Publication date	2009
Pages	1-5
ISBN (Print)	978-1-4244-3298-1, 978-1-4244-3297-4
DOI	https://doi.org/10.1109/ICDSP.2009.5201171
Status	Published - 2009
Event	DSP 2009: 16th International Conference on Digital Signal Processing - Santorini, Greece Duration: 5 Jul 2009 → 7 Jul 2009

Conference

Conference	DSP 2009: 16th International Conference on Digital Signal Processing
Country/Territory	Greece
City	Santorini
Period	05/07/2009 → 07/07/2009

Bibliographical note

Article number 5201171

Access to document

10.1109/ICDSP.2009.5201171

Citation formats

@inproceedings{dd3cddde62504a5a83cbdb62031809ea,

title = "Voice activity detection using audio-visual information",

abstract = "An audio-visual voice activity detector that uses sensors positioned at a distance from the speaker is presented. Its constituent unimodal detectors model the temporal variation of audio and visual features using Hidden Markov Models; their outputs are fused using a post-decision scheme. The Mel-Frequency Cepstral Coefficients and the vertical mouth opening are the chosen audio and visual features, respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings from four different speakers under various levels of additive white Gaussian noise, and achieves performance superior to that of either unimodal component alone.",

author = "Theodore Petsatodis and Aristodemos Pnevmatikakis and Christos Boukis",

note = "Article number 5201171; DSP 2009: 16th International Conference on Digital Signal Processing ; Conference date: 05-07-2009 Through 07-07-2009",

year = "2009",

doi = "10.1109/ICDSP.2009.5201171",

language = "English",

isbn = "978-1-4244-3298-1",

pages = "1--5",

booktitle = "DSP 2009: 16th International Conference on Digital Signal Processing, Proceedings",

publisher = "IEEE Press",

}

TY - GEN

T1 - Voice activity detection using audio-visual information

AU - Petsatodis, Theodore

AU - Pnevmatikakis, Aristodemos

AU - Boukis, Christos

N1 - Article number 5201171

PY - 2009

Y1 - 2009

AB - An audio-visual voice activity detector that uses sensors positioned at a distance from the speaker is presented. Its constituent unimodal detectors model the temporal variation of audio and visual features using Hidden Markov Models; their outputs are fused using a post-decision scheme. The Mel-Frequency Cepstral Coefficients and the vertical mouth opening are the chosen audio and visual features, respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings from four different speakers under various levels of additive white Gaussian noise, and achieves performance superior to that of either unimodal component alone.

UR - http://www.scopus.com/inward/record.url?scp=70449555522&partnerID=8YFLogxK

U2 - 10.1109/ICDSP.2009.5201171

DO - 10.1109/ICDSP.2009.5201171

M3 - Article in proceeding

AN - SCOPUS:70449555522

SN - 978-1-4244-3298-1

SN - 978-1-4244-3297-4

SP - 1

EP - 5

BT - DSP 2009: 16th International Conference on Digital Signal Processing, Proceedings

PB - IEEE Press

T2 - DSP 2009: 16th International Conference on Digital Signal Processing

Y2 - 5 July 2009 through 7 July 2009

ER -
