Convex Combination of Multiple Statistical Models with Application to VAD

Theodoros Petsatodis; Christos Boukis; Fotios Talantzis; Zheng-Hua Tan; Ramjee Prasad

doi:10.1109/TASL.2011.2131131

Department of Electronic Systems

Publication: Contribution to journal › Journal article › Research › peer-review

25 Citations (Scopus)

Abstract

This paper proposes a robust Voice Activity Detector (VAD) based on the observation that the distribution of speech captured with far-field microphones is highly varying, depending on the noise and reverberation conditions. The proposed VAD employs a convex combination scheme comprising three statistical distributions - a Gaussian, a Laplacian, and a two-sided Gamma - to effectively model captured speech. This scheme shows increased ability to adapt to dynamic acoustic environments. The contribution of each distribution to this convex combination is automatically adjusted based on the statistical characteristics of the instantaneous audio input. To further improve the performance of the system, an adaptive threshold is introduced, while a decision-smoothing scheme caters to the intra-frame correlation of speech signals. Extensive experiments under realistic scenarios support the proposed approach of combining several models for increased adaptation and performance.
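The convex combination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the density parameterizations or the weight-adaptation rule, so the zero-mean, standard-deviation-parameterized forms of the Gaussian, Laplacian, and two-sided Gamma densities used here are assumptions, and `illustrative_weights` is a hypothetical stand-in (a softmax over each model's mean log-likelihood on the frame) for whatever rule the paper actually uses. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def laplacian_pdf(x, sigma):
    """Zero-mean Laplacian density with standard deviation sigma."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)


def gamma_pdf(x, sigma, eps=1e-6):
    """Zero-mean two-sided Gamma density (shape 1/2) with standard deviation sigma.

    The density diverges at x = 0, so |x| is clamped to eps (an assumption
    made only to keep this sketch numerically safe).
    """
    ax = max(abs(x), eps)
    coeff = 3.0 ** 0.25 / (2.0 * math.sqrt(2.0 * math.pi * sigma))
    return coeff * ax ** -0.5 * math.exp(-math.sqrt(3.0) * ax / (2.0 * sigma))


MODELS = (gaussian_pdf, laplacian_pdf, gamma_pdf)


def convex_combination_pdf(x, sigma, weights):
    """Weighted sum of the three densities; weights must be >= 0 and sum to 1."""
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * pdf(x, sigma) for w, pdf in zip(weights, MODELS))


def illustrative_weights(frame, sigma):
    """Hypothetical weight rule: softmax of each model's mean log-likelihood.

    This is NOT the adaptation rule from the paper; it only shows weights
    that respond to the statistics of the current frame.
    """
    scores = [
        sum(math.log(max(pdf(x, sigma), 1e-300)) for x in frame) / len(frame)
        for pdf in MODELS
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: derive weights from one frame, then evaluate the combined density.
frame = [0.1, -0.4, 0.9, -1.3, 0.05]
weights = illustrative_weights(frame, sigma=1.0)
density = convex_combination_pdf(0.5, 1.0, weights)
```

Because the weights are non-negative and sum to one, the combined function is itself a valid probability density, which is what lets a single likelihood-based detector shift between the three models as conditions change.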

Original language	English
Journal	IEEE Transactions on Audio, Speech and Language Processing
Volume	19
Issue number	8
Pages (from-to)	2314-2327
ISSN	1558-7916
DOI	https://doi.org/10.1109/TASL.2011.2131131
Status	Published - Nov 2011

Access to document

10.1109/TASL.2011.2131131

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5737769

Citation formats

@article{7c38625254a7492d8e055d31ba630ddb,

title = "Convex Combination of Multiple Statistical Models with Application to VAD",

abstract = "This paper proposes a robust Voice Activity Detector (VAD) based on the observation that the distribution of speech captured with far-field microphones is highly varying, depending on the noise and reverberation conditions. The proposed VAD employs a convex combination scheme comprising three statistical distributions - a Gaussian, a Laplacian, and a two-sided Gamma - to effectively model captured speech. This scheme shows increased ability to adapt to dynamic acoustic environments. The contribution of each distribution to this convex combination is automatically adjusted based on the statistical characteristics of the instantaneous audio input. To further improve the performance of the system, an adaptive threshold is introduced, while a decision-smoothing scheme caters to the intra-frame correlation of speech signals. Extensive experiments under realistic scenarios support the proposed approach of combining several models for increased adaptation and performance.",

keywords = "voice activity detection, convex combination, classification, statistical models",

author = "Theodoros Petsatodis and Christos Boukis and Fotios Talantzis and Zheng-Hua Tan and Ramjee Prasad",

year = "2011",

month = nov,

doi = "10.1109/TASL.2011.2131131",

language = "English",

volume = "19",

pages = "2314--2327",

journal = "IEEE Transactions on Audio, Speech and Language Processing",

issn = "1558-7916",

publisher = "IEEE Signal Processing Society",

number = "8",

}

TY - JOUR

T1 - Convex Combination of Multiple Statistical Models with Application to VAD

AU - Petsatodis, Theodoros

AU - Boukis, Christos

AU - Talantzis, Fotios

AU - Tan, Zheng-Hua

AU - Prasad, Ramjee

PY - 2011/11

Y1 - 2011/11

N2 - This paper proposes a robust Voice Activity Detector (VAD) based on the observation that the distribution of speech captured with far-field microphones is highly varying, depending on the noise and reverberation conditions. The proposed VAD employs a convex combination scheme comprising three statistical distributions - a Gaussian, a Laplacian, and a two-sided Gamma - to effectively model captured speech. This scheme shows increased ability to adapt to dynamic acoustic environments. The contribution of each distribution to this convex combination is automatically adjusted based on the statistical characteristics of the instantaneous audio input. To further improve the performance of the system, an adaptive threshold is introduced, while a decision-smoothing scheme caters to the intra-frame correlation of speech signals. Extensive experiments under realistic scenarios support the proposed approach of combining several models for increased adaptation and performance.

KW - voice activity detection

KW - convex combination

KW - classification

KW - statistical models

U2 - 10.1109/TASL.2011.2131131

DO - 10.1109/TASL.2011.2131131

M3 - Journal article

SN - 1558-7916

VL - 19

SP - 2314

EP - 2327

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

IS - 8

ER -
