Model-Based Voice Activity Detection in Wireless Acoustic Sensor Networks

Yingke Zhao; Jesper Kjær Nielsen; Mads Græsbøll Christensen; Jingdong Chen

doi:10.23919/EUSIPCO.2018.8553457

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-reviewed

1 Citation (Scopus)

234 Downloads (Pure)

Abstract

One of the major challenges in wireless acoustic sensor networks (WASN) based speech enhancement is robust and accurate voice activity detection (VAD). VAD is widely used in speech enhancement, speech coding, speech recognition, etc. In speech enhancement applications, VAD plays an important role, since noise statistics can be updated during non-speech frames to ensure efficient noise reduction and tolerable speech distortion. Although significant efforts have been made in single channel VAD, few solutions can be found in the multichannel case, especially in WASN. In this paper, we introduce a distributed VAD by using model-based noise power spectral density (PSD) estimation. For each node in the network, the speech PSD and noise PSD are first estimated, then a distributed detection is made by applying the generalized likelihood ratio test (GLRT). The proposed global GLRT based VAD has a quite general form. Indeed, we can judge whether the speech is present or absent by using the current time frame and frequency band observation or by taking into account the neighbouring frames and bands. Finally, the distributed GLRT result is obtained by using a distributed consensus method, such as random gossip, i.e., the whole detection system does not need any fusion center. With the model-based noise estimation method, the proposed distributed VAD performs robustly under non-stationary noise conditions, such as babble noise. As shown in experiments, the proposed method outperforms traditional multichannel VAD methods in terms of detection accuracy.
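The fusion-center-free decision described in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each node summarizes one time-frequency bin as a log-likelihood ratio under a zero-mean complex Gaussian model with the locally estimated speech and noise PSDs, and that the network decision compares the average of these local statistics with a threshold. The function names, the ring topology, the numeric values, and the threshold are illustrative assumptions and are not taken from the paper; the paper's actual GLRT statistic is not given in this record.

```python
import math
import random


def local_log_lr(periodogram, speech_psd, noise_psd):
    """Log-likelihood ratio for one bin under a complex Gaussian model (assumed).

    gamma is the a posteriori SNR and xi the a priori SNR computed from the
    node's own PSD estimates.
    """
    gamma = periodogram / noise_psd
    xi = speech_psd / noise_psd
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)


def random_gossip_average(values, edges, num_rounds=2000, seed=0):
    """Randomized pairwise gossip.

    In each round one randomly chosen link is activated and its two end nodes
    replace their values with the pairwise mean. The sum over all nodes is
    preserved, so on a connected graph every node converges to the global mean.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(num_rounds):
        i, j = rng.choice(edges)
        mean = 0.5 * (x[i] + x[j])
        x[i] = mean
        x[j] = mean
    return x


# Hypothetical per-node observations for a single time-frequency bin:
# (periodogram value, estimated speech PSD, estimated noise PSD).
observations = [
    (4.0, 1.0, 1.0),
    (2.5, 0.8, 1.2),
    (6.0, 2.0, 1.0),
    (1.0, 0.5, 1.5),
    (3.0, 1.0, 0.9),
]
local_stats = [local_log_lr(p, s, n) for p, s, n in observations]

# Five nodes connected in a ring; each node only talks to its two neighbours.
ring_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
estimates = random_gossip_average(local_stats, ring_edges)

# Every node applies the same (hypothetical) threshold to its own estimate.
threshold = 0.5
decisions = [estimate > threshold for estimate in estimates]
print(estimates, decisions)
```

Because every node ends up holding nearly the same average, all nodes reach the same speech-present or speech-absent decision without sending raw data to a central node. The number of gossip rounds trades communication cost against agreement accuracy.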

Original language	English
Title of host publication	2018 26th European Signal Processing Conference (EUSIPCO)
Number of pages	5
Publisher	IEEE
Publication date	Sep 2018
Pages	425-429
Article number	8553457
ISBN (Print)	978-90-827970-0-8, 978-1-5386-3736-4
ISBN (Electronic)	978-9-0827-9701-5
DOI	https://doi.org/10.23919/EUSIPCO.2018.8553457
Status	Published - Sep 2018
Event	26th European Signal Processing Conference (EUSIPCO 2018) - Rome, Italy. Duration: 3 Sep 2018 → 7 Sep 2018. Conference number: 26. http://www.eusipco2018.org

Conference

Conference	26th European Signal Processing Conference (EUSIPCO 2018)
Number	26
Country/Territory	Italy
City	Rome
Period	03/09/2018 → 07/09/2018
Internet address	http://www.eusipco2018.org

Name	Proceedings of the European Signal Processing Conference
ISSN	2076-1465

Access to document

10.23919/EUSIPCO.2018.8553457

1570437128 Accepted manuscript, 524 KB

Citation formats

@inproceedings{318a8053391d43a1ad842ab39729b72d,

title = "Model-Based Voice Activity Detection in Wireless Acoustic Sensor Networks",

author = "Yingke Zhao and Nielsen, {Jesper Kj{\ae}r} and Christensen, {Mads Gr{\ae}sb{\o}ll} and Jingdong Chen",

year = "2018",

month = sep,

doi = "10.23919/EUSIPCO.2018.8553457",

language = "English",

isbn = "978-90-827970-0-8",

series = "Proceedings of the European Signal Processing Conference",

publisher = "IEEE",

pages = "425--429",

booktitle = "2018 26th European Signal Processing Conference (EUSIPCO)",

address = "United States",

note = "26th European Signal Processing Conference (EUSIPCO 2018), EUSIPCO ; Conference date: 03-09-2018 Through 07-09-2018",

url = "http://www.eusipco2018.org",

}

Zhao, Y, Nielsen, JK, Christensen, MG & Chen, J 2018, Model-Based Voice Activity Detection in Wireless Acoustic Sensor Networks. in 2018 26th European Signal Processing Conference (EUSIPCO), 8553457, IEEE, Proceedings of the European Signal Processing Conference, pp. 425-429, 26th European Signal Processing Conference (EUSIPCO 2018), Rome, Italy, 03/09/2018. https://doi.org/10.23919/EUSIPCO.2018.8553457

Model-Based Voice Activity Detection in Wireless Acoustic Sensor Networks. / Zhao, Yingke; Nielsen, Jesper Kjær; Christensen, Mads Græsbøll et al.
2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 2018. pp. 425-429, 8553457 (Proceedings of the European Signal Processing Conference).

TY - GEN

T1 - Model-Based Voice Activity Detection in Wireless Acoustic Sensor Networks

AU - Zhao, Yingke

AU - Nielsen, Jesper Kjær

AU - Christensen, Mads Græsbøll

AU - Chen, Jingdong

N1 - Conference code: 26

PY - 2018/9

Y1 - 2018/9

U2 - 10.23919/EUSIPCO.2018.8553457

DO - 10.23919/EUSIPCO.2018.8553457

M3 - Article in proceeding

SN - 978-90-827970-0-8

SN - 978-1-5386-3736-4

T3 - Proceedings of the European Signal Processing Conference

SP - 425

EP - 429

BT - 2018 26th European Signal Processing Conference (EUSIPCO)

PB - IEEE

T2 - 26th European Signal Processing Conference (EUSIPCO 2018)

Y2 - 3 September 2018 through 7 September 2018

ER -