Harmonic beamformers for speech enhancement and dereverberation in the time domain

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

This paper presents a framework for parametric broadband beamforming that exploits the frequency-domain sparsity of voiced speech to achieve more noise reduction than traditional nonparametric broadband beamforming without introducing additional distortion. In this framework, the harmonic model is used to parametrize the signal of interest by a single parameter, the fundamental frequency, whereby both speech enhancement and dereverberation can be performed. This framework thus exploits both the spatial and temporal properties of speech signals simultaneously and includes both fixed and adaptive beamformers, such as (1) delay-and-sum, (2) null forming, (3) Wiener, (4) minimum variance distortionless response (MVDR), and (5) linearly constrained minimum variance beamformers. Moreover, the framework contains standard broadband beamforming as a special case, whereby the proposed beamformers can also handle unvoiced speech. The reported experimental results demonstrate the capabilities of the proposed framework to perform both speech enhancement and dereverberation simultaneously. The proposed beamformers are evaluated in terms of speech distortion and objective measures for speech quality and speech intelligibility, and are compared to nonparametric broadband beamformers. The results show that the proposed beamformers perform well compared to traditional methods, including a state-of-the-art dereverberation method, particularly in adverse conditions with high amounts of noise and reverberation.
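
As a rough illustration of the frequency-domain sparsity the abstract refers to, the sketch below synthesizes a signal from the standard real-valued harmonic model (a sum of sinusoids at integer multiples of a fundamental frequency) and checks that its spectrum is concentrated at those multiples. It is not taken from the paper: the function name `harmonic_signal`, the sampling rate, the fundamental frequency, and the amplitudes and phases are all assumed values chosen so that each harmonic falls exactly on a DFT bin.

```python
import numpy as np


def harmonic_signal(f0, amplitudes, phases, fs, num_samples):
    """Sum of sinusoids at integer multiples of the fundamental frequency f0 (Hz).

    Hypothetical helper for illustration; the paper's exact signal model
    (e.g. real vs. complex sinusoids) may differ.
    """
    n = np.arange(num_samples)
    signal = np.zeros(num_samples)
    for order, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
        signal += amp * np.cos(2 * np.pi * order * f0 * n / fs + phase)
    return signal


# Assumed illustrative parameters (not from the paper).
fs = 8000            # sampling rate in Hz
f0 = 200.0           # fundamental frequency in Hz
amplitudes = [1.0, 0.5, 0.25]
phases = [0.0, 0.3, 1.1]

x = harmonic_signal(f0, amplitudes, phases, fs, num_samples=800)

# With 800 samples at 8 kHz the DFT bin spacing is 10 Hz, so the
# harmonics at 200, 400 and 600 Hz sit exactly on bins 20, 40 and 60.
spectrum = np.abs(np.fft.rfft(x))
bin_hz = fs / len(x)
peak_bins = np.sort(np.argsort(spectrum)[-len(amplitudes):])
peak_freqs = peak_bins * bin_hz

# Fraction of spectral energy carried by the harmonic bins alone.
energy_in_peaks = np.sum(spectrum[peak_bins] ** 2) / np.sum(spectrum ** 2)

print(peak_freqs)
print(energy_in_peaks)
```

In this idealized, noise-free setting nearly all of the energy sits in three bins, which is the property a harmonic (parametric) filter can exploit; real voiced speech is only approximately harmonic and its fundamental frequency has to be estimated.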

Original language: English
Journal: Speech Communication
Volume: 116
Pages (from-to): 1-11
ISSN: 0167-6393
DOI: 10.1016/j.specom.2019.11.003
Status: E-pub ahead of print - Jan. 2020

Fingerprint

Speech Enhancement
Time Domain
Harmonic
Broadband
Beamforming
Minimum Variance
Speech Intelligibility
Reverberation
Fundamental Frequency
Speech Signal
Noise Reduction
Noise abatement
Sparsity
Acoustic noise
Frequency Domain
Null
Linearly
Framework

Cite this

@article{ab66d45237a14532801017d1e08e8640,
title = "Harmonic beamformers for speech enhancement and dereverberation in the time domain",
abstract = "This paper presents a framework for parametric broadband beamforming that exploits the frequency-domain sparsity of voiced speech to achieve more noise reduction than traditional nonparametric broadband beamforming without introducing additional distortion. In this framework, the harmonic model is used to parametrize the signal of interest by a single parameter, the fundamental frequency, whereby both speech enhancement and dereverberation can be performed. This framework thus exploits both the spatial and temporal properties of speech signals simultaneously and includes both fixed and adaptive beamformers, such as (1) delay-and-sum, (2) null forming, (3) Wiener, (4) minimum variance distortionless response (MVDR), and (5) linearly constrained minimum variance beamformers. Moreover, the framework contains standard broadband beamforming as a special case, whereby the proposed beamformers can also handle unvoiced speech. The reported experimental results demonstrate the capabilities of the proposed framework to perform both speech enhancement and dereverberation simultaneously. The proposed beamformers are evaluated in terms of speech distortion and objective measures for speech quality and speech intelligibility, and are compared to nonparametric broadband beamformers. The results show that the proposed beamformers perform well compared to traditional methods, including a state-of-the-art dereverberation method, particularly in adverse conditions with high amounts of noise and reverberation.",
author = "Jensen, {Jesper Rindom} and Sam Karimian-Azari and Christensen, {Mads Gr{\ae}sb{\o}ll} and Jacob Benesty",
year = "2020",
month = "1",
doi = "10.1016/j.specom.2019.11.003",
language = "English",
volume = "116",
pages = "1--11",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",

}

Harmonic beamformers for speech enhancement and dereverberation in the time domain. / Jensen, Jesper Rindom; Karimian-Azari, Sam; Christensen, Mads Græsbøll; Benesty, Jacob.

In: Speech Communication, Vol. 116, 01.2020, p. 1-11.

TY - JOUR

T1 - Harmonic beamformers for speech enhancement and dereverberation in the time domain

AU - Jensen, Jesper Rindom

AU - Karimian-Azari, Sam

AU - Christensen, Mads Græsbøll

AU - Benesty, Jacob

PY - 2020/1

Y1 - 2020/1

N2 - This paper presents a framework for parametric broadband beamforming that exploits the frequency-domain sparsity of voiced speech to achieve more noise reduction than traditional nonparametric broadband beamforming without introducing additional distortion. In this framework, the harmonic model is used to parametrize the signal of interest by a single parameter, the fundamental frequency, whereby both speech enhancement and dereverberation can be performed. This framework thus exploits both the spatial and temporal properties of speech signals simultaneously and includes both fixed and adaptive beamformers, such as (1) delay-and-sum, (2) null forming, (3) Wiener, (4) minimum variance distortionless response (MVDR), and (5) linearly constrained minimum variance beamformers. Moreover, the framework contains standard broadband beamforming as a special case, whereby the proposed beamformers can also handle unvoiced speech. The reported experimental results demonstrate the capabilities of the proposed framework to perform both speech enhancement and dereverberation simultaneously. The proposed beamformers are evaluated in terms of speech distortion and objective measures for speech quality and speech intelligibility, and are compared to nonparametric broadband beamformers. The results show that the proposed beamformers perform well compared to traditional methods, including a state-of-the-art dereverberation method, particularly in adverse conditions with high amounts of noise and reverberation.

AB - This paper presents a framework for parametric broadband beamforming that exploits the frequency-domain sparsity of voiced speech to achieve more noise reduction than traditional nonparametric broadband beamforming without introducing additional distortion. In this framework, the harmonic model is used to parametrize the signal of interest by a single parameter, the fundamental frequency, whereby both speech enhancement and dereverberation can be performed. This framework thus exploits both the spatial and temporal properties of speech signals simultaneously and includes both fixed and adaptive beamformers, such as (1) delay-and-sum, (2) null forming, (3) Wiener, (4) minimum variance distortionless response (MVDR), and (5) linearly constrained minimum variance beamformers. Moreover, the framework contains standard broadband beamforming as a special case, whereby the proposed beamformers can also handle unvoiced speech. The reported experimental results demonstrate the capabilities of the proposed framework to perform both speech enhancement and dereverberation simultaneously. The proposed beamformers are evaluated in terms of speech distortion and objective measures for speech quality and speech intelligibility, and are compared to nonparametric broadband beamformers. The results show that the proposed beamformers perform well compared to traditional methods, including a state-of-the-art dereverberation method, particularly in adverse conditions with high amounts of noise and reverberation.

U2 - 10.1016/j.specom.2019.11.003

DO - 10.1016/j.specom.2019.11.003

M3 - Journal article

VL - 116

SP - 1

EP - 11

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

ER -