Gammatone Filter Bank-Deep Neural Network-based Monaural speech enhancement for unseen conditions

Shoba Sivapatham; Asutosh Kar; Mads Græsbøll Christensen

doi:10.1016/j.apacoust.2022.108784

Corresponding author: Asutosh Kar

Publication: Contribution to journal › Journal article › Research › Peer review

3 citations (Scopus)

Abstract

Deep learning techniques have brought speech enhancement to a high level of performance in recent years. However, deep-learning-based speech enhancement algorithms degrade in performance, particularly for unseen noises and unseen speakers. Moreover, deep learning models are limited to a small number of speakers. Hence, we propose GTFB-SDNN, a speech enhancement algorithm that combines a Gammatone filterbank (GTFB) with a simple deep neural network (SDNN), to improve speech quality under three different unseen conditions. The GTFB provides finer frequency resolution in the low-frequency region of speech. The SDNN takes a noisy GTFB frame as input and maps it to the corresponding clean-speech GTFB frame. The results are evaluated objectively using the signal-to-noise ratio (SNR), perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). They are evaluated subjectively using the mean opinion score (MOS). Experiments are carried out with a variety of training and testing models. The results show that the proposed GTFB-SDNN is robust to a variety of test situations and outperforms existing methods.
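The abstract states that a gammatone filterbank gives finer resolution at low frequencies. A minimal sketch can illustrate this, assuming that channel center frequencies are spaced uniformly on the ERB-rate scale of Glasberg and Moore (1990). That spacing is a common choice for gammatone filterbanks, but this page does not confirm it as the paper's configuration. The channel count and frequency range below are hypothetical, and so are the function names.

```python
import math


def hz_to_erb_rate(f_hz: float) -> float:
    """Map a frequency in Hz to the ERB-rate scale (Glasberg & Moore, 1990)."""
    return 21.4 * math.log10(4.37e-3 * f_hz + 1.0)


def erb_rate_to_hz(erb_rate: float) -> float:
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 4.37e-3


def erb_bandwidth(f_hz: float) -> float:
    """Equivalent rectangular bandwidth (Hz) of the auditory filter at f_hz."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)


def gammatone_center_frequencies(num_channels: int, f_low: float, f_high: float) -> list[float]:
    """Center frequencies spaced uniformly on the ERB-rate scale from f_low to f_high."""
    if num_channels < 2:
        raise ValueError("num_channels must be at least 2")
    e_low = hz_to_erb_rate(f_low)
    e_high = hz_to_erb_rate(f_high)
    step = (e_high - e_low) / (num_channels - 1)
    return [erb_rate_to_hz(e_low + k * step) for k in range(num_channels)]


# Hypothetical settings: 64 channels between 50 Hz and 8 kHz (not taken from the paper).
cfs = gammatone_center_frequencies(64, 50.0, 8000.0)
low_gap = cfs[1] - cfs[0]
high_gap = cfs[-1] - cfs[-2]
print(f"spacing near {cfs[0]:.0f} Hz: {low_gap:.1f} Hz, ERB {erb_bandwidth(cfs[0]):.1f} Hz")
print(f"spacing near {cfs[-1]:.0f} Hz: {high_gap:.1f} Hz, ERB {erb_bandwidth(cfs[-1]):.1f} Hz")
```

Uniform steps on the ERB-rate scale produce narrow, closely spaced channels at low frequencies and progressively wider ones at high frequencies. A uniform-resolution STFT grid, by contrast, spaces its bins evenly in Hz across the whole band.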

Original language	English
Article number	108784
Journal	Applied Acoustics
Volume	194
ISSN	0003-682X
DOI	https://doi.org/10.1016/j.apacoust.2022.108784
Status	Published - 15 Jun 2022

Bibliographic note

Publisher Copyright:
© 2022 Elsevier Ltd


Citation formats

@article{ad7c223327814568a40573827474bf01,
  title = "Gammatone Filter Bank-Deep Neural Network-based Monaural speech enhancement for unseen conditions",
  abstract = "Speech signal enhancement achieves high-level performance in recent years using deep learning techniques. However, the deep learning technique in the speech enhancement algorithm degrades the performance of speech, particularly for unseen noises, unseen speakers and moreover, deep learning models are limited to the small number of speakers. Hence, we propose a Gammatone filterbank (GTFB) – simple deep neural network (SDNN) based speech enhancement algorithm to improve the quality of speech for three different unseen conditions. The use of GTFB gives a finer resolution in low-frequency regions of speech, and the SDNN model extracts a noisy GTFB frame as input and maps it to a clean speech GTFB frame. The experimental results are measured objectively using signal-noise-ratio, perceptual evaluation of speech quality, short time objective intelligibility, and subjectively using mean opinion score. The experimental results are carried out using a variety of training and testing models. The performance results show that the proposed GTFB-SDNN are robust to a variety of test situations and outperform existing methods.",
  keywords = "Deep neural network, Gammatone filterbank, Intelligibility, Quality, Speech enhancement, Subjective measure",
  author = "Shoba Sivapatham and Asutosh Kar and Christensen, {Mads Gr{\ae}sb{\o}ll}",
  note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Ltd",
  year = "2022",
  month = jun,
  day = "15",
  doi = "10.1016/j.apacoust.2022.108784",
  language = "English",
  volume = "194",
  journal = "Applied Acoustics",
  issn = "0003-682X",
  publisher = "Pergamon Press",
}

TY  - JOUR
T1  - Gammatone Filter Bank-Deep Neural Network-based Monaural speech enhancement for unseen conditions
AU  - Sivapatham, Shoba
AU  - Kar, Asutosh
AU  - Christensen, Mads Græsbøll
PY  - 2022/6/15
Y1  - 2022/6/15
N2  - Speech signal enhancement achieves high-level performance in recent years using deep learning techniques. However, the deep learning technique in the speech enhancement algorithm degrades the performance of speech, particularly for unseen noises, unseen speakers and moreover, deep learning models are limited to the small number of speakers. Hence, we propose a Gammatone filterbank (GTFB) – simple deep neural network (SDNN) based speech enhancement algorithm to improve the quality of speech for three different unseen conditions. The use of GTFB gives a finer resolution in low-frequency regions of speech, and the SDNN model extracts a noisy GTFB frame as input and maps it to a clean speech GTFB frame. The experimental results are measured objectively using signal-noise-ratio, perceptual evaluation of speech quality, short time objective intelligibility, and subjectively using mean opinion score. The experimental results are carried out using a variety of training and testing models. The performance results show that the proposed GTFB-SDNN are robust to a variety of test situations and outperform existing methods.
AB  - Speech signal enhancement achieves high-level performance in recent years using deep learning techniques. However, the deep learning technique in the speech enhancement algorithm degrades the performance of speech, particularly for unseen noises, unseen speakers and moreover, deep learning models are limited to the small number of speakers. Hence, we propose a Gammatone filterbank (GTFB) – simple deep neural network (SDNN) based speech enhancement algorithm to improve the quality of speech for three different unseen conditions. The use of GTFB gives a finer resolution in low-frequency regions of speech, and the SDNN model extracts a noisy GTFB frame as input and maps it to a clean speech GTFB frame. The experimental results are measured objectively using signal-noise-ratio, perceptual evaluation of speech quality, short time objective intelligibility, and subjectively using mean opinion score. The experimental results are carried out using a variety of training and testing models. The performance results show that the proposed GTFB-SDNN are robust to a variety of test situations and outperform existing methods.
KW  - Deep neural network
KW  - Gammatone filterbank
KW  - Intelligibility
KW  - Quality
KW  - Speech enhancement
KW  - Subjective measure
UR  - http://www.scopus.com/inward/record.url?scp=85129334500&partnerID=8YFLogxK
U2  - 10.1016/j.apacoust.2022.108784
DO  - 10.1016/j.apacoust.2022.108784
M3  - Journal article
AN  - SCOPUS:85129334500
SN  - 0003-682X
VL  - 194
JO  - Applied Acoustics
JF  - Applied Acoustics
M1  - 108784
ER  - 
