A Two-Stage Neural Network for Speech Signal Reconstruction from Mel Spectrograms

Filippo Villani, Michele Scarpiniti*, Aurelio Uncini

*Corresponding author

Publication: Contribution to book/anthology/report/conference proceeding › Conference article in proceeding › Research › Peer-reviewed

Abstract

In this work, we propose a neural network approach for speech reconstruction from mel spectrograms, a crucial task in achieving high-quality data after processing speech signals in the time-frequency domain. Specifically, we propose a two-stage deep learning approach based on an overcomplete deep autoencoder (DAE) for the mel filter bank inversion coupled with the deep version of the Griffin-Lim (DeGLI) algorithm for the phase information recovery. After the pre-training of both parts of the architecture, a final fine-tuning on the whole system is performed. Some numerical results, evaluated on the well-known TIMIT dataset, demonstrate the effectiveness of the proposed idea by obtaining a PESQ of 3.996, a STOI equal to 0.994, and a mean opinion score evaluated as 4.15.
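
For orientation, both stages named in the abstract have classical, non-learned counterparts: a pseudo-inverse of the mel filter bank, and the iterative Griffin-Lim algorithm for phase recovery. The sketch below illustrates that baseline pipeline only. It is not the authors' method: there is no deep autoencoder and no DeGLI network here. All function names and parameter values (16 kHz sampling rate, 512-point FFT, hop of 128, 40 mel bands, a magnitude rather than power mel spectrogram) are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def stft(x, n_fft, hop, window):
    """Complex spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

def istft(S, n_fft, hop, window):
    """Weighted overlap-add inverse of stft()."""
    n_frames = S.shape[1]
    length = n_fft + hop * (n_frames - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(S, n=n_fft, axis=0)
    for i in range(n_frames):
        sl = slice(i * hop, i * hop + n_fft)
        x[sl] += frames[:, i] * window
        norm[sl] += window ** 2
    # Guard against division by ~0 at the signal edges.
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_fft, hop, window, n_iter=32, seed=0):
    """Classical Griffin-Lim: alternate between imposing the known
    magnitude and projecting onto the set of valid STFTs."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop, window)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop, window)))
    return istft(mag * phase, n_fft, hop, window)

def reconstruct_from_mel(mel, fb, n_fft, hop, window, n_iter=32):
    # Stage 1 (baseline): pseudo-inverse of the mel filter bank,
    # clipped so that the estimated magnitudes stay non-negative.
    mag = np.maximum(0.0, np.linalg.pinv(fb) @ mel)
    # Stage 2 (baseline): phase recovery with plain Griffin-Lim.
    return griffin_lim(mag, n_fft, hop, window, n_iter=n_iter)

# Illustrative run on a synthetic tone (assumed parameters, not from the paper).
sr, n_fft, hop, n_mels = 16000, 512, 128, 40
window = np.hanning(n_fft + 1)[:-1]          # periodic Hann window
t = np.arange(4096) / sr
x = np.sin(2.0 * np.pi * 440.0 * t)
fb = mel_filterbank(n_mels, n_fft, sr)
mel = fb @ np.abs(stft(x, n_fft, hop, window))
y = reconstruct_from_mel(mel, fb, n_fft, hop, window)
```

In this baseline, the pseudo-inverse cannot restore detail discarded when 257 frequency bins are projected onto 40 mel bands, and Griffin-Lim only finds a phase consistent with the estimated magnitude. These are the two steps for which the abstract proposes learned models instead.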

Original language: English
Title: Advanced Neural Artificial Intelligence : Theories and Applications
Editors: Anna Esposito, Marcos Faundez-Zanuy, Francesco C. Morabito, Eros Pasero, Gennaro Cordasco
Number of pages: 12
Publisher: Springer
Publication date: May 2025
Pages: 267-278
ISBN (Print): 978-981-96-0993-2, 978-981-96-0996-3
ISBN (Electronic): 978-981-96-0994-9
DOI
Status: Published - May 2025
Externally published: Yes
Event: 30th International Workshops on Neural Network, WIRN 2023 - Vietri sul Mare, Italy
Duration: 7 Jun 2023 to 9 Jun 2023

Conference

Conference: 30th International Workshops on Neural Network, WIRN 2023
Country/Territory: Italy
City: Vietri sul Mare
Period: 07/06/2023 to 09/06/2023
Sponsor: International Institute for Advanced Scientific Studies (IIASS); Department of Psychology, Università della Campania “Luigi Vanvitelli”, IT; Provincia di Salerno; Comune di Vietri sul Mare; International Neural Network Society (INNS); Università Mediterranea di Reggio Calabria; Società Italiana Reti Neuroniche (SIREN)
Name: Smart Innovation, Systems and Technologies
Volume: 428
ISSN: 2190-3018

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
