Utilizing Domain Knowledge in End-to-End Audio Processing

Tycho Tax, Jose Luis Diez Antich, Hendrik Purwins, Lars Maaløe

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

Abstract

End-to-end neural-network-based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that, first, shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly used log-scaled mel-spectrogram transformation. Second, we demonstrate that when the first layers of an end-to-end CNN classifier are initialized with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to those of a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
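
For readers unfamiliar with the target transformation named in the abstract, the following is a minimal NumPy sketch of a log-scaled mel-spectrogram. It is not the authors' implementation: the function names, the HTK-style mel formula, and all parameter values (sample rate, FFT size, hop length, number of mel bands, `eps`) are illustrative assumptions, since the abstract does not state them.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel scale; one common convention, assumed here.
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    )
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, centre, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=512, n_mels=60,
                        eps=1e-10):
    # All default values above are assumptions chosen for illustration.
    wave = np.asarray(wave, dtype=float)
    # 1. Slice the waveform into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(wave) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(n_fft)
    # 2. Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Pool FFT bins into mel bands: shape (n_frames, n_mels).
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    # 4. Log compression; eps avoids log(0). Returns (n_mels, n_frames).
    return np.log(mel + eps).T


if __name__ == "__main__":
    # One second of a 440 Hz sine tone as a toy input.
    t = np.arange(22050) / 22050.0
    features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
    print(features.shape)
```

In the setting the abstract describes, a fixed pipeline of this kind would supply the regression targets that the first CNN layers are trained to reproduce from the raw waveform; how those layers are structured is not specified in this record.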
Original language: English
Title of host publication: Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17)
Publication date: Dec 2017
Publication status: Published - Dec 2017
Event: Conference and Workshop on Neural Information Processing Systems (NIPS): Machine Learning for Audio Signal Processing - Long Beach Convention & Entertainment Center, Long Beach, United States
Duration: 8 Dec 2017 to 8 Dec 2017
https://nips.cc/Conferences/2017/Schedule?showEvent=8790

Conference

Conference: Conference and Workshop on Neural Information Processing Systems (NIPS)
Location: Long Beach Convention & Entertainment Center
Country/Territory: United States
City: Long Beach
Period: 08/12/2017 to 08/12/2017

Keywords

  • Deep Learning
  • convolutional neural networks
  • audio signal processing
  • end-to-end learning
