Abstract
End-to-end neural-network approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work showing that the first layers of a deep convolutional neural network (CNN) can be trained to learn the commonly used log-scaled mel-spectrogram transformation. We further demonstrate that, when the first layers of an end-to-end CNN classifier are initialized with this learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to those of a CNN model trained directly on the highly pre-processed log-scaled mel-spectrogram features.
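The log-scaled mel-spectrogram mentioned in the abstract is the target transformation that the paper trains the first CNN layers to approximate. As a reference point, the sketch below computes it from raw audio with plain NumPy (windowed STFT, triangular mel filterbank, log compression). The specific parameter values (`n_fft=512`, `hop=256`, `n_mels=40`, HTK-style mel scale) are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; an assumption here)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40, fmin=0.0, fmax=None):
    """Triangular filters mapping power-spectrum bins to mel bands."""
    fmax = fmax or sr / 2.0
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(x, sr, n_fft=512, hop=256, n_mels=40):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)   # small offset avoids log(0)

# Example: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(x, sr)
print(S.shape)  # (frames, mel bands)
```

In the paper's setting, a stack of convolutional layers operating on the raw waveform is trained to reproduce an output like `S`, after which those layers can serve as a learned front-end for the classifier.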
| Original language | English |
| --- | --- |
| Title of host publication | Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17) |
| Publication date | Dec 2017 |
| Publication status | Published - Dec 2017 |
| Event | Conference and Workshop on Neural Information Processing Systems (NIPS): Machine Learning for Audio Signal Processing, Long Beach Convention & Entertainment Center, Long Beach, United States. Duration: 8 Dec 2017 → 8 Dec 2017. https://nips.cc/Conferences/2017/Schedule?showEvent=8790 |
Conference
| Conference | Conference and Workshop on Neural Information Processing Systems (NIPS) |
| --- | --- |
| Location | Long Beach Convention & Entertainment Center |
| Country/Territory | United States |
| City | Long Beach |
| Period | 08/12/2017 → 08/12/2017 |
| Internet address | https://nips.cc/Conferences/2017/Schedule?showEvent=8790 |
Keywords
- deep learning
- convolutional neural networks
- audio signal processing
- end-to-end learning