Abstract
End-to-end neural-network approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work showing that the first layers of a deep convolutional neural network (CNN) can be trained to learn the commonly used log-scaled mel-spectrogram transformation. We further demonstrate that, when the first layers of an end-to-end CNN classifier are initialized with this learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to those of a CNN model trained directly on the highly pre-processed log-scaled mel-spectrogram features.
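The log-scaled mel-spectrogram mentioned in the abstract is the target transformation that the paper trains the first CNN layers to approximate. As a reference point, the sketch below computes it from raw audio with plain NumPy (windowed STFT, triangular mel filterbank, log compression). The specific parameter values (`n_fft=512`, `hop=256`, `n_mels=40`, HTK-style mel scale) are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; an assumption here)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40, fmin=0.0, fmax=None):
    """Triangular filters mapping power-spectrum bins to mel bands."""
    fmax = fmax or sr / 2.0
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):          # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(x, sr, n_fft=512, hop=256, n_mels=40):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)   # small offset avoids log(0)

# Example: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(x, sr)
print(S.shape)  # (frames, mel bands)
```

In the paper's setting, a stack of convolutional layers operating on the raw waveform is trained to reproduce an output like `S`, after which those layers can serve as a learned front-end for the classifier.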
| Original language | English |
| --- | --- |
| Title of host publication | Workshop Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17) |
| Publication date | Dec 2017 |
| Publication status | Published - Dec 2017 |
| Event | Conference and Workshop on Neural Information Processing Systems (NIPS): Machine Learning for Audio Signal Processing, Long Beach Convention & Entertainment Center, Long Beach, United States. Duration: 8 Dec 2017 → 8 Dec 2017. https://nips.cc/Conferences/2017/Schedule?showEvent=8790 |
Conference
| Conference | Conference and Workshop on Neural Information Processing Systems (NIPS) |
| --- | --- |
| Location | Long Beach Convention & Entertainment Center |
| Country/Territory | United States |
| City | Long Beach |
| Period | 08/12/2017 → 08/12/2017 |
| Internet address | https://nips.cc/Conferences/2017/Schedule?showEvent=8790 |
Keywords
- deep learning
- convolutional neural networks
- audio signal processing
- end-to-end learning