Are deep neural networks really learning relevant features?

Corey Kereliuk; Bob L. Sturm; Jan Larsen

Abstract

In recent years, deep neural networks (DNNs) have become a popular choice for audio content analysis. This may be attributed to various factors, including advancements in training algorithms, computational power, and the potential for DNNs to implicitly learn a set of feature detectors. We have recently re-examined two works \cite{sigtiaimproved}\cite{hamel2010learning} that consider DNNs for the task of music genre recognition (MGR). These papers conclude that frame-level features learned by DNNs offer an improvement over traditional, hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs). However, these conclusions were drawn based on training/testing using the GTZAN dataset, which is now known to contain several flaws, including replicated observations and artists \cite{sturm2012analysis}. We illustrate how considering these flaws dramatically changes the results, which leads one to question the degree to which the learned frame-level features are actually useful for MGR. We make available a reproducible software package allowing other researchers to completely duplicate our figures and results.

Original language	English
Publication date	2015
Publication status	Published - 2015
Event	Digital Music Research Network 9 - Queen Mary University of London, London, United Kingdom
Duration	16 Dec 2014 → 16 Dec 2014

Workshop

Workshop	Digital Music Research Network 9
Location	Queen Mary University of London
Country/Territory	United Kingdom
City	London
Period	16/12/2014 → 16/12/2014

Access to Document

http://c4dm.eecs.qmul.ac.uk/dmrn/events/dmrnp9/

CoSound
Christensen, M. G., Tan, Z., Jensen, S. H. & Sturm, B. L.
01/01/2012 → 31/12/2015
Project: Research

Cite this

@conference{d96b54366685463cae9dbc5458cad2e1,

title = "Are deep neural networks really learning relevant features?",

abstract = "In recent years, deep neural networks (DNNs) have become a popular choice for audio content analysis. This may be attributed to various factors, including advancements in training algorithms, computational power, and the potential for DNNs to implicitly learn a set of feature detectors. We have recently re-examined two works \cite{sigtiaimproved}\cite{hamel2010learning} that consider DNNs for the task of music genre recognition (MGR). These papers conclude that frame-level features learned by DNNs offer an improvement over traditional, hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs). However, these conclusions were drawn based on training/testing using the GTZAN dataset, which is now known to contain several flaws, including replicated observations and artists \cite{sturm2012analysis}. We illustrate how considering these flaws dramatically changes the results, which leads one to question the degree to which the learned frame-level features are actually useful for MGR. We make available a reproducible software package allowing other researchers to completely duplicate our figures and results.",

author = "Corey Kereliuk and Sturm, {Bob L.} and Jan Larsen",

year = "2015",

language = "English",

note = "Digital Music Research Network 9 ; Conference date: 16-12-2014 Through 16-12-2014",