Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples

Nicolai Gajhede; Oliver Beck; Hendrik Purwins

doi:10.1145/2986416.2986453

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

16 Citations (Scopus)

Abstract

After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples of the most prominent drum set instruments (bass, snare, hi-hat). We use the Mel scale log magnitudes (MLS) as a representation for the input of the CNN. We compare the classification results of 1) a CNN (3 conv/max-pool layers and 2 fully connected layers) without drop-out and batch normalization vs. three variants, 2) with drop-out, 3) with batch normalization (BN), and 4) with both drop-out and BN. The CNNs with BN yield the best classification results (97% accuracy).
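
As a rough illustration of the batch normalization (BN) step that distinguishes variants 3 and 4, the following minimal Python sketch normalizes each feature over a mini-batch. It assumes the standard training-mode formulation (per-feature mean and biased variance, followed by a scale and shift). The names `batch_norm`, `gamma`, `beta`, and `eps`, and their default values, are illustrative assumptions and are not taken from the paper.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) of a mini-batch, then scale and shift.

    `batch` is a list of equal-length rows, one row per sample.
    `gamma`, `beta`, and `eps` are hypothetical defaults, not values from the paper.
    """
    n = len(batch)
    dims = len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for j in range(dims):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Biased (divide-by-n) variance over the mini-batch.
        var = sum((x - mean) ** 2 for x in col) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma * (col[i] - mean) * inv_std + beta
    return out


# Toy mini-batch of three samples with two features each (made-up numbers).
normalized = batch_norm([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
```

After this step each feature has approximately zero mean and unit variance across the mini-batch, regardless of its original scale. In a real network `gamma` and `beta` would be learned per feature, and running statistics would replace the batch statistics at inference time.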

Original language	English
Title of host publication	Audio Mostly'16
Place of Publication	New York, USA
Publisher	Association for Computing Machinery
Publication date	2016
Pages	111-115
ISBN (Print)	978-1-4503-4822-5
DOIs	https://doi.org/10.1145/2986416.2986453
Publication status	Published - 2016
Event	Audio Mostly 2016 - Norrköping, Sweden Duration: 4 Oct 2016 → 6 Oct 2016

Conference

Conference	Audio Mostly 2016
Country/Territory	Sweden
City	Norrköping
Period	04/10/2016 → 06/10/2016

Cite this

@inproceedings{0b280396c7714ceb8e647b514e84fc3d,
  title = "Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples",
  abstract = "After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples of the most prominent drum set instruments (bass, snare, hi-hat). We use the Mel scale log magnitudes (MLS) as a representation for the input of the CNN. We compare the classification results of 1) a CNN (3 conv/max-pool layers and 2 fully connected layers) without drop-out and batch normalization vs. three variants, 2) with drop-out, 3) with batch normalization (BN), and 4) with both drop-out and BN. The CNNs with BN yield the best classification results (97% accuracy).",
  author = "Nicolai Gajhede and Oliver Beck and Hendrik Purwins",
  year = "2016",
  doi = "10.1145/2986416.2986453",
  language = "English",
  isbn = "978-1-4503-4822-5",
  pages = "111--115",
  booktitle = "Audio Mostly'16",
  publisher = "Association for Computing Machinery",
  address = "United States",
  note = "Audio Mostly 2016 ; Conference date: 04-10-2016 Through 06-10-2016",
}

Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples. / Gajhede, Nicolai; Beck, Oliver; Purwins, Hendrik.
Audio Mostly'16. New York, USA: Association for Computing Machinery, 2016. p. 111-115.


TY - GEN

T1 - Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples

AU - Gajhede, Nicolai

AU - Beck, Oliver

AU - Purwins, Hendrik

PY - 2016

Y1 - 2016

N2 - After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples of the most prominent drum set instruments (bass, snare, hi-hat). We use the Mel scale log magnitudes (MLS) as a representation for the input of the CNN. We compare the classification results of 1) a CNN (3 conv/max-pool layers and 2 fully connected layers) without drop-out and batch normalization vs. three variants, 2) with drop-out, 3) with batch normalization (BN), and 4) with both drop-out and BN. The CNNs with BN yield the best classification results (97% accuracy).

AB - After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples of the most prominent drum set instruments (bass, snare, hi-hat). We use the Mel scale log magnitudes (MLS) as a representation for the input of the CNN. We compare the classification results of 1) a CNN (3 conv/max-pool layers and 2 fully connected layers) without drop-out and batch normalization vs. three variants, 2) with drop-out, 3) with batch normalization (BN), and 4) with both drop-out and BN. The CNNs with BN yield the best classification results (97% accuracy).

U2 - 10.1145/2986416.2986453

DO - 10.1145/2986416.2986453

M3 - Article in proceeding

SN - 978-1-4503-4822-5

SP - 111

EP - 115

BT - Audio Mostly'16

PB - Association for Computing Machinery

CY - New York, USA

T2 - Audio Mostly 2016

Y2 - 4 October 2016 through 6 October 2016

ER -
