Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples

Nicolai Gajhede, Oliver Beck, Hendrik Purwins

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

16 Citations (Scopus)

Abstract

After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples of the most prominent drum set instruments (bass, snare, hi-hat). We use the Mel scale log magnitudes (MLS) as a representation for the input of the CNN. We compare the classification results of 1) a CNN (3 conv/max-pool layers and 2 fully connected layers) without drop-out and batch normalization vs. three variants, 2) with drop-out, 3) with batch normalization (BN), and 4) with both drop-out and BN. The CNNs with BN yield the best classification results (97% accuracy).
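The four compared variants and the batch normalization step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `batch_norm`, the parameters `gamma`, `beta`, and `eps`, and the sample activations are assumptions chosen for illustration. The transform shown is the standard training-time batch normalization of a single feature over a mini-batch.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift.

    gamma, beta, and eps are illustrative defaults, not values from the paper.
    """
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance over the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# The four variants from the abstract, in the same order:
# 1) plain, 2) drop-out only, 3) BN only, 4) drop-out and BN.
variants = [
    {"dropout": d, "batch_norm": b}
    for b in (False, True)
    for d in (False, True)
]

# Made-up activations of one feature for a mini-batch of four samples.
activations = [0.5, 2.0, -1.0, 4.5]
normalized = batch_norm(activations)
```

After normalization the values have approximately zero mean and unit variance across the mini-batch; `gamma` and `beta` would be learned during training.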
Original language: English
Title of host publication: Audio Mostly'16
Place of Publication: New York, USA
Publisher: Association for Computing Machinery
Publication date: 2016
Pages: 111-115
ISBN (Print): 978-1-4503-4822-5
DOIs
Publication status: Published - 2016
Event: Audio Mostly 2016 - Norrköping, Sweden
Duration: 4 Oct 2016 to 6 Oct 2016

Conference

Conference: Audio Mostly 2016
Country/Territory: Sweden
City: Norrköping
Period: 04/10/2016 to 06/10/2016
