Abstract
After having revolutionized image and speech processing, convolutional neural networks (CNNs) are becoming increasingly successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples of the most prominent drum set instruments (bass, snare, hi-hat). We use Mel-scale log magnitudes (MLS) as the input representation for the CNNs. We compare the classification results of 1) a CNN (3 conv/max-pool layers and 2 fully connected layers) without drop-out and batch normalization against three variants: 2) with drop-out, 3) with batch normalization (BN), and 4) with both drop-out and BN. The CNNs with BN yield the best classification results (97% accuracy).
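
The four variants compared above can be sketched as a small layer-listing helper. This is only an illustrative outline, not the authors' implementation: the abstract fixes the number of conv/max-pool and fully connected layers and the three output classes, while the function name `build_layers`, the layer labels, the ReLU activations, and the placement of BN (after each convolution) and drop-out (after the first fully connected layer) are assumptions made for this example.

```python
def build_layers(use_dropout, use_batch_norm):
    """Return an ordered list of layer labels for one CNN variant.

    Hypothetical helper: labels and layer placement are assumptions,
    only the layer counts come from the abstract.
    """
    layers = []
    # Three convolution / max-pooling stages, as stated in the abstract.
    for i in range(1, 4):
        layers.append(f"conv{i}")
        if use_batch_norm:
            layers.append(f"bn{i}")      # assumed: BN directly after the convolution
        layers.append(f"relu{i}")        # assumed activation function
        layers.append(f"maxpool{i}")
    # Two fully connected layers; the last one scores the three classes
    # (bass, snare, hi-hat).
    layers.append("fc1")
    layers.append("relu_fc1")            # assumed activation function
    if use_dropout:
        layers.append("dropout_fc1")     # assumed: drop-out before the output layer
    layers.append("fc2_3_classes")
    return layers


# The four configurations enumerated in the abstract.
VARIANTS = {
    "1) plain": build_layers(use_dropout=False, use_batch_norm=False),
    "2) drop-out": build_layers(use_dropout=True, use_batch_norm=False),
    "3) BN": build_layers(use_dropout=False, use_batch_norm=True),
    "4) drop-out + BN": build_layers(use_dropout=True, use_batch_norm=True),
}

for name, layers in VARIANTS.items():
    print(f"{name}: " + " -> ".join(layers))
```

Keeping the two switches as independent flags mirrors the 2 x 2 design of the comparison: each variant differs from the plain network only in the regularization layers that are inserted, so differences in accuracy can be attributed to drop-out, BN, or their combination.
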
Original language | English |
---|---|
Title of host publication | Audio Mostly'16 |
Place of Publication | New York, USA |
Publisher | Association for Computing Machinery |
Publication date | 2016 |
Pages | 111-115 |
ISBN (Print) | 978-1-4503-4822-5 |
Publication status | Published - 2016 |
Event | Audio Mostly 2016 - Norrköping, Sweden; Duration: 4 Oct 2016 → 6 Oct 2016 |
Conference
Conference | Audio Mostly 2016 |
---|---|
Country/Territory | Sweden |
City | Norrköping |
Period | 04/10/2016 → 06/10/2016 |