An Analysis of the GTZAN Music Genre Dataset

Bob L. Sturm

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review


Abstract

Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine-readable index of artist and song titles, identifying nearly all excerpts. We also catalog numerous problems with its integrity, including replications, mislabelings, and distortion.

Original language: English
Title of host publication: Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies
Volume: 2012
Publisher: Association for Computing Machinery
Publication date: 2012
Pages: 7-12
ISBN (Print): 978-1-4503-1591-3
DOIs: https://doi.org/10.1145/2390848.2390851
Publication status: Published - 2012
Event: ACM Multimedia 2012: Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies - Nara, Japan
Duration: 29 Oct 2012 to 2 Nov 2012

Conference

Conference: ACM Multimedia 2012
Country: Japan
City: Nara
Period: 29/10/2012 to 02/11/2012
Series: ACM Multimedia

Fingerprint

Integrity
Music
Artist
Song
Replication

Cite this

Sturm, B. L. (2012). An Analysis of the GTZAN Music Genre Dataset. In Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies (Vol. 2012, pp. 7-12). Association for Computing Machinery. (ACM Multimedia). https://doi.org/10.1145/2390848.2390851
Sturm, Bob L. / An Analysis of the GTZAN Music Genre Dataset. Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies. Vol. 2012 Association for Computing Machinery, 2012. pp. 7-12 (ACM Multimedia).
@inproceedings{673c312b140843f5a1c612ba4506d156,
title = "An Analysis of the GTZAN Music Genre Dataset",
abstract = "Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine-readable index of artist and song titles, identifying nearly all excerpts. We also catalog numerous problems with its integrity, including replications, mislabelings, and distortion.",
author = "Sturm, {Bob L.}",
year = "2012",
doi = "10.1145/2390848.2390851",
language = "English",
isbn = "978-1-4503-1591-3",
volume = "2012",
pages = "7--12",
booktitle = "Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies",
publisher = "Association for Computing Machinery",
address = "United States",
}

Sturm, BL 2012, An Analysis of the GTZAN Music Genre Dataset. in Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies. vol. 2012, Association for Computing Machinery, ACM Multimedia, pp. 7-12, ACM Multimedia 2012, Nara, Japan, 29/10/2012. https://doi.org/10.1145/2390848.2390851

An Analysis of the GTZAN Music Genre Dataset. / Sturm, Bob L.

Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies. Vol. 2012 Association for Computing Machinery, 2012. p. 7-12.


TY - GEN

T1 - An Analysis of the GTZAN Music Genre Dataset

AU - Sturm, Bob L.

PY - 2012

Y1 - 2012

N2 - Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine-readable index of artist and song titles, identifying nearly all excerpts. We also catalog numerous problems with its integrity, including replications, mislabelings, and distortion.

AB - Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine-readable index of artist and song titles, identifying nearly all excerpts. We also catalog numerous problems with its integrity, including replications, mislabelings, and distortion.

UR - http://www.scopus.com/inward/record.url?scp=84870497334&partnerID=8YFLogxK

U2 - 10.1145/2390848.2390851

DO - 10.1145/2390848.2390851

M3 - Article in proceeding

SN - 978-1-4503-1591-3

VL - 2012

SP - 7

EP - 12

BT - Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies

PB - Association for Computing Machinery

ER -

Sturm BL. An Analysis of the GTZAN Music Genre Dataset. In Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies. Vol. 2012. Association for Computing Machinery. 2012. p. 7-12. (ACM Multimedia). https://doi.org/10.1145/2390848.2390851