BIOSCAN-1M Insect Dataset

  • Zahra Gharaee (Creator)
  • ZeMing Gong (Creator)
  • Nicholas Pellegrino (Creator)
  • Iuliia Zarubiieva (Creator)
  • Joakim Bruslund Haurum (Creator)
  • Scott C. Lowe (Creator)
  • Jaclyn T. A. McKeown (Creator)
  • Chris C. Y. Ho (Creator)
  • Joschka McLeod (Creator)
  • Yi-Yun C. Wei (Creator)
  • Jireh Agda (Creator)
  • Sujeevan Ratnasingham (Creator)
  • Dirk Steinke (Creator)
  • Angel X. Chang (Creator)
  • Graham W. Taylor (Creator)
  • Paul Fieguth (Creator)

Dataset

Description

The BIOSCAN-Insect Dataset is a new large dataset of hand-labelled insect images. Each record is taxonomically classified by an expert and also has associated genetic information, including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically based proxies for species classification. This paper presents a curated million-image dataset, intended primarily for training computer-vision models capable of image-based taxonomic assessment. The dataset also has characteristics of interest to the broader machine learning community. Because of the biological nature of the data, it exhibits a characteristic long-tailed class-imbalance distribution. Furthermore, taxonomic labelling is a hierarchical classification scheme, which presents a highly fine-grained classification problem at the lower levels.
Date made available: 12 Jun 2023
Publisher: Zenodo

Keywords

  • Computer Vision
  • Biodiversity
  • Insect biodiversity
  • Class-imbalance distribution
  • Fine-grained classification
  • Taxonomic classification
  • DNA barcode sequences
  • Barcode Index Number (BIN)

  • A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset

    Gharaee, Z., Gong, Z., Pellegrino, N., Zarubiieva, I., Haurum, J. B., Lowe, S. C., McKeown, J. T. A., Ho, C. C. Y., McLeod, J., Wei, Y.-Y. C., Agda, J., Ratnasingham, S., Steinke, D., Chang, A. X., Taylor, G. W. & Fieguth, P., Sept 2023, (Accepted/In press) Advances in Neural Information Processing Systems. Vol. 37.

    Research output: Contribution to book/anthology/report/conference proceeding > Article in proceeding > Research > peer-review
