Description
The BIOSCAN-Insect Dataset is a new large dataset of hand-labelled insect images. Each record is taxonomically classified by an expert and has associated genetic information, including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically based proxies for species classification. This paper presents a curated million-image dataset, intended primarily for training computer-vision models capable of image-based taxonomic assessment. The dataset also has characteristics that would be of interest to the broader machine learning community. Because of its biological nature, it exhibits a characteristic long-tailed class-imbalance distribution. Furthermore, taxonomic labelling is a hierarchical classification scheme, which presents a highly fine-grained classification problem at the lower levels.
Field | Value |
---|---|
Date made available | 12 Jun 2023 |
Publisher | Zenodo |
Keywords
- Computer Vision
- Biodiversity
- Insect biodiversity
- Class-imbalance distribution
- Fine-grained classification
- Taxonomic classification
- DNA barcode sequences
- Barcode Index Number (BIN)
Projects
- 1 Active
- Pioneer Centre for AI
Tan, Z.-H. (CoPI), Moeslund, T. B. (CoPI) & Larsen, T. (Project Participant)
01/07/2021 → …
Project: Research
Research output
- 1 Article in proceeding
- A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset
Gharaee, Z., Gong, Z., Pellegrino, N., Zarubiieva, I., Haurum, J. B., Lowe, S. C., McKeown, J. T. A., Ho, C. C. Y., McLeod, J., Wei, Y.-Y. C., Agda, J., Ratnasingham, S., Steinke, D., Chang, A. X., Taylor, G. W. & Fieguth, P., Sept 2023, (Accepted/In press) Advances in Neural Information Processing Systems. Vol. 37.
Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review