A benchmark and a multi-stage pipeline for classifying underwater videos at scale

Sten Kirk Larsen, Lasse Enggaard Rasmussen, Dovydas Jasulaitis, Tomer Sagi, Katja Hose, Yoav Lehahn

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

Standardised benchmarks have been instrumental in driving recent progress in computer vision. However, most benchmarks are designed for general-purpose tasks: they cover many topics and classes but serve the needs of specialised tasks poorly. For example, 3D reconstruction of corals requires footage of coral recorded from multiple camera angles. Such videos are scarce in standard datasets, so the ability to reconstruct 3D coral models from public videos would alleviate this problem by allowing researchers to tap into the vast scope of online content. Machine learning could then be used to sift through the immense amounts of content and automatically identify videos suitable for 3D reconstruction. In this work, we introduce a new benchmark of amateur footage queried from the YouTube-8M dataset, in which each video has been manually labelled for undersea, coral, and multiple camera angles. Furthermore, we construct a three-stage pipeline of machine learning models for identifying videos from the public domain that are suitable for the 3D reconstruction of coral. We instantiate the pipeline with state-of-the-art video classification methods and evaluate their performance on the benchmark, identifying their shortcomings and avenues for future research.
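
The three-stage pipeline can be pictured as a filtering cascade in which a video is kept only if every stage accepts it. The sketch below is illustrative only and is not the authors' implementation: the stage order (undersea, then coral, then multiple camera angles) is assumed from the benchmark's label names, and `Stage`, `run_cascade`, the 0.5 threshold, and the toy scores are all hypothetical stand-ins for real video classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer maps a video identifier to a probability in [0, 1].
# In practice this would wrap a trained video classifier (assumption).
Scorer = Callable[[str], float]


@dataclass(frozen=True)
class Stage:
    name: str
    scorer: Scorer
    threshold: float = 0.5  # hypothetical cut-off, not taken from the paper


def run_cascade(
    video_ids: Sequence[str], stages: Sequence[Stage]
) -> Tuple[List[str], Dict[str, str]]:
    """Pass each video through the stages in order.

    Returns the accepted video ids and, for each rejected video,
    the name of the first stage that rejected it.
    """
    accepted: List[str] = []
    rejected: Dict[str, str] = {}
    for vid in video_ids:
        for stage in stages:
            if stage.scorer(vid) < stage.threshold:
                rejected[vid] = stage.name
                break  # later stages are never run for this video
        else:
            accepted.append(vid)
    return accepted, rejected


# Toy scores standing in for classifier outputs (made up for illustration).
TOY_SCORES = {
    "vid_a": {"undersea": 0.9, "coral": 0.8, "multi_angle": 0.7},
    "vid_b": {"undersea": 0.9, "coral": 0.2, "multi_angle": 0.9},
    "vid_c": {"undersea": 0.1, "coral": 0.9, "multi_angle": 0.9},
}


def toy_scorer(label: str) -> Scorer:
    return lambda vid: TOY_SCORES[vid][label]


STAGES = [
    Stage("undersea", toy_scorer("undersea")),
    Stage("coral", toy_scorer("coral")),
    Stage("multi_angle", toy_scorer("multi_angle")),
]

if __name__ == "__main__":
    kept, dropped = run_cascade(list(TOY_SCORES), STAGES)
    print("accepted:", kept)
    print("rejected:", dropped)
```

Ordering the stages so that the broadest filter runs first means later classifiers only see videos that earlier ones accepted, which is one plausible reason to structure such a pipeline as a cascade.
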
Original language: English
Article number: 2416227
Journal: International Journal of Image and Data Fusion
Volume: 16
Issue number: 1
Pages (from-to): 1-20
Number of pages: 20
ISSN: 1947-9832
DOIs
Publication status: Published - 2025

Keywords

  • Benchmark
  • Computer Vision
  • Coral
  • Deep Learning
  • underwater video classification
  • Transformers
  • Underwater
  • underwater object detection
