An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Foundation Models

Scott C. Lowe, Joakim Bruslund Haurum, Sageev Oore, Thomas B. Moeslund, Graham W. Taylor

Research output: Contribution to conference without publisher/journal › Paper without publisher/journal › Research › peer-review

Abstract

Can foundation models generalize to new datasets outside their training domain, without any retraining? Our suite of benchmarking experiments uses encoders pretrained solely on ImageNet-1k, with either supervised or self-supervised (SSL) training techniques, to cluster image datasets that were not seen during training using conventional clustering algorithms. This evaluation lets us investigate the impact of the pretraining protocol on a model's ability to generalize outside its training domain, and explore what the model natively prioritizes in its embeddings in a real-world scenario where novel data lacks labels. We find that supervised encoders typically offer more utility than SSL encoders within the training domain, and vice versa far outside of it; fine-tuned SSL encoders, however, demonstrate the opposite trend.
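The pipeline described in the abstract (embed an unseen dataset with a frozen pretrained encoder, then apply a conventional clustering algorithm to the embeddings) can be sketched as follows. This is a minimal illustration, not the paper's actual protocol: the choice of a torchvision ResNet-50 encoder, CIFAR-10 as the "unseen" dataset, k-means as the clustering algorithm, and AMI as the evaluation metric are all assumptions made here for concreteness.

```python
# Minimal sketch: cluster an unseen dataset with a frozen ImageNet-1k encoder.
# Encoder, dataset, clusterer, and metric are illustrative choices, not the
# paper's exact configuration.
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

device = "cuda" if torch.cuda.is_available() else "cpu"

# Supervised ImageNet-1k encoder; drop the classifier head to expose embeddings.
encoder = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
encoder.fc = torch.nn.Identity()
encoder.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
dataset = datasets.CIFAR10(root="data", train=False, download=True,
                           transform=preprocess)
loader = DataLoader(dataset, batch_size=256, num_workers=4)

# Embed the whole dataset without any retraining or fine-tuning.
feats, labels = [], []
with torch.no_grad():
    for x, y in loader:
        feats.append(encoder(x.to(device)).cpu().numpy())
        labels.append(y.numpy())
feats = np.concatenate(feats)
labels = np.concatenate(labels)

# Conventional clustering on the embeddings, then agreement with ground truth.
preds = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(feats)
print("AMI:", adjusted_mutual_info_score(labels, preds))
```

Swapping the encoder for an SSL-pretrained one (e.g. a DINO or MoCo checkpoint) while keeping the clustering step fixed is what enables the supervised-versus-SSL comparison the abstract describes.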
Original language: English
Publication date: 2024
Publication status: Published - 2024
Event: ICML 2024 Workshop on Foundation Models in the Wild - Vienna, Austria
Duration: 27 Jul 2024 → …

Workshop

Workshop: ICML 2024 Workshop on Foundation Models in the Wild
Country/Territory: Austria
City: Vienna
Period: 27/07/2024 → …

Projects

  • Pioneer Centre for AI

    Tan, Z.-H. (CoPI), Moeslund, T. B. (CoPI) & Larsen, T. (Project Participant)

    01/07/2021 → …

    Project: Research
