CountingDINO 🧮🦕

A Training-free Pipeline for Exemplar-based Class-Agnostic Counting

WACV 2026

¹ISTI CNR, ²University of Pisa
* Equal contribution

Abstract

Class-agnostic counting (CAC) aims to estimate the number of objects in images without being restricted to predefined categories. However, while current exemplar-based CAC methods offer flexibility at inference time, they still rely heavily on labeled data for training, which limits scalability and generalization to many downstream use cases.

In this paper, we introduce CountingDINO, the first training-free exemplar-based CAC framework that exploits a fully unsupervised feature extractor. Specifically, our approach employs self-supervised vision-only backbones to extract object-aware features, and it eliminates the need for annotated data throughout the entire proposed pipeline.

At inference time, we extract latent object prototypes via ROI-Align from DINO features and use them as convolutional kernels to generate similarity maps. These are then transformed into density maps through a simple yet effective normalization scheme.
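The inference steps above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation and rests on several simplifying assumptions: the feature map is a tiny hand-made grid rather than DINO output; the prototype is a single averaged feature vector (a 1×1 kernel) obtained by plain mean pooling rather than ROI-Align; similarity is a clamped cosine rather than a multi-cell convolution; and the normalization, which this page does not spell out, is assumed to rescale the similarity map so that the exemplar box integrates to exactly one object. All function names are hypothetical.

```python
import math


def mean_prototype(features, box):
    """Average the feature vectors inside box = (row0, col0, row1, col1), end-exclusive.

    Stand-in for ROI-Align pooled to a 1x1 prototype (an assumption of this sketch).
    """
    r0, c0, r1, c1 = box
    cells = [features[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    channels = len(cells[0])
    return [sum(cell[k] for cell in cells) / len(cells) for k in range(channels)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_map(features, prototype):
    """Slide the 1x1 prototype over the grid; negative responses are clamped to 0."""
    return [[max(0.0, cosine(cell, prototype)) for cell in row] for row in features]


def density_map(sim, box):
    """Assumed normalization: divide by the similarity mass inside the exemplar box,
    so that the exemplar itself sums to one object."""
    r0, c0, r1, c1 = box
    exemplar_mass = sum(sim[r][c] for r in range(r0, r1) for c in range(c0, c1))
    if exemplar_mass <= 0:
        raise ValueError("exemplar box has no positive similarity")
    return [[value / exemplar_mass for value in row] for row in sim]


def estimate_count(features, box):
    """Predicted count = sum of the density map."""
    prototype = mean_prototype(features, box)
    density = density_map(similarity_map(features, prototype), box)
    return sum(sum(row) for row in density)


def toy_features(size=6, corners=((0, 0), (0, 3), (3, 0)), side=2):
    """Toy grid: background cells are [1, 0]; each object is a side x side block of [0, 1]."""
    grid = [[[1.0, 0.0] for _ in range(size)] for _ in range(size)]
    for r0, c0 in corners:
        for r in range(r0, r0 + side):
            for c in range(c0, c0 + side):
                grid[r][c] = [0.0, 1.0]
    return grid


if __name__ == "__main__":
    features = toy_features()
    exemplar_box = (0, 0, 2, 2)  # the first object serves as the exemplar
    print(estimate_count(features, exemplar_box))
```

On the toy grid, the exemplar box covers one of three identical objects, so the normalized map sums to three. With real backbone features the similarity is never this clean, which is why the choice of normalization matters in practice.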

We evaluate our approach on the FSC-147 benchmark, where we outperform a baseline under the same label-free setting. Our method also achieves competitive, and in some cases superior, results compared to training-free approaches relying on supervised backbones, as well as several fully supervised state-of-the-art methods. This demonstrates that training-free CAC can be both scalable and competitive.

Quantitative Results


SOTA comparison on FSC-147 (val/test). Methods are grouped as unsupervised or training-free. The best result per category is underlined; the best training-free method with an unsupervised backbone is in bold; † marks results from methods in advantaged categories (supervised or not training-free) that nonetheless perform worse than CountingDINO.

Qualitative Results

[Figure: inference examples]

BibTeX

@inproceedings{pacini2026countingdino,
  title={{CountingDINO}: A training-free pipeline for class-agnostic counting using unsupervised backbones},
  author={Pacini, Giacomo and Bianchi, Lorenzo and Ciampi, Luca and Messina, Nicola and Amato, Giuseppe and Falchi, Fabrizio},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={806--815},
  year={2026}
}

Acknowledgements

This work has received financial support from the project FAIR – Future Artificial Intelligence Research - Spoke 1 (PNRR M4C2 Inv. 1.3 PE00000013), funded by the European Union - Next Generation EU.