Method: DINOv2, ViT-B, 16x16 patch size, linear decoder

Date: 2024-08-23

Authors: Tommie Kerssies, Daan de Geus, and Gijs Dubbelman

Affiliation: Eindhoven University of Technology

Email: t.kerssies@tue.nl

Description: Fine-tuning for ~40 epochs on Cityscapes, following the setup described in: "How to Benchmark Vision Foundation Models for Semantic Segmentation?" (https://www.tue-mps.org/benchmark-vfm-ss/)
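
To make the method line concrete, here is a rough sketch of what a 16x16 patch size and a linear decoder imply for token count and decoder size. It is an illustration, not the authors' code. The function names are hypothetical, and the sketch assumes the linear decoder is a single linear layer applied to each patch token. The numbers are also assumptions not stated in this submission: 768 is the standard ViT-B embedding width, 19 is the number of Cityscapes evaluation classes, and 1024x2048 is a full-resolution Cityscapes frame, which may differ from the crop size used during fine-tuning.

```python
def patch_grid(height, width, patch_size=16):
    """Return (rows, cols) of the patch-token grid for an input image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch_size, width // patch_size


def linear_decoder_params(embed_dim, num_classes):
    """Weights plus biases of one linear layer applied to each patch token."""
    return embed_dim * num_classes + num_classes


# Assumed values, not taken from the submission (see the note above).
rows, cols = patch_grid(1024, 2048)
print(rows, cols, rows * cols)          # 64 128 8192
print(linear_decoder_params(768, 19))   # 14611
```

Under these assumptions the decoder adds only about 15 thousand parameters, so nearly all of the model's capacity sits in the backbone.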