Method: DINOv2, ViT-B, 16x16 patch size, linear decoder - Task 1 - Single real-domain training (Models must be trained exclusively on the published Cityscapes dataset) - Autonomous Driving

Method info

Method: DINOv2, ViT-B, 16x16 patch size, linear decoder (2024-08-23)

Authors: Tommie Kerssies, Daan de Geus, and Gijs Dubbelman

Affiliation: Eindhoven University of Technology

Description: Fine-tuned for ~40 epochs on Cityscapes, following the setup described in "How to Benchmark Vision Foundation Models for Semantic Segmentation?" (https://www.tue-mps.org/benchmark-vfm-ss/).

Tommie Kerssies, Daan de Geus, and Gijs Dubbelman. “How to Benchmark Vision Foundation Models for Semantic Segmentation?” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024.
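
To make the method name concrete, the following is a rough sketch of the arithmetic implied by "ViT-B, 16x16 patch size, linear decoder". It assumes the linear decoder is a single linear layer applied to each patch token, that ViT-B tokens are 768-dimensional, and that the 19 standard Cityscapes evaluation classes are predicted. The function names and the 1024x2048 input size are illustrative assumptions, not details taken from the submission or its source code.

```python
# Illustrative sketch only: names and the input size are assumptions,
# not taken from the submission or the authors' repository.
EMBED_DIM = 768    # ViT-B token width
NUM_CLASSES = 19   # Cityscapes evaluation classes
PATCH = 16         # patch size stated in the method name


def patch_grid(height, width, patch=PATCH):
    """Return (rows, cols) of the patch-token grid for an image."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch, width // patch


def linear_decoder_params(embed_dim=EMBED_DIM, num_classes=NUM_CLASSES):
    """Weights plus biases of one linear layer mapping a token to class logits."""
    return embed_dim * num_classes + num_classes


def decode_token(token, weight, bias):
    """Apply the linear layer to one token: logits[c] = weight[c] . token + bias[c]."""
    return [sum(w * x for w, x in zip(row, token)) + b
            for row, b in zip(weight, bias)]


rows, cols = patch_grid(1024, 2048)  # a full-resolution Cityscapes frame
print(rows, cols, rows * cols)       # 64 128 8192
print(linear_decoder_params())       # 14611
```

Under these assumptions the decoder adds only about 14.6k parameters on top of the backbone, and each predicted logit covers a 16x16 pixel patch, so the logit map has to be upsampled to the label resolution before evaluation.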
