- Task 1 - Single real-domain training (Models must be trained exclusively on the published Cityscapes dataset) - Method: DeiT III (IN21K->IN1K), ViT-B, 16x16 patch size, linear decoder
- Method info
Method: DeiT III (IN21K->IN1K), ViT-B, 16x16 patch size, linear decoder
Date: 2024-08-23
Authors: Tommie Kerssies, Daan de Geus, and Gijs Dubbelman
Affiliation: Eindhoven University of Technology
Email: t.kerssies@tue.nl
Description: Fine-tuned for ~40 epochs on Cityscapes, following the setup described in "How to Benchmark Vision Foundation Models for Semantic Segmentation?" (https://www.tue-mps.org/benchmark-vfm-ss/)
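
To illustrate what "16x16 patch size, linear decoder" could mean in practice, here is a minimal pure-Python sketch. It is not the authors' code: the function names (`token_grid`, `linear_decode`, `upsample_argmax`) are hypothetical, and the nearest-neighbour upsampling of argmax labels is a simplifying assumption (real pipelines often interpolate the logits before taking the argmax). The sketch assumes one feature vector per 16x16 patch and a single linear layer mapping each vector to the 19 Cityscapes evaluation classes.

```python
# Hedged sketch of a linear segmentation decoder over ViT patch tokens.
# All names are hypothetical; this is not the submission's implementation.

PATCH = 16        # patch size taken from the method name
NUM_CLASSES = 19  # number of Cityscapes evaluation classes


def token_grid(height, width, patch=PATCH):
    """Return (rows, cols) of the patch-token grid for an image size."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch, width // patch


def linear_decode(tokens, weight, bias, rows, cols):
    """Apply one linear layer to every token and lay the logits out on the grid.

    tokens: list of rows*cols feature vectors (each of length D)
    weight: list of C weight vectors (each of length D)
    bias:   list of C floats
    Returns a rows x cols x C nested list of class logits.
    """
    if len(tokens) != rows * cols:
        raise ValueError("token count does not match the grid size")
    logits = [
        [sum(w * x for w, x in zip(w_c, tok)) + b_c
         for w_c, b_c in zip(weight, bias)]
        for tok in tokens
    ]
    return [logits[r * cols:(r + 1) * cols] for r in range(rows)]


def upsample_argmax(grid, patch=PATCH):
    """Take the per-token argmax and repeat each label over its patch
    (nearest-neighbour upsampling; an assumption made for simplicity)."""
    pixels = []
    for row in grid:
        labels = [max(range(len(cell)), key=cell.__getitem__) for cell in row]
        pixel_row = [lab for lab in labels for _ in range(patch)]
        pixels.extend(list(pixel_row) for _ in range(patch))
    return pixels
```

For a full-resolution 1024x2048 Cityscapes image, `token_grid(1024, 2048)` gives a 64x128 grid, so the linear layer would produce 8192 per-patch predictions that are then upsampled back to pixel resolution.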