Participation Instructions - Autonomous Driving

General rules

  1. Models participating in each track must be trained using only datasets from the corresponding allowed list.
  2. The use of generative models for synthetic data augmentation is strictly prohibited.
  3. All results must be reproducible. Participants are required to provide a white paper containing comprehensive technical details alongside their submitted results. Trained models and inference code should be made accessible.

Track 1 - Single-domain training

In this track, models must be trained exclusively on the published Cityscapes dataset. This track evaluates the robustness of models trained with limited supervision and geographical diversity when facing unexpected corruptions observed in real-world scenarios. 

The evaluation is performed on the 19 semantic classes of Cityscapes.

Track 2 - Multi-domain training

In this track, models may be trained on a mix of multiple datasets, strictly limited to the list provided below. This track permits the use of a wider range of datasets from both real and synthetic domains, and it aims to assess how much robustness can be gained when fewer constraints are placed on the collection of training data.

The evaluation is performed on the 19 semantic classes of Cityscapes. Participants may either keep the original label sets of the additional datasets or remap them to the Cityscapes label set.
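
As an illustration of remapping, here is a minimal Python sketch, assuming per-image label maps stored as NumPy arrays of integer IDs; the source-to-Cityscapes ID pairs below are hypothetical placeholders, not an official mapping:

  import numpy as np

  # Hypothetical mapping from a source dataset's label IDs to the 19
  # Cityscapes train IDs (0-18); 255 marks classes without a counterpart.
  # The pairs below are placeholders for illustration only.
  SRC_TO_CITYSCAPES = {
      0: 0,    # e.g. "drivable surface" -> road
      1: 1,    # e.g. "walkway"          -> sidewalk
      2: 13,   # e.g. "vehicle"          -> car
      3: 255,  # class with no Cityscapes equivalent -> ignore
  }

  def remap_labels(label_map):
      """Remap an HxW array of source label IDs to Cityscapes train IDs."""
      lut = np.full(256, 255, dtype=np.uint8)  # default: ignore label
      for src_id, cs_id in SRC_TO_CITYSCAPES.items():
          lut[src_id] = cs_id
      return lut[label_map]  # vectorized per-pixel lookup

Using a lookup table keeps the remapping vectorized, which matters when converting millions of pixels per image.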

List of accepted datasets:

Submission file format

Submissions are expected as binary files that encode pixel-wise prediction results along with their corresponding confidence scores. Code for generating submission files will be provided.
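
The exact binary layout is defined by the submission code that will be provided; the following Python sketch only illustrates the general idea of packing, for each image, the per-pixel predicted class and its confidence into a single file. The file naming, dtypes and use of np.savez_compressed are assumptions, not the official format:

  import numpy as np

  def write_submission_file(pred, conf, path):
      """Pack per-pixel predictions and confidences into one binary file.

      pred: HxW array of predicted Cityscapes train IDs (0-18).
      conf: HxW array of confidence scores in [0, 1].
      The layout (uint8 labels plus float16 confidences in a compressed
      archive) is an assumption for illustration only.
      """
      assert pred.shape == conf.shape
      np.savez_compressed(path,
                          pred=pred.astype(np.uint8),
                          conf=conf.astype(np.float16))

  # Example with dummy data for a 1024x2048 image.
  pred = np.random.randint(0, 19, size=(1024, 2048))
  conf = np.random.rand(1024, 2048)
  write_submission_file(pred, conf, "example_frame.npz")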

Metrics

For a comprehensive assessment of the robustness of various semantic segmentation models, we adopt the following metrics:

  • mIoU - mean Intersection Over Union: the ratio of true positives to the sum of true positives, false positives and false negatives, computed per class and averaged over the 19 classes; it quantifies the overlap between predictions and ground-truth labels
  • AUROC - Area Under the Receiver Operating Characteristic curve: a threshold-free metric corresponding to the probability that a correctly predicted (certain) example receives a higher confidence score than an incorrectly predicted (uncertain) one
  • AUPR-Success - Area Under the Precision-Recall curve Success: computing the area under the Precision-Recall curve using correct predictions as the positive class
  • AUPR-Error - Area Under the Precision-Recall curve Error: computing the area under the Precision-Recall curve using errors as the positive class
  • FPR@95TPR: measuring the false positive rate when the true positive rate is set to 95%
  • ECE - Expected Calibration Error: measuring the expected gap between predicted confidence and empirical accuracy, computed over confidence bins (a sketch of mIoU and ECE computation follows this list)
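
As a concrete reference for the first and last metric, here is a minimal Python sketch, assuming flattened NumPy arrays of predicted and ground-truth train IDs, confidence scores, and a per-pixel correctness mask; binning and weighting details may differ in the official evaluation code:

  import numpy as np

  NUM_CLASSES = 19
  IGNORE_ID = 255

  def mean_iou(pred, gt, num_classes=NUM_CLASSES):
      """mIoU: per-class TP / (TP + FP + FN), averaged over classes."""
      valid = gt != IGNORE_ID
      pred = pred[valid].astype(np.int64)
      gt = gt[valid].astype(np.int64)
      # Confusion matrix: rows = ground truth, columns = predictions.
      cm = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
      tp = np.diag(cm).astype(float)
      fp = cm.sum(axis=0) - tp
      fn = cm.sum(axis=1) - tp
      iou = tp / np.maximum(tp + fp + fn, 1)
      return float(iou.mean())

  def expected_calibration_error(conf, correct, n_bins=15):
      """ECE: bin pixels by confidence, average the accuracy/confidence gap."""
      bins = np.linspace(0.0, 1.0, n_bins + 1)
      ece = 0.0
      for lo, hi in zip(bins[:-1], bins[1:]):
          in_bin = (conf > lo) & (conf <= hi)
          if in_bin.any():
              gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
              ece += in_bin.mean() * gap
      return float(ece)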