Red Team Participation Instructions - Document Intelligence

SaTML 2025 Competition: Inference Attacks Against Document VQA

Track 1 - Membership Inference

Track 1 participant's objective is to determine for each released DocVQA model which providers were part of the training data set.

The evaluation set for this task consists of a set of providers, and their documents, as well as a set of question-answer pairs for those documents.

We provide a small auxiliary set composed of documents and questions with the same distribution as the training data, with data from providers in the training set although these specific instances were never presented during training. It also includes data from providers that were never included in the training data. This validation set contains information indicating whether each provider was included in the training dataset.

Participants have gray-box access to the model, meaning they can only access the model’s outputs and logit values, but not the model’s weights or internal state. We provide the DocVQA model’s output logits on each of the question-answer pairs in the evaluation set, as a distribution over the top-100 tokens at each step of the output string, along with the model’s loss for that example.

Participants should classify, for each provider, whether any of its documents were seen by the model in the training set.

Participants will attack two models - one trained with differential privacy, and one without. The final score is evaluated as the average of scores on both models.

Track 2 - Reconstruction

Track 2 participant's objective is to reconstruct specific key value pairs of the training documents.

Participants have black-box access to the model, only accessing the model’s outputs. However, the adversary can interact with the model by choosing inputs to send as queries via an API to the model to receive responses. The number of allowed total queries to the model is limited.

The evaluation set for this task consists of a set of training set questions, each pertaining to a specific training document, where the answer to the question has been occluded from the document. Participants must extract from each document-question pair the correct answer that occurs in the training data - this answer does not occur on the occluded document, but may have been memorised by the model.

Participants will attack two models - one trained with differential privacy, and one without. The final score is evaluated as the average of scores on both models.

Participation rules

We have defined a series of rules to achieve fair comparison between participating teams and focus the efforts towards developing and improving membership inference attack (MIA) and reconstruction attacks applied on Document Visual Question Answering models.

Participants must register as a team, and disclose individual contributors, as well as contributors who join the team later on. Individual contributors cannot participate in more than one team.

If submitted methods utilise additional data beyond that provided in the competition framework, this should be disclosed along with the submission.

For the reconstruction attack, API requests are limited to a daily rate for each team.

No variants of the same method (with different hyperparameters) will be permitted. At the competition closure date, each team must have one single method, or otherwise clearly explain how submitted methods are different.

Participation Resources

To aid the development of attacks, we make the model architecture known (see "Target models" below), and encourage participants to train their own variants using publicly-available data.

The dataset used for training the target model comes from the same distribution as the PFL-DocVQA dataset published as part of the 2023 NeurIPS competition (the “Blue Team” track). Therefore, it is useful as an development dataset - but note that this data does not contain any discriminatory information useful for inferring provider membership or reconstructing data fields in the target models. We provide a centralized version of this dataset under "Red Team Downloads".

Track 1: Provider Membership Inference

Evaluation Set (D_test): Participants receive an evaluation set consisting of data from M providers, with the task of predicting each provider’s membership. The set includes question-answer pairs based on documents (with OCR provided) from these M providers. In line with the grey-box access assumption, we also provide the target models' inference outputs (loss, predicted answer, confidence and top-100 logits per predicted token) for all questions in this set.

Auxiliary Set (D_aux): In addition to D_test, an auxiliary set is provided with data from a smaller, disjoint set of providers. Membership information for these providers (whether their data was used to train the target model) is included. For each question-answer pair, the target model's outputs are provided, as in the evaluation set.

Track 2: Reconstruction

Evaluation Set (D_occ): Participants receive an evaluation set consisting of question-document pairs from the training data, where the answer to the question has been removed.

Model access: Participants can remotely query an instance of the target model(s) through an API that returns predicted answers to an image-question-OCR triplet. This is black-box access: the model returns only the final answer string, without any confidence or per-token logits, but participating teams may query the model repeatedly to extract the desired information. For more information, see the starting kit on the Red Team downloads page.

Participation Instructions

To participate in any of the tracks, first set up a team, or join an existing one, using the interface at the top of this page. Then download the starting kit and the datasets (coming soon). You can set up the baseline code by following the provided instructions in the starting kit framework.

To officially become a part of the competition, you need to submit the results of your attacks for both target models, non-private and private, through the ELSA benchmarks platform to upload their results, which will host the evaluation process and the final leaderboard.

Target models

The target model used is a Visual-T5 (VT5). VT5 is a version of the Hi-VT5 described in the MP-DocVQA paper, arranged in a non-hierarchical paradigm (using only one page for each question-answer pair). We start from a pre-trained t5-base for the language backbone, and pre-trained DiT-base to embed visual features (which we keep frozen during fine-tune phase). Then we fine-tune it on the Single-Page DocVQA (SP-DocVQA) dataset using the MP-DocVQA framework. You will have access to this checkpoint.

The final fine-tuning is on the target data set that is secret. We fine-tune two variants of the model on DocVQA for each type of attack:

Non-Private model: This model is fine-tuned on the PFL-DocVQA dataset without any privacy measures. This fine-tuning follows the same setup as the model for the blue track competition, but trained with a single client and no federated aggregation. This is the centralized setting as described in Tito et al. 2024.

Private model: This model is protected with differential privacy, specifically considering provider-level privacy. This means the protection aims to hide any information unique to a provider, potentially including all its training data. The fine-tuning method was introduced in a prior work (Tito et al., 2024) and is based on DP stochastic optimisation (Rajkumar et al., 2012, Song et al., 2013, Abadi et al, 2016). We clip per-provider updates and add noise to the aggregate update using a cryptographically secure PRNG. The privacy budget is given as ε=4 and δ=1e-5.

Baseline Attack Frameworks

Track 1: Membership Inference Attack

The framework currently implements two Provider Membership Inference Attack baselines:

Unsupervised Attack: This attack utilizes only the model's output. For each question on a document from a provider, we compute two DocVQA scores—Accuracy and Normalized Levenshtein Similarity—by comparing the predicted answer to the ground truth. For each provider, we average these scores across all questions and concatenate the two scores to form a feature vector. We then apply KMeans clustering to separate member and non-member providers, classifying providers in the cluster with higher Accuracy as members.

Supervised Attack: This attack leverages a validation set with known provider membership labels and grey-box model access to loss/confidence values. The feature vector from the Unsupervised Attack is augmented with the average loss/confidence, as well as their change relative to the initial checkpoint (pre-finetuning). A RandomForest classifier is then trained on these feature vectors to predict provider membership.

Track 2: Reconstruction Attack

The framework currently implements a simple information extraction attack by passing the occluded document image through the model once, along with its corresponding training set question.

Submission Format and Requirements:

Track 1 (Membership Inference)

Submissions should be in the form of a .json file for both Non-DP and DP models, linking provider IDs to a probability score predicting training set membership (1.0 for member; 0.0 for non-member). Probability values should create a unique ordering across providers, as the evaluation metric uses the ranked ordering of each prediction. Therefore, ‘ties’ in the probability values are not strictly valid, and will be assumed to be ranked in the order they are placed in the submission.

Example:

{

             "dp": {
                    "provider1_name": 0.991,
                    "provider2_name": 0.235,
                    ...
             },
             "nondp": {
                    "provider3_name": 0.872 ,
                    "provider4_name": 0.097,
                    ...
             }

Track 2 (Reconstruction)

Submissions should be in the form of a .json file for both Non-DP and DP models, linking image names to predicted answer strings corresponding to the evaluation set question for that image.

Example:

{

             "dp": {
                    "image1_name": "08012,
                    "image2_name": "John Smith",
                    ...
             },
             "nondp": {
                    "image3_name": "$45.50",
                    "image4_name": "Houston, TX",
                    ...
             }

Both Tracks

Method description: We require every submission to have a description of the method used, explaining the approach and relevant algorithmic decisions.

Source code: Teams should submit the source code for their method (in .zip format or similar). We will not release this code, but we require uploading it for further checking of the method.

After Submission

During the competition period, once you submit your method you will only see if the submission has been processed correctly. However, you won't be able to access the results, as they will remain hidden until the end of the competition. This is intended to prevent cherry-picking of different variants of the same method with modifications or hyperparameters. Hence, it is expected that you find the best method by evaluating on the validation set, and then submit it before the competition ends. At the end, the results and leaderboard will be made public.

Evaluation Metrics

Track 1 (Membership Inference)

For the provider membership inference attack, the accuracy metric is computed as the True Positive Rate at 3% False Positive Rate (TPR@3%FPR) metric. This is based on the ranked ordering of providers, retrieved by sorting the probability values submitted by participants. Values indicate the predicted probability of a provider being ‘IN’, as opposed to ‘OUT’. The ordering of providers according to these values is used to construct an ROC curve, from which the TPR@3% FPR is calculated.

This metric is calculated for both models (private and non-private), based on both respective submissions. The final team score on this track is the average of the metric for both models.

Track 2 (Reconstruction)

For the information reconstruction attack, the metric is simple binary accuracy (case-insensitive) between the extracted prediction and the ground-truth answer. There is no penalty for incorrect predictions; the final team score is the total number of correct extractions across both models.

For any questions about this challenge, please contact: info_pfl@cvc.uab.cat

Challenge News

09/01/2024
New Competition Track at SaTML
10/11/2023
ELSA sponsored prizes for winners announced
09/15/2023
Workshop at NeurIPS 2023
08/17/2023
Communications log fixed in baseline code
07/21/2023
Final version of the PFL-DocVQA framework released
06/30/2023
Release of training and validation splits.

Important Dates

September 2, 2024: Competition Web online.

October 14, 2024: Competition opens.

March 7, 2025: Submission deadline for entries.

March 17, 2025: Announcement of winning teams.

April 9-11, 2025: Competition track (during SaTML).