Red Team Participation Instructions - Health (2026)
Task Definition
Participants are tasked with launching membership inference attacks (MIA) on synthetic versions of two RNA sequencing datasets, TCGA-BRCA and TCGA-COMBINED, generated with baseline generative methods.
- TCGA-BRCA (1,089 individuals x 978 genes)
  - Five subtypes with an imbalanced distribution
  - Suitable for the subtype prediction task
- TCGA-COMBINED (4,323 individuals x 978 genes)
  - 10 cancer tissues (Breast, Colorectal, Esophagus, Kidney, Liver, Lung, Ovarian, Pancreatic, Prostate, and Skin) with an imbalanced distribution
  - Suitable for the cancer tissue-of-origin prediction task
Red teams use each full dataset as the test set and must identify which of its data points were in the training set used to generate the provided synthetic dataset.
Participation instructions
- The red teams are provided with the synthetic datasets generated by four Blue Team solutions from CAMDA 2025, along with their white-box code (public repositories).
- To access the Blue Team solutions and their synthetic datasets, please register on the ELSA Benchmark system and create a team.
Attack Guidelines:
- You can choose any solution to attack. However, you must provide predictions for both datasets associated with the selected solution.
- You can attack multiple solutions to improve your chances of ranking higher on the leaderboard.
- For solutions you do not attack, your score defaults to the baseline attack score, computed with the GAN-leaks algorithm (Chen et al., 2020) as implemented in the DOMIAS package (Van Breugel et al., 2023).
Submission Requirements:
Each submission must include four prediction CSVs per dataset (eight in total).
- For solutions you did not attack, submit an empty CSV file (0 bytes) with the correct naming structure.
For example, if an attack was launched on Model_1 and Model_4 only, the submission should look like this for both datasets:
- synthetic_data_1_predictions.csv (15 KB)
- synthetic_data_2_predictions.csv (0 bytes)
- synthetic_data_3_predictions.csv (0 bytes)
- synthetic_data_4_predictions.csv (12 KB)
For other requirements for submission, refer to the Submission Checklist below.
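The file layout above can be produced with a few lines of standard-library Python. This is a minimal sketch, not part of the starter package: the function name `write_predictions` and the output directory are hypothetical, and whether membership_label should hold hard labels or scores is not stated here, so check the starter package before submitting.

```python
import csv
from pathlib import Path

N_SOLUTIONS = 4  # four Blue Team solutions per dataset, per the instructions


def write_predictions(out_dir, predictions):
    """Write one prediction CSV per solution into out_dir (hypothetical helper).

    `predictions` maps a solution index (1-4) to a list of values, one per
    test record. Solutions missing from the mapping get a 0-byte file with
    the expected name.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for idx in range(1, N_SOLUTIONS + 1):
        path = out_dir / f"synthetic_data_{idx}_predictions.csv"
        if idx in predictions:
            with path.open("w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(["membership_label"])  # single column, no index
                writer.writerows([value] for value in predictions[idx])
        else:
            path.write_bytes(b"")  # empty placeholder for an unattacked solution
        paths.append(path)
    return paths
```

For the example above (attacks on Model_1 and Model_4 only), a call such as `write_predictions("TCGA-BRCA_out", {1: scores_1, 4: scores_4})` would write two populated files and two empty ones.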
Baseline methods
The following baseline methods and their MIA performance are provided as part of the GitHub Starter Package Repo for participants.
We used some of the baseline methods provided in the DOMIAS package:
- DOMIAS KDE (Van Breugel et al., 2023)
- GAN-leaks and GAN-leaks calibrated (Chen et al., 2020)
- LOGAN (Hayes et al., 2019)
- Monte Carlo (MC) (Hilprecht et al., 2019) (NOTE: We observed that the MC algorithm is sensitive to high-dimensional inputs, so its results should be interpreted with caution.)
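To give a feel for how a distance-based baseline works, here is a minimal sketch in the spirit of the black-box GAN-leaks attack, which scores a record by its distance to the closest generated sample. It is not the DOMIAS package implementation: the function name, the Euclidean metric, and the toy data are assumptions made for illustration.

```python
import math


def min_distance_scores(test_records, synthetic_records):
    """Score each test record by closeness to its nearest synthetic record.

    Returns the negated Euclidean distance, so a higher score means the
    record looks more like a training member under this heuristic.
    """
    scores = []
    for record in test_records:
        nearest = min(math.dist(record, synth) for synth in synthetic_records)
        scores.append(-nearest)
    return scores


# Toy data (hypothetical): 3 test records and 2 synthetic records, 2 genes each.
test = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
synthetic = [[0.1, 0.0], [1.0, 1.2]]
scores = min_distance_scores(test, synthetic)
```

In this toy example the third test record lies far from every synthetic record and therefore receives the lowest score. With 978 genes per record, a vectorised implementation would be preferable to this pure-Python loop.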
We also include simpler confidence-based attacks:
- Logistic Regression (trained on synthetic data, scored by maximum predicted probability)
- Random Forest (trained on synthetic data, scored by maximum predicted probability)
External (reference) dataset
- DOMIAS, GAN-leaks calibrated, and LOGAN require an external reference dataset that reflects the true data distribution and is not used during generation or testing.
- We provide a reference dataset for TCGA-COMBINED only (824 individuals x 978 genes). This dataset is not shared with the Blue Teams.
- You are free to use relevant public datasets as a reference set if your method depends on one.
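The role of the reference set is easiest to see in a density-ratio attack. DOMIAS scores a record by the ratio of its density under the synthetic data to its density under the reference data (Van Breugel et al., 2023). The sketch below estimates both densities with a simple Gaussian kernel density estimate. The fixed bandwidth, the function names, and the toy data are assumptions, and this is not the DOMIAS package implementation.

```python
import math


def log_kde(point, samples, bandwidth):
    """Log of an isotropic Gaussian kernel density estimate at `point`."""
    dim = len(point)
    log_norm = -0.5 * dim * math.log(2 * math.pi * bandwidth ** 2)
    exponents = [
        -sum((p - s) ** 2 for p, s in zip(point, sample)) / (2 * bandwidth ** 2)
        for sample in samples
    ]
    peak = max(exponents)  # log-sum-exp trick for numerical stability
    log_sum = peak + math.log(sum(math.exp(e - peak) for e in exponents))
    return log_norm + log_sum - math.log(len(samples))


def density_ratio_scores(test_records, synthetic_records, reference_records,
                         bandwidth=0.5):
    """Log density ratio, log p_synth(x) - log p_ref(x), per test record."""
    return [
        log_kde(x, synthetic_records, bandwidth)
        - log_kde(x, reference_records, bandwidth)
        for x in test_records
    ]


# Toy data (hypothetical), 2 genes per record.
synthetic = [[0.0, 0.0], [0.2, 0.1]]
reference = [[0.0, 0.0], [2.0, 2.0], [-2.0, 1.0]]
test = [[0.1, 0.05], [2.0, 2.0]]
ratios = density_ratio_scores(test, synthetic, reference)
```

A positive score means the synthetic data is denser around the record than the reference data is, which this heuristic reads as a sign of membership. Kernel density estimates degrade quickly in high dimensions, so on 978 genes some form of dimensionality reduction would likely be needed first.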
📈 Evaluation
For the baseline methods, the GitHub Starter Package Repo reports the following classification metrics: AUC, AUPR, and TPR at FPR = 0.01 and 0.1.
We strongly encourage you to explore additional metrics that could provide better insights, and to include these in your extended abstracts for CAMDA.
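For a quick local sanity check, AUC and TPR at a fixed FPR can be computed from membership scores without any dependencies. This is a minimal sketch under stated assumptions: the function names and toy data are hypothetical, and the starter package may compute these metrics differently (for example, by interpolating the ROC curve), so treat its numbers as authoritative.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all records tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def tpr_at_fpr(points, max_fpr):
    """Highest TPR among ROC points whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Toy data (hypothetical): 1 = training member, 0 = non-member.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
toy_auc = auc(points)
toy_tpr = tpr_at_fpr(points, 0.1)
```

TPR at a low FPR rewards attacks that identify a few members with high confidence, which a good average-case AUC can hide.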
✅ Submissions checklist
Each dataset's submission is a single zip file containing the following files:
- config.yaml: config file with the attack model configuration
- Prediction files: CSV files with a single column named membership_label and no index column
  - synthetic_data_1_predictions.csv
  - synthetic_data_2_predictions.csv
  - synthetic_data_3_predictions.csv
  - synthetic_data_4_predictions.csv
- White-box code: the modified red_team.py and any other necessary .py files
- environment.yaml: environment file needed to run and reproduce the results
We expect two zip files from each red team per submission period, named as follows:
- redteam_{teamname}_TCGA-BRCA.zip
- redteam_{teamname}_TCGA-COMBINED.zip
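The checklist and naming scheme can be enforced with a small packaging script. This is a minimal sketch: the function name `package_submission` and the flat layout inside the archive are assumptions, since the required internal structure of the zip file is not specified here.

```python
import zipfile
from pathlib import Path

# File names taken from the submissions checklist.
REQUIRED_FILES = [
    "config.yaml",
    "environment.yaml",
    "red_team.py",
    "synthetic_data_1_predictions.csv",
    "synthetic_data_2_predictions.csv",
    "synthetic_data_3_predictions.csv",
    "synthetic_data_4_predictions.csv",
]


def package_submission(source_dir, team_name, dataset, out_dir="."):
    """Zip one dataset's submission files (hypothetical helper).

    Raises FileNotFoundError if any file from the checklist is missing.
    """
    source_dir = Path(source_dir)
    missing = [name for name in REQUIRED_FILES if not (source_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing required files: {missing}")
    archive = Path(out_dir) / f"redteam_{team_name}_{dataset}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Include every .py file so helper modules travel with red_team.py.
        extra_code = sorted(p.name for p in source_dir.glob("*.py"))
        for name in sorted(set(REQUIRED_FILES) | set(extra_code)):
            bundle.write(source_dir / name, arcname=name)
    return archive
```

Calling it once with `dataset="TCGA-BRCA"` and once with `dataset="TCGA-COMBINED"` yields the two archives named above. Empty prediction files for unattacked solutions pass the check, because only their presence is tested.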
Good luck! 🍀
References
Please cite the following papers if any of the baseline methods or evaluation metrics are mentioned or used in your CAMDA extended abstracts.
Competition related
- CAMDA 2026 Health Privacy Challenge
Dataset sources
- Genomic Data Commons (GDC), https://gdc.cancer.gov/, https://portal.gdc.cancer.gov/, accessed on Nov 1, 2024
Dataset preprocessing
- Chen, Dingfan, Marie Oestreich, Tejumade Afonja, Raouf Kerkouche, Matthias Becker, and Mario Fritz. "Towards biologically plausible and private gene expression data generation." arXiv preprint. (2024)
- Love, Michael I., Wolfgang Huber, and Simon Anders. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome biology. (2014)
- Colaprico, Antonio, Tiago C. Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S. Sabedot et al. "TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data." Nucleic acids research. (2016)
- Subramanian, A., Narayan, R., Corsello, S.M., Peck, D.D., Natoli, T.E., Lu, X., Gould, J., Davis, J.F., Tubelli, A.A., Asiedu, J.K. and Lahr, D.L. "A next generation connectivity map: L1000 platform and the first 1,000,000 profiles." Cell. (2017)
- Landmark genes, https://clue.io/command?q=/gene-space%20lm, accessed on Nov 1, 2024
Generative models and evaluations
- Sohn, Kihyuk, Honglak Lee, and Xinchen Yan. "Learning structured output representation using deep conditional generative models." Advances in neural information processing systems. (2015)
- Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International conference on machine learning. PMLR. (2017)
- Gulrajani, Ishaan, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. "Improved training of wasserstein gans." Advances in neural information processing systems. (2017)
- Xu, Lei, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. "Modeling tabular data using conditional gan." Advances in neural information processing systems. (2019)
- Holsten, L., Dahm, K., Oestreich, M., Becker, M., & Ulas, T. "hCoCena: A toolbox for network-based co-expression analysis and horizontal integration of transcriptomic datasets." STAR protocols. (2024)
- Apellániz, Patricia A., Juan Parras, and Santiago Zazo. "An improved tabular data generator with VAE-GMM integration." 2024 32nd European Signal Processing Conference (EUSIPCO). IEEE. (2024)
- Lun, A.T.L., McCarthy, D.J., and Marioni, J.C. "A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor." F1000Res. (2016)
Membership inference attack models
- Van Breugel, B., Sun, H., Qian, Z., & van der Schaar, M. "Membership inference attacks against synthetic data through overfitting detection." arXiv preprint. (2023)
- Chen, D., Yu, N., Zhang, Y., and Fritz, M. "Gan-leaks: A taxonomy of membership inference attacks against generative models." In Proceedings of the 2020 ACM SIGSAC conference on computer and communications security. (2020)
- Hilprecht, B., Härterich, M., & Bernau, D. "Monte carlo and reconstruction membership inference attacks against generative models." Proceedings on Privacy Enhancing Technologies. (2019)
- Hayes, J., Melis, L., Danezis, G. & De Cristofaro, E. "Logan: Membership inference attacks against generative models." arXiv preprint. (2019)
Challenge News
Important Dates
Jan, 2025: Benchmark method submissions open.
May, 2025: Track I & Track II benchmark submission deadline.
May, 2025: CAMDA 2026 extended abstract submission deadline.
Jul 12-16, 2026: CAMDA Conference @ ISMB 2026 in Washington D.C., USA.