method: Feature Weighted and Residual Space Membership Inference Attacks on Synthetic Genomic Data
Date: 2026-05-03
Authors: Ruixuan Liu, Harutoshi Okumura, Lily Wang, Owen Tucker, Li Xiong
Affiliation: Emory University
Description: We introduce new attacks for both the BRCA and COMBINED datasets. We use weighted LOGAN attacks, in which each gene is weighted by its importance. For our LOGAN-based distance attacks on the BRCA dataset, we weight each gene by its KL divergence between the synthetic and d_mia data. For the COMBINED dataset, we fit PCA jointly on the synthetic and reference data and attack in the residual space. We also implement attacks from previous work, such as GAN-leaks.
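The two ideas in this description can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-gene KL divergence is estimated under a univariate Gaussian approximation, the attack score is a plain nearest-neighbour distance rather than a LOGAN-style score, and the function names and the parameter `k` are hypothetical.

```python
import numpy as np

def gaussian_kl_per_gene(p, q, eps=1e-8):
    """KL(N_p || N_q) per gene, assuming each gene is univariate Gaussian."""
    mp, mq = p.mean(axis=0), q.mean(axis=0)
    vp, vq = p.var(axis=0) + eps, q.var(axis=0) + eps
    return 0.5 * (np.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)

def weighted_nn_scores(targets, synth, weights, eps=1e-12):
    """Negative weighted distance to the nearest synthetic sample."""
    w = weights / (weights.sum() + eps)
    diff = targets[:, None, :] - synth[None, :, :]
    d2 = np.einsum("tsg,g->ts", diff ** 2, w)
    return -np.sqrt(d2.min(axis=1))

def residual_nn_scores(targets, synth, reference, k):
    """Nearest-neighbour score after removing the top-k joint principal components."""
    joint = np.vstack([synth, reference])
    mu = joint.mean(axis=0)
    # Rows of vt are the principal directions of the joint data.
    _, _, vt = np.linalg.svd(joint - mu, full_matrices=False)
    top = vt[:k]

    def residual(x):
        xc = x - mu
        return xc - xc @ top.T @ top  # part not explained by the top-k PCs

    rt, rs = residual(targets), residual(synth)
    d = np.linalg.norm(rt[:, None, :] - rs[None, :, :], axis=2)
    return -d.min(axis=1)
```

In both functions a higher score means the target sample lies closer to some synthetic sample and is therefore more likely to be flagged as a member.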
method: GAN-leaks baseline
Date: 2026-01-16
Authors: Organiser
Affiliation: EMBL
Description: GAN-leaks baseline, implemented as provided in the DOMIAS package:
https://github.com/holarissun/DOMIAS/blob/main/src/domias/baselines.py
https://github.com/PMBio/Health-Privacy-Challenge/blob/main/src/mia/models/baseline.py
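For orientation, the core of a GAN-leaks style black-box score can be sketched as below. This is an assumption-laden illustration rather than a copy of the linked code: it uses Euclidean distance and an exponential transform, and the linked implementations may differ in the metric or in how the distance is turned into a score.

```python
import numpy as np

def gan_leaks_scores(targets, synth):
    """Score each target by its distance to the closest synthetic sample."""
    # Pairwise Euclidean distances, shape (n_targets, n_synth).
    d = np.linalg.norm(targets[:, None, :] - synth[None, :, :], axis=2)
    # Smaller minimum distance gives a score closer to 1.
    return np.exp(-d.min(axis=1))
```

A target that coincides with a synthetic sample receives the maximum score of 1.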
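method: MIAs on RNA-seq bulk synthetic data generators (NoisyDiffusion, CVAE, MVN)
Date: 2026-05-06
Authors: Charlene Jarrell, Daniil Filienko, Emma Szebenyi, Jonathan Kim, Sikha Pentyala, Steven Golob, Martine De Cock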
Affiliation: University of Washington Tacoma
Description:
Key idea for deep learning models (NoisyDiffusion, abbreviated ND, and CVAE):
Multiple "synth"-shadow models are trained on numerous synthetic datasets generated by the SDG. Loss features are extracted across a range of timesteps for ND and across a range of latent samples for CVAE. The extracted loss trajectories are then used to train a classifier, under the assumption that member patients exhibit different loss patterns from nonmembers. "Synth"-shadow models are trained on synthetic data rather than real data because the final "proxy" diffusion model is also trained on the target synthetic data.
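The loss-trajectory pipeline described above can be sketched as follows. Training the shadow models is out of scope here, so the sketch assumes each shadow run is supplied as a `(model, members, nonmembers)` tuple and that a hypothetical `loss_fn(model, x, t)` returns the model's loss on sample `x` at timestep (or latent draw) `t`. A small logistic regression stands in for the attack classifier, which the description does not specify.

```python
import numpy as np

def loss_trajectory(loss_fn, model, x, timesteps):
    """One feature vector: the model's loss on x at each timestep."""
    return np.array([loss_fn(model, x, t) for t in timesteps])

def build_attack_dataset(shadow_runs, loss_fn, timesteps):
    """Stack loss trajectories from all shadow runs with member labels."""
    feats, labels = [], []
    for model, members, nonmembers in shadow_runs:
        for rows, label in ((members, 1), (nonmembers, 0)):
            for x in rows:
                feats.append(loss_trajectory(loss_fn, model, x, timesteps))
                labels.append(label)
    return np.array(feats), np.array(labels)

def fit_logistic(X, y, lr=0.1, steps=500):
    """Logistic regression by gradient descent on standardized features."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        grad = p - y
        w -= lr * Z.T @ grad / len(y)
        b -= lr * grad.mean()

    def predict(X_new):
        z = (X_new - mu) / sd
        return 1.0 / (1.0 + np.exp(-(z @ w + b)))

    return predict
```

At attack time, the same `loss_trajectory` features would be computed for each target sample against the proxy model and passed to `predict` to obtain membership scores.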
Key idea for MVN:
For each target sample, we compute its Mahalanobis distance to the synthetic data's gene distribution, using the mean and covariance matrix estimated from the synthetic data. Samples closer to the synthetic distribution receive higher membership scores. If auxiliary reference data is provided (as in TCGA-COMBINED), we refine the final membership prediction in two steps: (1) we compute the same distance to the auxiliary distribution, and (2) we normalize the distance to the synthetic distribution by the distance to the auxiliary distribution. This normalization highlights samples whose similarity to the synthetic data is most significant relative to the reference data.
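A minimal sketch of this distance-ratio score is given below. The ridge term added to the covariance, the `eps` guard in the denominator, and the sign convention (negating so that higher means more likely a member) are assumptions made for the illustration and are not stated in the description.

```python
import numpy as np

def mahalanobis(X, ref, ridge=1e-3):
    """Mahalanobis distance of each row of X to the distribution of ref."""
    mu = ref.mean(axis=0)
    cov = np.atleast_2d(np.cov(ref, rowvar=False))
    # Ridge keeps the covariance invertible when genes outnumber samples.
    precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    diff = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, precision, diff))

def mvn_scores(targets, synth, aux=None, eps=1e-12):
    """Higher score means closer to the synthetic distribution."""
    d_syn = mahalanobis(targets, synth)
    if aux is None:
        return -d_syn
    d_aux = mahalanobis(targets, aux)
    # Distance to synthetic data, normalized by distance to reference data.
    return -d_syn / (d_aux + eps)
```

With auxiliary data, a sample that is close to the synthetic distribution but far from the reference distribution gets a ratio near zero and therefore the highest score.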
| | | NoisyDiffusion | | | | Private-PGM(ε=10) | | | | CVAE | | | | MVN | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Date | Method | AUC | PR_AUC | TPR@FPR=0.01 | TPR@FPR=0.1 | AUC | PR_AUC | TPR@FPR=0.01 | TPR@FPR=0.1 | AUC | PR_AUC | TPR@FPR=0.01 | TPR@FPR=0.1 | AUC | PR_AUC | TPR@FPR=0.01 | TPR@FPR=0.1 |
| 2026-05-03 | Feature Weighted and Residual Space Membership Inference Attacks on Synthetic Genomic Data | 50.62% | 79.94% | 0.34% | 11.83% | 51.89% | 80.89% | 0.92% | 11.24% | 81.71% | 94.56% | 25.03% | 52.47% | 100.00% | 100.00% | 100.00% | 100.00% |
| 2026-01-16 | GAN-leaks baseline | 52.60% | 81.72% | 3.10% | 11.94% | 48.56% | 79.31% | 0.69% | 9.40% | 71.44% | 91.72% | 27.32% | 42.14% | 50.85% | 79.92% | 1.49% | 8.50% |
| 2026-05-06 | MIAs on RNA-seq bulk synthetic data generators (NoisyDiffusion, CVAE, MVN) | 65.15% | 87.18% | 5.86% | 22.50% | 48.56% | 79.31% | 0.69% | 9.40% | 79.71% | 94.07% | 30.65% | 49.83% | 95.96% | 98.97% | 69.35% | 86.68% |
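For readers unfamiliar with the column names, the sketch below shows one common way to compute AUC and TPR at a fixed FPR from membership labels and attack scores. It is an illustration only and not necessarily the challenge's evaluation code; it assumes the scores are distinct (ties are not merged) and that both classes are present.

```python
import numpy as np

def roc_points(labels, scores):
    """ROC curve points obtained by sweeping the threshold over the scores."""
    labels = np.asarray(labels)
    order = np.argsort(-np.asarray(scores), kind="stable")
    y = labels[order]
    tpr = np.concatenate(([0.0], np.cumsum(y) / y.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - y) / (1 - y).sum()))
    return fpr, tpr

def auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(labels, scores)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def tpr_at_fpr(labels, scores, max_fpr):
    """Largest TPR achievable while keeping FPR at or below max_fpr."""
    fpr, tpr = roc_points(labels, scores)
    return float(tpr[fpr <= max_fpr].max())
```

For example, `tpr_at_fpr(labels, scores, 0.01)` corresponds to the TPR@FPR=0.01 columns, expressed as a fraction rather than a percentage.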