Authors: Deborah Boyenval

Affiliation: University of Helsinki

Email: deborah.boyenval@helsinki.fi

Description: We use TabPFN v2.5, a published tabular foundation model developed by Prior Labs (https://doi.org/10.1038/s41586-024-08328-6). TabPFN was originally designed for supervised tasks such as classification and regression, but data generation is supported through the official tabpfn-extensions package, which provides a model named `TabPFNUnsupervisedModel` that estimates an approximate joint distribution of the features. Pretrained weights and implementation details are available at https://huggingface.co/Prior-Labs/tabpfn_2_5; running version 2.5 requires exporting an HF_TOKEN because access to the repository is gated.
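A minimal sketch of how this pipeline can be driven from Python is shown below. The import paths, constructor arguments (`tabpfn_clf`, `tabpfn_reg`) and the `generate_synthetic_data` call reflect our reading of the tabpfn-extensions documentation and should be checked against the installed version; the expression matrix here is a toy stand-in.

```python
import os
import numpy as np

from tabpfn import TabPFNClassifier, TabPFNRegressor
from tabpfn_extensions.unsupervised import TabPFNUnsupervisedModel

# TabPFN 2.5 weights live in a gated Hugging Face repository, so a token
# must be available before the model is instantiated.
os.environ.setdefault("HF_TOKEN", "<your-hugging-face-token>")

# Toy stand-in for a samples x genes expression matrix.
X = np.random.rand(500, 200)

# The unsupervised wrapper combines a classifier and a regressor to model
# an approximate joint distribution over the input features.
model = TabPFNUnsupervisedModel(
    tabpfn_clf=TabPFNClassifier(),
    tabpfn_reg=TabPFNRegressor(),
)

# "Fitting" is in-context learning: the table becomes the context of a
# forward pass; no weights are updated.
model.fit(X)

# Sample synthetic rows from the estimated joint distribution.
X_synth = model.generate_synthetic_data(n_samples=1000)
```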

TabPFN follows the foundation model paradigm: it is pretrained on over one hundred million synthetic datasets and can be used directly for downstream tasks without fine-tuning. Architecturally, TabPFN is a transformer trained to approximate the posterior predictive distribution for tabular inputs. During inference, it performs in-context learning: the "fit" stage corresponds to a single forward pass rather than parameter updates. During the pretraining stage, the model learns to handle a broad range of relationships among features. This setup behaves like amortized Bayesian inference, where the computational cost is paid during pretraining, enabling immediate, out-of-the-box use.

According to a recent report from Prior Labs, TabPFN-2.5 is designed to handle datasets of up to approximately 50K rows and 2K features, a major scalability improvement over TabPFN-v2, which was limited to around 10K samples and 500 features. While TabPFN is extremely efficient at inference, the `TabPFNUnsupervisedModel` has a practical limit of roughly 200 features: its autoregressive factorization of the joint distribution causes the conditioning context to grow rapidly, leading to instability and occasional segmentation faults. To remain within stable bounds, we restricted each TCGA dataset to the 200 most hypervariable genes. Synthetic samples are generated by sampling from the conditional distributions produced at each autoregressive step. This sampling procedure has its own hyperparameters, such as a temperature parameter controlling sample diversity, that do not modify the model weights. The temperature is in principle eligible for optimisation, but we used default values and performed no hyperparameter optimisation.
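For concreteness, a simple variance-based selection of the 200 most variable genes could look like the sketch below. The exact hypervariability criterion (e.g. dispersion on log-normalised counts) and the name of the temperature argument (`t`) are assumptions, not part of the description above.

```python
import numpy as np
import pandas as pd

def top_variable_genes(expr: pd.DataFrame, n_genes: int = 200) -> pd.DataFrame:
    """Keep the n_genes columns (genes) with the highest variance across samples."""
    variances = expr.var(axis=0)
    keep = variances.sort_values(ascending=False).index[:n_genes]
    return expr.loc[:, keep]

# Toy samples x genes matrix standing in for a TCGA expression table.
expr = pd.DataFrame(np.random.rand(300, 5000))

# Stay within the stable ~200-feature range of TabPFNUnsupervisedModel.
expr_200 = top_variable_genes(expr, n_genes=200)

# Sampling diversity is controlled by a temperature-like parameter; we kept
# the default and did no hyperparameter tuning (argument name is illustrative).
# X_synth = model.generate_synthetic_data(n_samples=1000, t=1.0)
```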

Authors: Oleksandr Husak, ...

Affiliation: NCT/DKFZ

Description:

## Model Architecture and Approach
Our implementation leverages a class-conditional diffusion model with differential privacy.

### Core Components
- Denoising Neural Network: A multi-layer architecture with residual connections
- Adaptive Group Normalization (AdaGN): Custom normalization that dynamically adjusts based on both temporal and class conditioning signals
- Embedding Systems (see the sketch after this list):
  - Sinusoidal time embeddings for diffusion timestep encoding
  - Learned class embeddings for condition-guided generation
  - Combined embedding projections to integrate multiple conditioning signals
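A minimal PyTorch sketch of the two conditioning building blocks named above, written for 1-D tabular features. The layer sizes, the class-embedding dimension, and the exact way scale/shift are derived from the combined conditioning vector are assumptions, not the submission's actual code.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_time_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal embedding of integer diffusion timesteps."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    angles = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

class AdaGN(nn.Module):
    """Group norm whose scale/shift are predicted from a conditioning vector
    (time embedding + class embedding)."""

    def __init__(self, num_channels: int, cond_dim: int, num_groups: int = 8):
        super().__init__()
        self.norm = nn.GroupNorm(num_groups, num_channels, affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale) + shift

# Combine time and class conditioning, then modulate a feature block.
t_emb = sinusoidal_time_embedding(torch.randint(0, 1000, (16,)), dim=128)
class_emb = nn.Embedding(10, 128)(torch.randint(0, 10, (16,)))
cond = torch.cat([t_emb, class_emb], dim=-1)
h = AdaGN(num_channels=256, cond_dim=256)(torch.randn(16, 256), cond)
```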

## Diffusion Process Design
- Noise Schedule: Implements the improved cosine beta schedule (DDPM++), providing several advantages (see the schedule sketch after this list):
  - Smoother noise progression throughout the diffusion process
  - Better sampling stability during the generative process
- Forward Process: Gradually adds noise according to the schedule, transforming real data into pure noise
- Reverse Process: Iteratively denoises from random noise to synthetic samples, guided by class conditioning
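A sketch of the cosine beta schedule referenced above, following the usual Nichol & Dhariwal formulation; the offset `s` and the clipping bound are the commonly used defaults, not values taken from this submission.

```python
import torch

def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Cosine noise schedule: betas derived from a squared-cosine cumulative
    alpha curve, clipped for numerical stability."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    alphas_cumprod = torch.cos(((steps / timesteps) + s) / (1 + s) * torch.pi / 2) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return torch.clip(betas, 0, 0.999)

betas = cosine_beta_schedule(timesteps=1000)
```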

## Privacy Preservation Mechanisms
- Differential Privacy Integration: Implemented via Opacus (a minimal usage sketch follows this list)
- Privacy-Utility Tradeoff: Optimized through multi-objective hyperparameter tuning (MOASHA algorithm)
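A minimal sketch of wrapping the training setup with Opacus; the model, optimizer, data loader and the epsilon/delta/epochs values are illustrative placeholders, since the real budget was chosen by the multi-objective tuner.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy denoiser, optimizer and loader standing in for the real training setup.
model = torch.nn.Sequential(torch.nn.Linear(200, 256), torch.nn.ReLU(), torch.nn.Linear(256, 200))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data_loader = DataLoader(TensorDataset(torch.randn(1024, 200)), batch_size=64)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    target_epsilon=8.0,   # illustrative privacy budget; the real value was tuned
    target_delta=1e-5,
    epochs=100,
    max_grad_norm=1.0,    # per-sample gradient clipping bound
)
```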

## Training Methodology
- Class-Weighted Loss Function: Addresses class imbalance using inverse square-root weighting (sketched after this list) to prevent over-representation of majority classes while stabilizing training
- Learning Rate Scheduling: Tested various schedulers (OneCycleLR, ReduceLROnPlateau, etc.) and selected ReduceLROnPlateau.
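A short sketch of the inverse square-root weighting mentioned above; the label vector is a toy example and the final normalisation of the weights is an assumption.

```python
import numpy as np
import torch
import torch.nn as nn

# Toy label vector with an imbalanced class distribution.
y_train = np.repeat([0, 1, 2], [900, 90, 10])

# Inverse square-root weighting: softer than inverse-frequency weighting,
# so majority classes are down-weighted without destabilising the loss.
class_counts = np.bincount(y_train)
weights = 1.0 / np.sqrt(class_counts)
weights = weights / weights.sum() * len(class_counts)   # normalisation is an assumption

criterion = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))
```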

## Evaluation Framework
- Validation during training (sketched after this list):
  - Real-to-synthetic validation (train on real, test on synthetic)
  - Synthetic-to-real validation (train on synthetic, test on real)
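A sketch of the two validation directions using a simple proxy classifier; logistic regression and the toy data here are assumptions, and any downstream classifier could be substituted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def cross_domain_score(X_fit, y_fit, X_eval, y_eval) -> float:
    """Fit a proxy classifier on one domain and score it on the other."""
    clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    return accuracy_score(y_eval, clf.predict(X_eval))

# Toy stand-ins for real and synthetic expression matrices with class labels.
X_real, y_real = np.random.rand(400, 50), np.random.randint(0, 3, 400)
X_synth, y_synth = np.random.rand(400, 50), np.random.randint(0, 3, 400)

real_to_synth = cross_domain_score(X_real, y_real, X_synth, y_synth)   # train real, test synthetic
synth_to_real = cross_domain_score(X_synth, y_synth, X_real, y_real)   # train synthetic, test real
```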

## Implementation Details
- PyTorch Lightning: Structured training and evaluation loops
- Hyperparameter Optimization: Multi-objective tuning balancing privacy budget (epsilon) and model utility (val_loss); a minimal logging skeleton follows this list
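A skeleton of how the two tuning objectives could be exposed from a LightningModule; the `denoiser.loss` call and the way the Opacus engine reports epsilon are illustrative, not the submission's actual code.

```python
import pytorch_lightning as pl

class DiffusionModule(pl.LightningModule):
    """Sketch: log the two objectives consumed by the multi-objective tuner."""

    def __init__(self, denoiser, privacy_engine=None, delta: float = 1e-5):
        super().__init__()
        self.denoiser = denoiser
        self.privacy_engine = privacy_engine
        self.delta = delta

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = self.denoiser.loss(x, y)     # hypothetical noise-prediction loss
        self.log("val_loss", loss)          # utility objective
        if self.privacy_engine is not None:
            # Privacy objective: epsilon spent so far at a fixed delta.
            self.log("epsilon", self.privacy_engine.get_epsilon(self.delta))
        return loss
```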

method: NoisyDiffusion (2025-03-15)

Authors: Jules Kreuer, Sofiane Ouaari

Affiliation: Methods in Medical Informatics - University of Tübingen

Description: The model applies a diffusion-based generative approach to the provided gene expression data with privacy considerations. At its core, the architecture is a diffusion model built from residual linear blocks with group normalisation.
In the forward process, Gaussian noise is progressively added to the data according to one of several noise schedules (linear, cosine, or power-based); during training, the model learns to predict and remove this noise (a minimal sketch of the forward step follows this paragraph). For reverse sampling, the model iteratively denoises random Gaussian samples, guided by class conditioning, to generate synthetic cancer data.
Key technical features include sinusoidal position embeddings for time-step encoding, attention blocks for capturing complex relationships within the data, and a privacy approach that combines differential privacy (adding calibrated noise to gradients) with strong regularisation.
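A compact sketch of the forward (noising) step under one of the schedules mentioned above; the linear schedule endpoints are the standard DDPM defaults, and the cosine or power variants would only change how `betas` is built.

```python
import torch

def linear_beta_schedule(timesteps: int) -> torch.Tensor:
    """One of the explored schedules; cosine and power variants are analogous."""
    return torch.linspace(1e-4, 0.02, timesteps)

def q_sample(x0: torch.Tensor, t: torch.Tensor, alphas_bar: torch.Tensor) -> torch.Tensor:
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alphas_bar[t].unsqueeze(-1)                  # (batch, 1)
    noise = torch.randn_like(x0)
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

betas = linear_beta_schedule(1000)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.randn(16, 200)                            # toy batch of expression vectors
t = torch.randint(0, 1000, (16,))
xt = q_sample(x0, t, alphas_bar)                     # noised batch the model learns to denoise
```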

Additional improvements come from early stopping and learning rate scheduling via OneCycleLR.
We explored attention and post-processing techniques such as outlier clipping, but discarded them as too computationally expensive and not beneficial enough.
The training process also includes gradient clipping to improve numerical stability (see the sketch below).
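An illustrative training loop combining the OneCycleLR schedule and gradient-norm clipping described above; the model, data loader and loss are placeholders rather than the submission's actual implementation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(200, 200)                 # stand-in for the denoiser
loader = DataLoader(TensorDataset(torch.randn(256, 200)), batch_size=32)
num_steps = 10 * len(loader)                      # 10 illustrative epochs

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-3, total_steps=num_steps)

for epoch in range(10):
    for (x,) in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), x)   # placeholder for the denoising loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # stability
        optimizer.step()
        scheduler.step()
```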

Ranking Table

Metrics are grouped into utility, fidelity, and privacy categories.

| Date | Method | Accuracy (real) | Accuracy (synthetic) | AUPR (real) | AUPR (synthetic) | Number of overlapping important features | MMD score | Discriminative score | Distance to the closest (real) | Distance to the closest (synthetic) | MC MIA AUC | GAN-leaks MIA AUC | MC MIA PR AUC | GAN-leaks MIA PR AUC | MC MIA TPR@FPR=0.01 | GAN-leaks MIA TPR@FPR=0.01 | MC MIA TPR@FPR=0.1 | GAN-leaks MIA TPR@FPR=0.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2025-11-27 | Synthetic RNA-seq data generation with a foundation model: TabPFN version 2.5 | 0.00% | 0.00% | 0.00% | 0.00% | 0.0 | 0.0000 | 0.00% | 0.0000 | 0.0000 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| 2025-03-15 | Class-conditional diffusion model with differential privacy | 87.33% | 9.91% | 87.14% | 24.44% | 1.8 | 0.2697 | 99.80% | 24.0301 | 9,887,871.1730 | 49.37% | 50.00% | 84.77% | 90.00% | 49.70% | 100.00% | 49.70% | 100.00% |
| 2025-03-15 | NoisyDiffusion | 87.05% | 77.50% | 85.70% | 75.96% | 18.2 | 0.0109 | 60.39% | 24.0183 | 25.1569 | 52.33% | 53.95% | 80.97% | 82.86% | 1.56% | 3.44% | 10.56% | 13.87% |
| 2025-03-14 | Synthetic RNA-seq Data Generation with Private-PGM | 86.23% | 81.54% | 87.10% | 76.30% | 15.6 | 0.0086 | 86.38% | 24.0422 | 27.2186 | 50.16% | 50.52% | 80.16% | 80.60% | 1.40% | 1.38% | 10.67% | 10.56% |
| 2025-03-08 | Non-negative matrix factorization distorted input for CVAE | 85.67% | 82.09% | 86.00% | 74.57% | 15.4 | 0.0228 | 83.43% | 24.0493 | 26.4565 | 50.62% | 51.34% | 80.27% | 81.03% | 1.33% | 1.79% | 10.65% | 11.36% |
| 2025-03-16 | Baseline (Multivariate) | 86.41% | 82.09% | 85.77% | 83.42% | 20.6 | 0.0166 | 54.36% | 24.0532 | 28.3770 | 52.33% | 52.79% | 81.15% | 81.75% | 1.58% | 2.34% | 11.34% | 11.92% |
| 2025-03-16 | Synthetic RNA-seq Data Generation with Private-PGM (e = 10) | 86.23% | 82.92% | 87.10% | 77.58% | 15 | 0.0074 | 77.69% | 24.0422 | 27.0686 | 50.42% | 50.18% | 80.22% | 80.16% | 1.29% | 1.01% | 10.19% | 9.94% |

Ranking Graphic
