method: Scdesign2 Poisson Ensemble.2025-05-13
Authors: Patrick McKeever, Daniil Filienko, Steven Golob, Shane Menzies, Sikha Pentyala, Luca Foschini, Jineta Banerjee, Martine De Cock
Affiliation: University of Washington Tacoma, Sage Bionetworks
Description: To generate single-cell data, we used a hybrid approach that combines scDesign2 with the Poisson generation code from the baseline model. scDesign2 was trained on the 1118 highly variable genes detected by scanpy, and independent Poisson models generated the remaining genes. This substantially reduced the training time and memory requirements of scDesign2 while retaining similar adjusted Rand index (ARI) values. The split is reasonable because the copula-based generation method of scDesign2 preserves correlations among highly variable genes, whereas non-highly-variable genes can be accurately approximated with independent per-gene distributions.
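A minimal sketch of the gene split described above, assuming a dense cells-by-genes count matrix and a boolean mask marking the highly variable genes. The function name `combine_hvg_and_poisson`, the toy data, and the use of each gene's observed mean as its Poisson rate are illustrative assumptions, not the submitted implementation. scDesign2 is an R package and is not called here, so the highly variable block is passed in as an already generated array.

```python
import numpy as np


def combine_hvg_and_poisson(counts, hvg_mask, hvg_synthetic, rng):
    """Assemble a synthetic count matrix from two sources.

    counts        : (n_real_cells, n_genes) real counts, used only to fit Poisson rates
    hvg_mask      : (n_genes,) boolean, True for highly variable genes
    hvg_synthetic : (n_cells, n_hvg) synthetic counts for the highly variable genes
                    (produced by scDesign2 in the described method; supplied by the caller here)
    rng           : numpy random Generator
    """
    n_cells = hvg_synthetic.shape[0]
    n_genes = counts.shape[1]
    synthetic = np.zeros((n_cells, n_genes), dtype=np.int64)

    # Highly variable genes: copy the jointly modelled block unchanged.
    synthetic[:, hvg_mask] = hvg_synthetic

    # Remaining genes: one independent Poisson per gene.
    # Assumption: the rate is the gene's mean count in the real data.
    rates = counts[:, ~hvg_mask].mean(axis=0)
    synthetic[:, ~hvg_mask] = rng.poisson(lam=rates, size=(n_cells, rates.size))
    return synthetic


# Toy usage with made-up data: 50 real cells, 6 genes, 2 of them highly variable.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 6))
hvg_mask = np.array([True, False, True, False, False, False])

# Placeholder for scDesign2 output: resampled real rows, for illustration only.
rows = rng.integers(0, counts.shape[0], size=20)
hvg_synthetic = counts[rows][:, hvg_mask]

synthetic = combine_hvg_and_poisson(counts, hvg_mask, hvg_synthetic, rng)
```

Because each non-highly-variable gene needs only a single rate parameter, the copula has to be fitted over the highly variable genes alone, which is where the reduction in training time and memory comes from.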
The novel contribution of our work will be an investigation of the privacy-preserving properties of scDesign2. Our extended abstract, to be submitted Wednesday, will provide more detail.
Full code, including models, is available here: https://github.com/Patrick-McKeever/camda_hpc/tree/main