Overview - Health
Health data is highly sensitive and personally identifiable, and thus requires special protection. State-of-the-art solutions are controlled-access schemes that couple data use to contractual, regulatory, and technical measures, which address privacy at the cost of reducing the utility of the data. With differential privacy, we have methods at hand that provide principled protection against privacy threats. Unfortunately, these techniques require rigorous privacy accounting for each access to the data, as well as novel algorithms, because differential privacy is a property of the algorithm itself. Hence, adapting existing solutions is challenging and to some extent incompatible with existing workflows for sharing and processing such data. Furthermore, as open-access sharing of data is not possible, exploratory research is limited, and at a certain point the privacy budget might be depleted, after which no further access to the data can be granted. Differentially private synthetic data and generators are principled solutions to this problem.
The primary goal of this challenge is to advance synthetic data generators so that they can synthesize data that captures relevant biomedical properties of the real data while offering principled protection of the privacy of the study participants. With this novel challenge, we aim to bring the research community together to advance the state of the art in generating realistic and useful synthesized datasets. We provide several baselines of state-of-the-art generative models for both private and non-private generation. This challenge will build on widely used reference molecular datasets in oncology, which will simplify participation while maximizing the relevance of the derived solutions.