Health Data Hub


Co-create an educational notebook on the use and evaluation of synthetic data

Challenge

  • Create an educational notebook comparing several methods of generating synthetic data.
  • Help the ecosystem understand these methods in greater depth by focusing on data quality and by providing statistical tools.
  • Offer ways to assess the utility and confidentiality of the generated data.

Solution

“The method developed by the start-up Octopize makes it possible both to prove anonymity and to ensure the reproducibility of analyses. In addition, it applies to all use cases, with little difficulty in training on the data.”

Implementation

Service provided by Octopize.

Maintaining statistical quality & utility

Source: https://gitlab.com/healthdatahub/tutoriel-generation-de-donnees-synthetiques-en-sante/-/blob/main/notebook/main.ipynb?ref_type=heads

Compared with two other synthetic data generation methods (CT-GAN and structural schema), the Avatar method better preserves the utility of the original data while making it possible to prove the level of privacy it provides.
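To make "preserving utility" concrete, one common way to compare generators is to measure, variable by variable, how far the synthetic distribution drifts from the original one. The sketch below uses the Hellinger distance on categorical variables; it is an illustrative assumption, not necessarily the metric used in the Health Data Hub notebook, and the function names `hellinger_distance` and `mean_hellinger` are hypothetical.

```python
import math
from collections import Counter


def hellinger_distance(original, synthetic):
    """Hellinger distance between the empirical distributions of two categorical samples.

    Returns a value in [0, 1]: 0 means identical distributions,
    1 means the two samples share no category at all.
    """
    if not original or not synthetic:
        raise ValueError("both samples must be non-empty")
    p_counts, q_counts = Counter(original), Counter(synthetic)
    n_p, n_q = len(original), len(synthetic)
    categories = set(p_counts) | set(q_counts)
    # Sum of squared differences between square-rooted category frequencies
    total = sum(
        (math.sqrt(p_counts[c] / n_p) - math.sqrt(q_counts[c] / n_q)) ** 2
        for c in categories
    )
    return math.sqrt(total / 2)


def mean_hellinger(original_rows, synthetic_rows, columns):
    """Average per-column Hellinger distance over a table stored as a list of dicts.

    Hypothetical helper: a lower value suggests the generator preserved
    the marginal distributions of the original data better.
    """
    distances = [
        hellinger_distance(
            [row[col] for row in original_rows],
            [row[col] for row in synthetic_rows],
        )
        for col in columns
    ]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    original = [{"sex": "F", "group": "A"}, {"sex": "M", "group": "A"},
                {"sex": "F", "group": "B"}, {"sex": "M", "group": "B"}]
    synthetic = [{"sex": "F", "group": "A"}, {"sex": "M", "group": "A"},
                 {"sex": "F", "group": "A"}, {"sex": "M", "group": "B"}]
    print(mean_hellinger(original, synthetic, ["sex", "group"]))
```

Running the same function on the output of each generator (Avatar, CT-GAN, structural schema) gives one comparable score per method. Marginal distances only capture part of utility; correlations between variables and the results of downstream analyses also need to be checked.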

Results

  • This notebook provides tools for evaluating the anonymity and quality of the generated synthetic data.
  • See the press release for more details on the subject, and the notebook on GitLab.
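On the confidentiality side, a simple heuristic that is often used is the distance to closest record (DCR): for each synthetic record, how close it lies to the nearest real record. Synthetic records that coincide with real ones are an obvious privacy risk. The sketch below is a generic illustration under that assumption, not the anonymity metrics implemented in the notebook or by Octopize; the function names are hypothetical, and records are assumed to be numeric tuples already on a comparable scale.

```python
import math


def distance_to_closest_record(original, synthetic):
    """For each synthetic record, the Euclidean distance to its nearest original record.

    Records are numeric tuples of equal length; variables are assumed to be
    standardized beforehand so that no single variable dominates the distance.
    """
    if not original:
        raise ValueError("original must be non-empty")
    return [min(math.dist(s, o) for o in original) for s in synthetic]


def share_of_exact_copies(original, synthetic, tol=1e-9):
    """Fraction of synthetic records that reproduce an original record (within tol)."""
    if not synthetic:
        raise ValueError("synthetic must be non-empty")
    dcr = distance_to_closest_record(original, synthetic)
    return sum(d <= tol for d in dcr) / len(dcr)


if __name__ == "__main__":
    original = [(0.0, 0.0), (3.0, 4.0)]
    synthetic = [(0.0, 0.0), (6.0, 8.0)]
    print(distance_to_closest_record(original, synthetic))  # [0.0, 5.0]
    print(share_of_exact_copies(original, synthetic))       # 0.5
```

A high share of exact copies is a clear warning sign, but a large DCR on its own does not prove anonymity; the brute-force nearest-neighbour search is also quadratic, so larger datasets would call for a spatial index.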