Co-create an educational notebook on the use and evaluation of synthetic data
Challenge
Create an educational notebook comparing various methods of generating synthetic data.
Help the ecosystem better understand these methods, and explore them in more depth, by focusing on data quality and by providing statistical tools.
Offer ways to assess the usefulness and confidentiality of the generated data.
Solution
“The method from the start-up Octopize makes it possible both to prove anonymity and to ensure the reproducibility of analyses. In addition, it applies to all use cases, with little difficulty in training on the data.”
Compared with two other methods of generating synthetic data (CT-GAN and structural schema), the avatar method better preserves the usefulness of the original data while making it possible to prove the privacy it provides.
Results
This notebook provides tools for evaluating the anonymity and quality of the synthetic data generated.
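As a rough illustration of what such evaluation tools can look like, the sketch below computes one generic utility indicator (how far the correlation structure of the synthetic data drifts from the original) and one generic privacy indicator (the distance from each synthetic record to its closest original record). These are common textbook metrics and are not necessarily the ones used in the notebook; the function names `correlation_gap` and `distance_to_closest_record` are hypothetical, and the "synthetic" data here is only noisy copies of random rows standing in for the output of a real generator.

```python
import numpy as np


def correlation_gap(original: np.ndarray, synthetic: np.ndarray) -> float:
    """Utility: mean absolute difference between the Pearson correlation matrices."""
    corr_original = np.corrcoef(original, rowvar=False)
    corr_synthetic = np.corrcoef(synthetic, rowvar=False)
    return float(np.mean(np.abs(corr_original - corr_synthetic)))


def distance_to_closest_record(original: np.ndarray, synthetic: np.ndarray) -> np.ndarray:
    """Privacy: for each synthetic row, Euclidean distance to its nearest original row."""
    # Standardise with the original data's statistics so no column dominates.
    mean = original.mean(axis=0)
    std = original.std(axis=0)
    std[std == 0] = 1.0
    original_scaled = (original - mean) / std
    synthetic_scaled = (synthetic - mean) / std
    # Pairwise differences, shape (n_synthetic, n_original, n_columns).
    diffs = synthetic_scaled[:, None, :] - original_scaled[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


# Toy data (assumption): 200 records, 3 numeric columns, one induced correlation.
rng = np.random.default_rng(0)
original = rng.normal(size=(200, 3))
original[:, 1] += 0.8 * original[:, 0]
# Stand-in for a generator's output: original rows plus noise.
synthetic = original + rng.normal(scale=0.3, size=original.shape)

gap = correlation_gap(original, synthetic)
dcr = distance_to_closest_record(original, synthetic)
print(f"correlation gap: {gap:.3f}")
print(f"median distance to closest record: {np.median(dcr):.3f}")
```

A smaller correlation gap suggests better preserved utility, while very small distances to the closest record suggest that synthetic rows sit too close to real individuals. Neither number is a proof of anonymity on its own.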
Find the press release for more details on the subject, as well as the notebook on GitLab.