In a world where sensitive personal data have become the backbone of many industries, privacy protection is more important than ever. Health data, in particular, require specific treatment because of their highly sensitive nature and their vital importance. In this article, we dive into the complex world of health data anonymization and explore its critical role in protecting privacy while enabling the ethical and relevant use of this data.
Background of sensitive personal data
Sensitive data is a particular category of personal data: information that, if compromised or used improperly, could have harmful consequences for an individual. It generally concerns very intimate or private aspects of a person's life. Here are some examples of sensitive personal data:
- health data,
- financial data,
- biometric data,
- sexual orientation or political opinions,
- direct identifiers like the social security number...
According to the CNIL, sensitive data is “information that reveals the alleged racial or ethnic origin, political opinions, religious or philosophical beliefs or trade union membership of a person, as well as genetic data, biometric data processed to uniquely identify a natural person, data concerning health, and data concerning a natural person's sex life or sexual orientation.”
Legal guidelines for institutions
Because of the particularly sensitive nature of this data, its processing is often subject to strict data protection regulations, such as the General Data Protection Regulation (GDPR) in Europe or HIPAA in the United States. Organizations must therefore put in place robust security measures to protect this data against unauthorized access, disclosure, or misuse.
The processing of sensitive data according to the CNIL
The GDPR prohibits the collection or use of sensitive personal data, except in certain cases:
- if the person concerned has given their express consent (an active, explicit and preferably written process, which must be free, specific and informed);
- if the information has been made public by the person concerned;
- if the data is necessary to protect a person's life;
- if its use is justified by the public interest and authorized by the CNIL;
- if it concerns members of a political, religious, philosophical or trade union association or organization.
The questions the CNIL recommends asking yourself:
- What data do I really need to achieve the purpose set for my file?
- Have I clearly distinguished between mandatory and optional data?
- Is the data I am collecting objective?
- Will I be able to give anyone who requests it transparent access to all the data I hold about them?
- Am I collecting sensitive data? Am I allowed to collect this data? Is it justified in terms of my missions? Can I do otherwise?
The processing of health data according to HIPAA
The Health Insurance Portability and Accountability Act (HIPAA) also establishes strict rules to protect personal health data in the United States. For example, this law prohibits the unauthorized disclosure of identifiable health information, imposes security standards to protect that data, and gives individuals rights to their own health information. Thus, HIPAA imposes national standards to protect sensitive patient health information from being disclosed without their knowledge or consent.
The HIPAA Privacy Rule has 12 exceptions: cases in which patient data can be shared with other entities without the patient's authorization.
These exceptions include:
- Victims of abuse, neglect, or domestic violence.
- Judicial and administrative proceedings.
- Donation of organs, eyes, or cadaveric tissue.
- Workers' compensation.
How to anonymize health data?
After examining the challenges associated with processing sensitive data, and health data in particular, it is crucial to explore ethical ways of using it. Data anonymization is a relevant answer: it makes it possible to benefit from the information while preserving the privacy and security of the individuals concerned.
Definition of anonymization
Anonymization consists of applying techniques that make it impossible, in practice, to re-identify the individuals from whom the anonymized personal data originate. This processing is irreversible, which means that anonymized data is no longer considered personal data and therefore falls outside the scope of the GDPR. To characterize anonymization, the European Data Protection Board (EDPB, formerly the Article 29 Working Party or G29) relies on the three criteria set out in Opinion 05/2014 on Anonymisation Techniques (source at the bottom of the page):
- Individualization (singling out): anonymous data should not make it possible to distinguish an individual. Even with all the quasi-identifying information relating to an individual, it must be impossible to single that individual out in the anonymized database.
- Correlation (linkability): it must not be possible to re-identify anonymous data by cross-referencing it with other datasets. In other words, it must be impossible to link two datasets from different sources that concern the same individual. For example, once anonymized, an individual's health data should not be linkable to their banking data on the basis of information the two datasets have in common.
- Inference: the data should not allow additional information about an individual to be reasonably inferred. For example, it must be impossible to determine with certainty an individual's health status from the anonymous data.
Only when these three criteria are met is the data considered anonymous in the strict sense. It then changes legal status: it is no longer considered personal data and falls outside the scope of the GDPR.
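To make the three criteria more concrete, here is a minimal Python sketch of naive checks one could run on a small table before calling it anonymous. The records, the choice of quasi-identifiers (`zip3`, `birth_year`, `sex`), the external "bank" file and the function names are all hypothetical. Passing such checks is necessary but not sufficient, and this is not the evaluation method used by the CNIL or the EDPB.

```python
from collections import Counter

# Hypothetical "anonymized" health records (illustration only).
# zip3, birth_year and sex are treated as quasi-identifiers; diagnosis is sensitive.
records = [
    {"zip3": "750", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "750", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"zip3": "690", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"zip3": "690", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
    {"zip3": "130", "birth_year": 1992, "sex": "F", "diagnosis": "hepatitis"},
]
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def qi_key(row):
    """Tuple of quasi-identifier values for one row."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def singled_out(rows):
    """Individualization: rows whose quasi-identifier combination is unique."""
    counts = Counter(qi_key(r) for r in rows)
    return [r for r in rows if counts[qi_key(r)] == 1]


def linkable(rows, external):
    """Correlation: external people who match exactly one anonymized row."""
    index = {}
    for r in rows:
        index.setdefault(qi_key(r), []).append(r)
    links = []
    for person in external:
        matches = index.get(qi_key(person), [])
        if len(matches) == 1:
            links.append((person["name"], matches[0]["diagnosis"]))
    return links


def inferable(rows):
    """Inference: quasi-identifier groups in which everyone shares one diagnosis."""
    groups = {}
    for r in rows:
        groups.setdefault(qi_key(r), set()).add(r["diagnosis"])
    return {key: next(iter(vals)) for key, vals in groups.items() if len(vals) == 1}


# Hypothetical external file (e.g. banking data) sharing the same quasi-identifiers.
bank = [{"name": "Alice", "zip3": "130", "birth_year": 1992, "sex": "F"}]

print(singled_out(records))     # only the zip3 "130" record is unique
print(linkable(records, bank))  # [('Alice', 'hepatitis')]
print(inferable(records))       # {('690', 1975, 'M'): 'asthma', ('130', 1992, 'F'): 'hepatitis'}
```

In this toy table, the 1992 record fails all three checks. The "690" group contains two people, so no one in it can be singled out, yet its diagnosis can still be inferred, which is why counting unique combinations alone does not address the inference criterion.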
A unique and compliant anonymization solution: avatar software
Recent years of research have seen the emergence of numerous anonymization techniques, which the CNIL groups into two main families: generalization and randomization.
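As a rough illustration of the two families, the Python sketch below generalizes an age into a 10-year band and a postal code into its first two digits, and randomizes a measurement by adding Laplace noise. The band width, the number of digits kept, the noise scale and the record itself are arbitrary assumptions chosen for the example; a real project would calibrate these settings and then check the result against the three criteria above.

```python
import random


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_postcode(code, keep=2):
    """Generalization: keep only the first `keep` characters and mask the rest."""
    return code[:keep] + "*" * (len(code) - keep)


def add_laplace_noise(value, scale, rng):
    """Randomization: add Laplace(0, scale) noise, sampled as the difference
    of two independent exponential draws whose mean is `scale`."""
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return value + noise


rng = random.Random(42)  # seeded so the example is reproducible
patient = {"age": 47, "postcode": "75014", "weight_kg": 82.0}  # hypothetical record

anonymized = {
    "age": generalize_age(patient["age"]),                 # '40-49'
    "postcode": generalize_postcode(patient["postcode"]),  # '75***'
    "weight_kg": round(add_laplace_noise(patient["weight_kg"], 2.0, rng), 1),
}
print(anonymized)
```

Generalization coarsens values so that more people share them, while randomization perturbs values so that the exact original can no longer be read off; neither guarantees anonymity on its own.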
More recently, another approach has proved highly relevant: the generation of anonymous synthetic data. Synthetic data retains much of the statistical relevance of the original data and facilitates the reproducibility of scientific results. It is produced by models that learn and reproduce the overall structure of the original data; these include, in particular, generative adversarial networks (GANs) and methods based on conditional distributions.
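To show what a method based on conditional distributions means in its simplest form, here is a toy Python sketch. It learns the distribution of one variable (sex), then a normal approximation of a second variable (systolic blood pressure) conditioned on the first, and samples new rows from these distributions. The data, the variables and the normal approximation are assumptions for illustration only; this is not the avatar method, and synthetic data produced this way would still have to be evaluated against the three criteria before being considered anonymous.

```python
import random
import statistics
from collections import Counter

# Hypothetical original data: (sex, systolic blood pressure in mmHg).
original = [("F", 118.0), ("F", 122.0), ("F", 115.0), ("F", 125.0),
            ("M", 130.0), ("M", 135.0), ("M", 128.0), ("M", 140.0)]


def fit(data):
    """Learn P(sex) and, for each sex, a normal approximation of P(bp | sex)."""
    sex_counts = Counter(sex for sex, _ in data)
    marginal = {sex: n / len(data) for sex, n in sex_counts.items()}
    conditional = {}
    for sex in sex_counts:
        values = [bp for s, bp in data if s == sex]
        conditional[sex] = (statistics.mean(values), statistics.stdev(values))
    return marginal, conditional


def sample(model, n, rng):
    """Generate n synthetic rows: draw sex from P(sex), then bp from P(bp | sex)."""
    marginal, conditional = model
    sexes = list(marginal)
    weights = [marginal[s] for s in sexes]
    rows = []
    for _ in range(n):
        sex = rng.choices(sexes, weights=weights)[0]
        mu, sigma = conditional[sex]
        rows.append((sex, round(rng.gauss(mu, sigma), 1)))
    return rows


rng = random.Random(0)  # seeded so the example is reproducible
model = fit(original)
synthetic = sample(model, 1000, rng)
print(synthetic[:5])
```

Because blood pressure is drawn conditionally on sex, the synthetic rows keep the gap between the two groups (means of about 120 and 133 mmHg in the toy data). Real generators model many variables jointly, but the principle of reproducing structure rather than copying records is the same.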
Octopize, a deeptech startup, has developed the avatar anonymization software: a unique conceptual approach, centered on the individual, that creates anonymous, protected and relevant synthetic data while providing proof of its protection. Its patented algorithm was the subject of a publication in the scientific journal npj Digital Medicine and was successfully evaluated by the French CNIL against the three EDPB criteria (individualization, correlation and inference).
Avatar synthetic data differs from the original data while preserving the same granularity and the same relationships between variables. It supports the same analyses and can be used to train the same machine learning algorithms without the risk of re-identifying your users. For more technical details, consult the solution's technical documentation.
The software thus addresses the challenge of preserving the privacy of sensitive health data while keeping its informational value for other uses: sharing, data valorization, AI, Open Data, retention, and so on. The avatar solution therefore unlocks the potential of your sensitive data while keeping it compliant with regulations such as the GDPR, as enforced by the CNIL.
Below are some health use cases in which the avatar solution has proved successful:
- CHU: analyses
- bioMerieux: sharing
- APP: AI
Where to start? Software to help you with overall GDPR compliance
GDPR compliance has been mandatory since 2018, yet not all businesses are fully aware of what it involves in practice. Here are 7 basic rules to ensure GDPR compliance:
- Minimize data collection
- Obtain the consent of the persons concerned
- Ensure transparency and provide information
- Facilitate the exercise of individual rights
- Limit data retention
- Secure and protect data
- Maintain compliance on an ongoing basis
You can start the compliance process without external help, but software now exists that automates compliance and makes it much faster and easier. Leto is a good example: this compliance software connects to more than 6,000 SaaS tools, which lets you search for the personal data of your customers, prospects and employees automatically. The platform is collaborative across departments, so everyone takes part in the company's GDPR compliance, strengthening trust and security!
Leto is a perfect tool to ensure the proper implementation of overall GDPR compliance!
In conclusion, protecting sensitive data such as health data is of paramount importance to guarantee the privacy and security of individuals. Faced with these challenges, data anonymization stands out as an ethical and effective solution, allowing privacy to be maintained while information is used responsibly. With the development of techniques such as anonymous synthetic data, significant progress is being made in reconciling data protection with the use of data for beneficial purposes. Innovative solutions such as Octopize's avatar anonymization software offer a people-centered approach and ensure compliance with regulations such as the GDPR. By adopting these technologies, organizations can keep their data secure while unlocking its potential for diverse and ethical uses.