Can time series be anonymized?

Octopize's avatar anonymization software goes beyond the anonymization of traditional tabular data by addressing time series. These represent the evolution of a variable over time and can also reveal sensitive personal data. Recognizing the specific challenges they present, Octopize has adapted its software to be able to deal with them. Thus, Octopize allows the anonymization of temporal data while maintaining their sequence logic. A concrete example illustrates the effectiveness of avatar software, showing data before and after anonymization. For more information, read this article or check out our technical documentation.

Traditional tabular data, where each row represents an individual, can be readily anonymized with the avatar software. Today, we look at a question that comes up often: what about time series?

What is a time series?

When the value of a variable is collected at different times, the data obtained is a time series. In other words, a time series represents the evolution of a variable over time.

A lot of personal data actually takes the form of time series. Examples include a heart rate, an individual's body temperature or the speed of a car during a trip. Just like non-temporal data, these data can make an individual re-identifiable. Re-identification can even be easier, because a succession of values over time is more likely to be unique than a single value. For example, the list of instantaneous speeds collected every second for an individual during a car trip will almost certainly be unique, whereas their average speed over the trip is much more likely to be shared with other drivers.
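To make this intuition concrete, here is a small Python sketch on purely synthetic data. The 1,000 simulated trips, the 60 one-second readings per trip and the 30 to 90 km/h range are all assumptions made up for the illustration, and real speed profiles are of course not uniformly random. The sketch simply counts how many trips are distinguishable by their full sequence and how many by their rounded average.

```python
import random
from statistics import mean

random.seed(42)

# Hypothetical data: 1000 trips, one integer speed (km/h) per second for 60 s.
trips = [tuple(random.randint(30, 90) for _ in range(60)) for _ in range(1000)]

# Number of distinct full sequences versus distinct rounded average speeds.
unique_sequences = len(set(trips))
unique_averages = len({round(mean(trip)) for trip in trips})

print("distinct sequences:", unique_sequences)
print("distinct rounded averages:", unique_averages)
```

On data like these, every sequence is its own fingerprint, while many trips share the same average.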

It is therefore important that these temporal data be protected when they are linked to people.

Can time series be anonymized?

Given their very specific characteristics, the processing methods defined for classical tabular data cannot be applied directly to time series. Anonymization methods designed for traditional data do not preserve the logic that links the successive points of a time series, so the result would be inconsistent from a business point of view.

At Octopize, we worked on adapting our avatar software to be able to anonymize temporal data while maintaining their sequence logic.

In the anonymization process, tabular data is modelled with factor analysis methods (factor analysis of mixed data (FAMD), for example), which make it possible to represent the data in a numerical space that takes the relationships between variables into account.

To apply anonymization mechanisms to time series, we chose a modeling approach specific to functional data, since a time series can be seen as the evolution of a variable as a function of time. We therefore rely on functional principal component analysis (FPCA), an extension of principal component analysis (PCA) to functional data. This transformation makes it possible to represent each series as a numerical vector, which can be anonymized in the same way as a vector of PCA coefficients. Of course, the reverse transformation is applied at the very end of the process to return the data to its original form.
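As a rough illustration of this transformation, here is a minimal Python sketch of a discretized FPCA using NumPy only. It assumes all curves are sampled on the same time grid, and the function names (fpca_fit, fpca_transform, fpca_inverse) and the synthetic curves are hypothetical choices made for this example. It is not the implementation used in the avatar software, and the anonymization of the coefficient vectors is deliberately left as a placeholder.

```python
import numpy as np

def fpca_fit(curves, n_components):
    """Discretized FPCA: PCA on curves sampled on a common time grid."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are the principal components evaluated on the grid.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_curve, vt[:n_components]

def fpca_transform(curves, mean_curve, components):
    """Project each curve onto the components: one score vector per curve."""
    return (curves - mean_curve) @ components.T

def fpca_inverse(scores, mean_curve, components):
    """Map score vectors back to curves on the original time grid."""
    return scores @ components + mean_curve

# Synthetic example (assumption): 50 individuals, 200 time points each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
amplitude = rng.normal(1.0, 0.2, size=(50, 1))
offset = rng.normal(0.0, 0.5, size=(50, 1))
noise = rng.normal(0.0, 0.05, size=(50, 200))
curves = offset + amplitude * np.sin(2 * np.pi * t) + noise

mean_curve, components = fpca_fit(curves, n_components=3)
scores = fpca_transform(curves, mean_curve, components)  # shape (50, 3)

# Placeholder: the anonymization of the score vectors would happen here.
anonymized_scores = scores.copy()

# Reverse transformation back to curves in their original form.
output_curves = fpca_inverse(anonymized_scores, mean_curve, components)
rmse = np.sqrt(np.mean((output_curves - curves) ** 2))
print(scores.shape, output_curves.shape, rmse)
```

Each individual's 200-point curve is summarized by a vector of three scores, and the reverse transformation brings any such vector back to a full curve.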

Need an example?

Because nothing beats a concrete example to understand what avatar data looks like for time series, here is a use case on personal data from two sensors. The illustration shows the original data on the left for each of these sensors. Each curve represents one individual's data. Some curves are clearly distinctive and could easily be used to re-identify an individual.

Applying the avatar method to these signals results in anonymized data (on the right in the illustration). All the data have been modified and there are no longer any highly identifying curves.
The trends and the general appearance of the curves are also maintained in the synthetic, anonymous avatar data.

To go further

Unlike the previous example, some data have particular characteristics that require a processing step before anonymization. This is notably the case for pseudo-periodic data such as electrocardiograms (ECGs). To adapt our avatar software to these specific cases, extensive work was carried out and presented at the GRETSI conference. The detailed article is available here. The periodic nature of ECG signals required a particular approach to preserve the temporal structure while ensuring anonymity. In particular, we break the signal down into cycles. These cycles are normalized and then anonymized, before being reassembled into complete signals while ensuring that the meta-parameters are harmonized. The figure below summarizes the anonymization process used:

As a result, the anonymization remains robust because it preserves the properties specific to ECG cycles, while ultimately producing complete ECG signals that retain information useful for learning or classification tasks.
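To give a rough idea of the cycle-based steps described above, here is a minimal Python sketch using NumPy. Everything in it is an assumption made for illustration: the synthetic signal stands in for an ECG, the simple threshold-based peak detector, the helper names and the fixed length of 200 points are hypothetical, and the per-cycle duration is only one possible example of a parameter to keep for reassembly. The anonymization of the normalized cycles is not shown, and this is not the method presented at GRETSI.

```python
import numpy as np

def find_peaks_simple(signal, threshold):
    """Indices of local maxima above a threshold (naive detector)."""
    mid = signal[1:-1]
    is_peak = (mid > threshold) & (mid > signal[:-2]) & (mid >= signal[2:])
    return np.flatnonzero(is_peak) + 1

def split_into_cycles(signal, peaks):
    """Cut the signal from one peak to the next; each slice is one cycle."""
    return [signal[a:b] for a, b in zip(peaks[:-1], peaks[1:])]

def resample(values, n_points):
    """Linearly resample a 1-D array to n_points samples."""
    old_grid = np.linspace(0.0, 1.0, len(values))
    new_grid = np.linspace(0.0, 1.0, n_points)
    return np.interp(new_grid, old_grid, values)

# Synthetic pseudo-periodic signal: one bump per cycle, varying cycle lengths.
cycle_lengths = [90, 100, 110, 100, 120]
signal = np.concatenate(
    [np.exp(-((np.arange(n) / n - 0.3) / 0.08) ** 2) for n in cycle_lengths]
)

peaks = find_peaks_simple(signal, threshold=0.5)
cycles = split_into_cycles(signal, peaks)
durations = [len(c) for c in cycles]  # kept aside for reassembly

# Normalization: every cycle is resampled to the same fixed length.
normalized = np.array([resample(c, 200) for c in cycles])

# Placeholder: the normalized cycles would be anonymized here.
anonymized = normalized.copy()

# Reassembly: each cycle is brought back to a duration, then concatenated.
rebuilt = np.concatenate(
    [resample(c, d) for c, d in zip(anonymized, durations)]
)
max_error = np.max(np.abs(rebuilt - signal[peaks[0]:peaks[-1]]))
print(len(cycles), normalized.shape, rebuilt.shape, max_error)
```

Normalizing to a fixed length is what makes the cycles comparable with one another, at the cost of having to restore a plausible duration for each cycle when the complete signal is rebuilt.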

For more information on this approach and its scope of application, we invite you to browse our technical documentation on this subject.
