Why Is Encryption Not Enough To Protect Your Personal Data?

9. November 2021

The GDPR has significantly impacted the way we handle data. Data transfer and data analysis need to be dealt with caution and explicit consent. For businesses, complying with GDPR requires extra time and investment. Therefore, it’s always important to find the safest, most efficient method of data protection. In this blog post, we will compare the differences between data encryption and data anonymization, as well as the advantages of data anonymization compared with data encryption.

What Is Data Encryption?

For many, encryption is the first privacy protection method that comes into mind. Encryption is helpful in many situations, such as mitigating the risk of a data breach by blocking cyber attackers from accessing sensitive data once it is stolen. However, it is not the master key to prevent a data breach completely.

Data encryption is one of the technical methods acknowledged by Article 32 of GDPR for data protection. According to GDPR, encryption is

“the procedure that converts clear text into a hashed code using a key, where the outgoing information only becomes readable again by using the correct key.”

Types of Encryption

There are two types of encryption: symmetric and asymmetric. The difference lies in the secret key to encrypt and decrypt information. While symmetric encryption uses the same key to encrypt and decrypt information, asymmetric encryption uses a public key to encrypt data, and a private key to decrypt data. In both cases, the key protects information from unauthorized and illegal processing.

Symmetric Encryption vs Asymmetric Encryption

Why Encryption Is Not Enough

Theoretically, the message cannot be decrypted without proper authorization. However, in real life, encryption does not prevent a data breach completely. It only reduces the risk. A data violation can happen if other elements of data protection protocols are weak, for example, if:

The encrypted data is not transported under the right conditions (eg. the system sends an error message without encryption) [1]
The software is not timely updated [2]
Unpreventable human error [3]
The hackers have enough time and computing resources, it is possible to decrypt data without the key successfully [4]

After all, encryption is only the basic element of a multi-tiered and layered data protection approach. It is also worth mentioning, it is not suitable for every data protection situation. Implementing encryption independently is no longer enough for effective data protection. Therefore, having other methods at hand – such as anonymization – is highly recommended, especially considering the advantages of data anonymization when it comes to data involving images and/or videos.

What Is Data Anonymization?

Recital 26 GDPR defines anonymous data as follows:

“…information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.”

According to Recital 26 GDPR, fully anonymized data is out of the scope of GDPR, because anonymized data cannot be re-identified. It does not fall into the definition of “personal information”.

Advantages of Anonymization

For both individuals and businesses, anonymization is a helpful method to protect data privacy. The advantages of data anonymization include:

Data misuse can be easily prevented. Even if a data breach happens, personal information is still protected since it is not either comprehensible or utilizable for the hackers;
It is easier for the consumer to trust a company. The public is getting increasingly concerned about how data is handled, and by whom. By conducting anonymization, consumers understand that the company cares about privacy and would be more inclined to engage in business activities compared to their counterparts.
Data applies to machine learning and big data analytics. Perfectly anonymized data is able to maintain the major characteristics of the individuals (gender, age). Such information can be used in machine learning studies and big data analytics for various purposes.

Methods of Anonymization:

There are a few methods to anonymize data.

Data masking: data masking anonymizes data by changing the values. Sensitive data can be replaced by “x”, for example, to prevent potential data leak/detection.

Data anonymization method - Data masking — Data anonymization method – Data masking

Pseudonymization: pseudonymization replaces sensitive information with fake identifiers or pseudonyms. For example, it replaces “Anna Smith” with pseudonyms such as “Lisa Baker/S4t2 _4E”. Pseudonymization makes the data unidentifiable, but it is reversible if the key for re-identification is found.

Data anonymization method - Pseudonymization — Data anonymization method – Pseudonymization

Generalization: data generalization summarizes the general features of data in a target class, therefore making it less identifiable.

Data anonymization method - Generalization — Data anonymization method – Generalization

Permutation/data shuffling/swapping: mixes the values in the dataset, so the shuffled values do not represent the original records.

Data anonymization method - Permutation/data shuffling/swapping — Data anonymization method – Permutation/data shuffling/swapping

Why Synthetic Data Is Gaining More Popularity

However, these anonymization methods all have their shortcomings. The biggest problem is the integrity and accuracy of the data. After processing, the data becomes less accurate, or even statistically useless. In order to use anonymized data for training/machine learning purposes, synthetic data became an increasingly popular choice.

Synthetic data is artificially created data made by computer programs. It is being used by businesses to train their machine learning models and to protect privacy by de-identifying the PII in the data. For example, these are synthetic images generated by our software Deep Natural Anonymization. In this case, after synthesis, the significant characteristics of the original data remain, but it has no connection to the original data.

Synthetic data is generated algorithmically, and does not have a key for “de-encryption”. Therefore, there is next to zero chance for synthetic data to be misused. The algorithmically adjusted data keeps the characteristics of the original data, making it worthwhile for further machine learning analysis; and protecting the individuals’ privacy from being misused without consent.

Comparison: Encryption, Anonymization, Synthetic Data

Apart from encryption and anonymization, there are other methods to protect your data from breaches. There is no definite “best” way to protect your data, but there is a most suitable method for every situation. Suitable data protection methods should be used for the safety of your data. If you’d like to learn more about how we at brighter AI anonymize data and protect every identity in public, check out the case studies below, or contact us here.

Resources:

1 CNRS News; Stéphanie Delaune; “Data Protection: Encryption is not Enough”; 2016-08-12

2 Cool Tech Zone; Rakesh Naik; “Why data encryption just isn’t enough anymore in 2021. Explaining issues with current data encryption tools and ways for better data protection.”; 2021-09-17

3 Cool Tech Zone; Rakesh Naik; “Why data encryption just isn’t enough anymore in 2021. Explaining issues with current data encryption tools and ways for better data protection.”; 2021-09-17

4 Cool Tech Zone; Rakesh Naik; “Why data encryption just isn’t enough anymore in 2021. Explaining issues with current data encryption tools and ways for better data protection.”; 2021-09-17

Xinzhuo Xiao
Marketing & Communication
xinzhuo.xiao@brighter.ai