The Spanish data protection authority ("AEPD") and the European Data Protection Supervisor have released guidelines on the use of hashing techniques for pseudonymization and anonymization of personal data.
The guidelines are mainly addressed to data controllers who wish to implement hash-based mechanisms to pseudonymize or anonymize personal data. Although hash functions are designed to be irreversible, the risk of re-identification of hashed data remains and should be taken into consideration. As the regulators explain in their paper, the probability of re-identification increases as more information about the hashed values becomes available. The regulators give the example of Spanish phone numbers, where the type of characters (digits), the length and the prefix are constant and known in advance. In addition, the existence of additional identifiers associated with data subjects makes the hashed data even more vulnerable.
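To illustrate why a small, predictable input space undermines hashing, here is a minimal Python sketch of an exhaustive search over phone numbers. It is not taken from the guidelines: the sample number, the choice of SHA-256, the assumed 9-digit length and the known four-digit prefix are all assumptions made only to keep the demonstration quick.

```python
import hashlib
from typing import Optional


def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a string as hexadecimal text."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def recover_number(target_hash: str, prefix: str, total_length: int) -> Optional[str]:
    """Try every digit string with the given prefix and length.

    Returns the number whose hash matches target_hash, or None if no
    candidate in the searched space matches.
    """
    missing = total_length - len(prefix)
    for i in range(10 ** missing):
        candidate = prefix + str(i).zfill(missing)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical example: a made-up 9-digit number, hashed without key or salt.
hashed = sha256_hex("612345678")

# The attacker is assumed to know the format and the first four digits,
# so only 10**5 candidates need to be hashed.
recovered = recover_number(hashed, prefix="6123", total_length=9)
print(recovered)
```

Even without a known prefix, a 9-digit number has at most 10^9 possible values, which is a small search space for commodity hardware.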
In order to prevent the re-identification of the hash values, the guidelines recommend using an encryption algorithm with a key that is stored confidentially by the data controller, ensuring that the message is properly encrypted. Another strategy to prevent re-identification is adding random values to the original data, where those values are independent of the message and of any other information provided about it.
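The two strategies can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method prescribed by the guidelines: HMAC-SHA-256 is used here as one common way to combine a hash with a secret key, and the function names, key size and salt size are hypothetical choices.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def keyed_pseudonym(value: str, key: bytes) -> str:
    """Derive a pseudonym with HMAC-SHA-256 (assumed keyed construction).

    Without the key, an attacker cannot test guesses against the output.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def salted_hash(value: str) -> Tuple[bytes, str]:
    """Hash the value together with a fresh random salt.

    The salt is generated independently of the value. It must be kept
    confidential (or discarded) for it to hinder re-identification.
    """
    salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    return salt, digest


# Hypothetical usage: the key would be stored confidentially by the controller.
key = secrets.token_bytes(32)
pseudonym = keyed_pseudonym("612345678", key)
salt, digest = salted_hash("612345678")
```

With a keyed construction the same input always maps to the same pseudonym, so records can still be linked by whoever holds the key. With a fresh salt per record, the same input yields a different digest each time.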
In addition, given the above-mentioned inherent challenges and the probability of re-identification, the guidelines emphasize that using hash techniques to pseudonymize or anonymize personal data must be accompanied by a re-identification risk analysis. Such a risk analysis should take into consideration the specific hash technique used and the type of information being hashed, paying special attention to any information that may be linked to the value represented by the hash.
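One simple input to such a risk analysis is the size of the space of possible messages. The Python sketch below is an assumption-laden illustration, not a procedure from the guidelines: the function names are hypothetical, and the hashing rate is a placeholder that should be replaced with a figure appropriate to the hash technique and the attacker being considered.

```python
import math


def search_space_bits(alphabet_size: int, unknown_positions: int) -> float:
    """Size of the message space in bits when each unknown position is
    drawn independently from an alphabet of the given size."""
    return unknown_positions * math.log2(alphabet_size)


def seconds_to_exhaust(alphabet_size: int, unknown_positions: int,
                       hashes_per_second: float) -> float:
    """Worst-case time to hash every candidate at the assumed rate."""
    return alphabet_size ** unknown_positions / hashes_per_second


# A 9-digit number with no known prefix: 10 symbols, 9 unknown positions.
bits = search_space_bits(10, 9)

# Placeholder rate of one million hashes per second (an assumption).
worst_case = seconds_to_exhaust(10, 9, 1e6)
print(f"{bits:.1f} bits, {worst_case:.0f} seconds to exhaust")
```

Every piece of known structure, such as a fixed prefix, reduces the number of unknown positions and shrinks the space further, which is why the type of information being hashed matters as much as the hash technique itself.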