Hash functions – DataverseLV

nauris_b23k, 2026-02-16T13:12:12+02:00

Hash functions

Hash functions are mathematical algorithms that systematically and irreversibly transform original data into an unrecognisable form. Input data of any size (e.g. a name or a personal identification number) is transformed into a fixed-length string called a hash value (sometimes loosely called a checksum).

A hash function is one-way: it is practically impossible to compute the original value back from the hash value alone. It is also deterministic, so the same input value always produces the same hash value, while even a small change to the input data produces a completely different hash value.
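The properties described above (fixed length, same input giving the same hash, small change giving a different hash) can be checked directly. The following is a minimal sketch using Python's standard hashlib module; the helper name `sha256_hex` and the choice of UTF-8 encoding are assumptions made for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as hexadecimal text."""
    # Hash functions operate on bytes, so the text is encoded first
    # (UTF-8 is an assumption; any encoding works if used consistently).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("Līga Ozola")
again = sha256_hex("Līga Ozola")
changed = sha256_hex("Liga Ozola")  # one character differs (ī -> i)
long_input = sha256_hex("a much longer input " * 100)

print(first == again)               # True: same input, same hash
print(first == changed)             # False: small change, different hash
print(len(first), len(long_input))  # 64 64: fixed length regardless of input size
```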

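With hash functions, there is no need to create a key file. However, the original values used to identify the persons must be stored in a separate table, and the hashing procedure (the algorithm and the exact form of the input values) must be kept fixed so that the same hash values can be recomputed whenever a record has to be linked back to a person.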

The table of original values should be stored separately from the pseudonymised research data, in a place where access control, encryption and security measures for sensitive data are ensured. It should be accessible only to the researcher who needs to know the real identities of the individuals (e.g. the Principal Investigator).

When using hash functions for pseudonymisation, widely used algorithms are recommended, such as those of the SHA-2 family (e.g. SHA-256) or SHA-3. Hash functions are available in both Python (e.g. the hashlib library) and R (e.g. the digest package).
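As an illustration of the algorithm families mentioned here, the sketch below calls three of the constructors that Python's hashlib provides and compares the length of their output. The sample input and the labels in the dictionary are chosen only for this example.

```python
import hashlib

# Sample identifier, encoded to bytes (UTF-8 assumed).
data = "Līga Ozola".encode("utf-8")

digests = {
    "SHA-256 (SHA-2 family)": hashlib.sha256(data).hexdigest(),
    "SHA-512 (SHA-2 family)": hashlib.sha512(data).hexdigest(),
    "SHA3-256 (SHA-3 family)": hashlib.sha3_256(data).hexdigest(),
}

# Each algorithm has its own fixed output length.
for name, value in digests.items():
    print(f"{name}: {len(value)} hexadecimal characters")
```

Whichever algorithm is chosen, it has to stay the same for the whole dataset, since different algorithms give unrelated hash values for the same input.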

Example

Using a hash function for pseudonymisation, step by step:

Select an appropriate hashing algorithm (e.g. SHA-256).
Select the identifier to be pseudonymised (e.g. “Jānis Bērziņš”, “Līga Ozola”, “Kārlis Priedītis”).
The hash algorithm converts this text into a fixed-length string of digits and letters (for SHA-256, 64 hexadecimal characters, e.g. “7b9d67f94873e2d4c7874bc5742227c7b0d44fad343e29e9686dd2608b489985”).
This hash value replaces the original identifier throughout the dataset.
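The steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed workflow: the field names (`name`, `faculty`, `activity`, `hash_value`) and the function name `pseudonymise` are hypothetical, and the input is assumed to be UTF-8 text hashed with SHA-256.

```python
import hashlib


def pseudonymise(identifier: str) -> str:
    """Return the SHA-256 hash (hexadecimal) of an identifier."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Hypothetical in-memory dataset; in practice this might be read from a file.
records = [
    {"name": "Jānis Bērziņš", "faculty": "Computer Science", "activity": "Low"},
    {"name": "Līga Ozola", "faculty": "Medicine", "activity": "Medium"},
    {"name": "Kārlis Priedītis", "faculty": "Social Sciences", "activity": "Medium"},
]

# Replace the identifier with its hash value and keep the other fields.
pseudonymised = [
    {
        "hash_value": pseudonymise(record["name"]),
        "faculty": record["faculty"],
        "activity": record["activity"],
    }
    for record in records
]

for row in pseudonymised:
    print(row["hash_value"], row["faculty"], row["activity"], sep="\t")
```

The hash depends on the exact characters of the input, so spelling, diacritics, capitalisation and spacing of the identifier need to be consistent across the dataset.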

Original data

Name, surname	Faculty	Level of physical activity
Jānis Bērziņš	Computer Science	Low
Līga Ozola	Medicine	Medium
Kārlis Priedītis	Social Sciences	Medium

In the pseudonymised dataset, the first and last name are replaced by a hash value computed with the SHA-256 hash function.

Pseudonymised dataset (example)

Hash value	Faculty	Level of physical activity
7b9d67f94873e2d4c7874bc5742227c7b0d44fad343e29e9686dd2608b489985	Computer Science	Low
9f9dda4086fb4a6430ea3518aa5f724dc9d1c134d0eee44580edca19b62e0d3f	Medicine	Medium
f3cf16d99e4e05d5ec35528bcbbe9d4fa40d86e7a19bba202d8f07b9a9b0d663	Social Sciences	Medium

Table of original values (contains personal data; kept separately in a secure place)

Name, surname
Jānis Bērziņš
Līga Ozola
Kārlis Priedītis
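Because hashing is deterministic, the table of original values is enough to link a hash value back to a person: the authorised researcher can re-hash each stored value and compare. A minimal sketch of that idea, assuming SHA-256 over UTF-8 text; the names `sha256_hex`, `lookup` and `reidentify` are illustrative.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Contents of the separately stored table of original values.
original_values = ["Jānis Bērziņš", "Līga Ozola", "Kārlis Priedītis"]

# Recompute the hash of every original value to build a lookup table.
lookup = {sha256_hex(name): name for name in original_values}


def reidentify(hash_value: str) -> Optional[str]:
    """Return the original value for a hash, or None if it is not in the table."""
    return lookup.get(hash_value)


hash_from_dataset = sha256_hex("Līga Ozola")  # stands in for a value read from the dataset
print(reidentify(hash_from_dataset))  # Līga Ozola
print(reidentify("0" * 64))           # None: no matching original value
```

This only works if the same algorithm and the same input format are used as when the dataset was pseudonymised.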

Funding

The website was developed within the framework of the Project No 2.1.3.1.i.0/2/23/I/CFLA/002 “Support for implementing Open Science, developing solutions for shared use of research data, and participation in the EOSC” with financial support from the European Union Recovery and Resilience Facility and the Latvian state.
