Key tables

Key or code tables are one of the simplest and most intuitive methods of pseudonymisation, in which original personal identifiers (such as names or personal codes) are replaced by pseudonyms, so that direct identifiers no longer appear in the dataset used for analysis.
Pseudonyms are most often generated either as consistent sequential numbers (e.g. P001, P002, […] P999 or ID-01, ID-02, […] ID-99) or as randomly generated numbers.
A separate key file or key table is created so that the dataset can later be edited, updated or linked to other datasets. In the key or code table, each pseudonym is linked to the corresponding personal data. This table should be kept separate from the pseudonymised research data, in a place where access control, encryption and the other security measures for sensitive data are ensured. It should be accessible only to the researcher who needs to know the real identities of individuals (e.g. the principal investigator) and not to others.
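The two ways of generating pseudonyms mentioned above can be sketched in Python as follows. This is only an illustration: the function names, the "ID-" prefix and the number of digits are assumptions chosen for the example, not requirements of this guide.

```python
import secrets


def sequential_pseudonyms(count, prefix="ID-", width=3):
    """Return pseudonyms ID-001, ID-002, ... in consecutive order."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


def random_pseudonyms(count, prefix="ID-", digits=6):
    """Return `count` distinct pseudonyms with randomly drawn numeric parts.

    The codes are returned in the order they were drawn, so their order
    carries no information about the order of the participants.
    """
    if count > 10 ** digits:
        raise ValueError("not enough distinct codes for the requested count")
    codes = []
    seen = set()
    while len(codes) < count:
        code = f"{prefix}{secrets.randbelow(10 ** digits):0{digits}d}"
        if code not in seen:  # skip duplicates so every pseudonym is unique
            seen.add(code)
            codes.append(code)
    return codes
```

Sequential pseudonyms are easier to read, but they reveal the order in which participants were registered; random pseudonyms avoid this at the cost of a uniqueness check.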

Example

The original data contains personal data and direct identifiers (names and student ID numbers) that are not necessary for the analysis. However, the dataset will need to be supplemented in the future, and participants may withdraw from the study and ask for their data to be deleted.
Original data
Name, surname    | Student ID  | Faculty          | Level of physical activity
John Berzins     | St-2024-051 | Computer Science | Low
Līga Ozola       | St-2024-302 | Medical          | Medium
Karlis Priedītis | St-2024-568 | Social Sciences  | Medium
The pseudonymisation process creates a key table in which personal data is linked to the newly created pseudonym (participant ID). The key table should be stored in a secure, encrypted database, separate from the pseudonymised dataset.
Key table
Name, surname    | Student ID  | Participant ID
John Berzins     | St-2024-051 | ID-001
Līga Ozola       | St-2024-302 | ID-002
Karlis Priedītis | St-2024-568 | ID-003
The pseudonymised dataset does not contain the direct identifiers (names and student ID numbers). This is the dataset to be used in the study.
Pseudonymised dataset
Participant ID | Faculty          | Level of physical activity
ID-001         | Computer Science | Low
ID-002         | Medical          | Medium
ID-003         | Social Sciences  | Medium
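The split shown in the example can be sketched in Python as below. This is a minimal illustration using the three example records; the field names and the pseudonymise function are hypothetical, and in practice the two resulting tables would be written to separate, appropriately secured storage locations.

```python
# Example records from the "Original data" table.
original = [
    {"name": "John Berzins", "student_id": "St-2024-051",
     "faculty": "Computer Science", "activity": "Low"},
    {"name": "Līga Ozola", "student_id": "St-2024-302",
     "faculty": "Medical", "activity": "Medium"},
    {"name": "Karlis Priedītis", "student_id": "St-2024-568",
     "faculty": "Social Sciences", "activity": "Medium"},
]


def pseudonymise(records, identifier_fields=("name", "student_id")):
    """Split records into a key table and a pseudonymised dataset."""
    key_table = []
    dataset = []
    for number, record in enumerate(records, start=1):
        participant_id = f"ID-{number:03d}"
        # Key table: direct identifiers plus the pseudonym.
        key_row = {field: record[field] for field in identifier_fields}
        key_row["participant_id"] = participant_id
        key_table.append(key_row)
        # Research dataset: pseudonym plus everything that is not an identifier.
        data_row = {"participant_id": participant_id}
        data_row.update(
            {k: v for k, v in record.items() if k not in identifier_fields}
        )
        dataset.append(data_row)
    return key_table, dataset


key_table, dataset = pseudonymise(original)
```

Only the participant ID appears in both tables, so the research dataset can be re-linked to individuals only by someone who has access to the key table.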
The key table is simple, easy to understand, and can be implemented without complex algorithms. The researcher has full control over the format and content of the pseudonyms. The key table makes it easy to update, supplement or delete the original data if necessary. If a data subject asks for their data to be deleted, the corresponding entries must be deleted both in the pseudonymised dataset and in the key table.
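A withdrawal request of this kind might be handled as in the following sketch. The withdraw function and the field names are hypothetical and only illustrate that the entries are removed in both places; the sample rows are taken from the example above.

```python
def withdraw(student_id, key_table, dataset):
    """Delete a participant from both the key table and the dataset.

    Returns True if the participant was found and removed, False otherwise.
    """
    # Look up the pseudonym(s) that belong to this person in the key table.
    participant_ids = {
        row["participant_id"]
        for row in key_table
        if row["student_id"] == student_id
    }
    if not participant_ids:
        return False
    # Remove the matching rows in place from both tables.
    dataset[:] = [
        row for row in dataset if row["participant_id"] not in participant_ids
    ]
    key_table[:] = [
        row for row in key_table if row["participant_id"] not in participant_ids
    ]
    return True


key_table = [
    {"name": "John Berzins", "student_id": "St-2024-051", "participant_id": "ID-001"},
    {"name": "Līga Ozola", "student_id": "St-2024-302", "participant_id": "ID-002"},
]
dataset = [
    {"participant_id": "ID-001", "faculty": "Computer Science", "activity": "Low"},
    {"participant_id": "ID-002", "faculty": "Medical", "activity": "Medium"},
]

removed = withdraw("St-2024-302", key_table, dataset)
```

After the call, neither table contains an entry for participant ID-002, while the remaining participant is unaffected.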
However, a number of risks need to be taken into account:
  • If the key table becomes known to third parties, the whole pseudonymisation system breaks down. The key table should therefore be kept separate from the research datasets, access should be restricted to specifically authorised persons, and each access should be logged. The pseudonymised data must also be stored securely, as described in the Sensitive Data section of Chapter 3 of this guide.
  • For large datasets, the key table can become difficult to manage and there is a risk of manual errors. Where large amounts of information are to be analysed, algorithm-based pseudonymisation is recommended.
