Key tables
Key or code tables are one of the simplest and most intuitive methods of pseudonymisation, in which original personal identifiers (such as names or personal codes) are replaced by pseudonyms, so that direct identifiers no longer appear in the dataset used for analysis.
Pseudonyms are most often generated either as consistent sequential numbers (e.g. P001, P002, […] P999 or ID-01, ID-02, […] ID-99) or as randomly generated numbers.
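The two generation styles mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names, the `ID-` prefix and the number of digits are assumptions chosen for the example, not a prescribed format.

```python
import secrets


def sequential_pseudonyms(count, prefix="ID-", width=3):
    """Return consecutive pseudonyms such as ID-001, ID-002, ..."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


def random_pseudonyms(count, prefix="ID-", digits=6):
    """Return `count` distinct pseudonyms made of random digits."""
    if count > 10 ** digits:
        raise ValueError("not enough distinct values for the requested count")
    seen = set()
    result = []
    while len(result) < count:
        candidate = f"{prefix}{secrets.randbelow(10 ** digits):0{digits}d}"
        # Redraw on collision so that every pseudonym stays unique.
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result
```

Sequential numbers are easier to read and check by hand, while random numbers avoid revealing the order in which participants were recorded.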
A separate key file or key table is created so that the dataset can later be edited, updated or linked to other datasets. In the key or code table, each pseudonym is linked to the corresponding personal data. This table should be kept separate from the pseudonymised research data, in a location protected by access control, encryption and the other security measures required for sensitive data. It should be accessible only to the researcher who needs to know the real identities of individuals (e.g. the principal investigator).
Example
The original data contains personal data and direct identifiers (names and student ID numbers) that are not necessary for the analysis. However, there will be a need to complete this dataset in the future, and it is possible for participants to withdraw from the study and ask for their data to be deleted.
Original data
| Name, surname | Student ID | Faculty | Level of physical activity |
|---|---|---|---|
| John Berzins | St-2024-051 | Computer Science | Low |
| Līga Ozola | St-2024-302 | Medical | Medium |
| Karlis Priedītis | St-2024-568 | Social Sciences | Medium |
The pseudonymisation process creates a key table in which personal data is linked to the newly created pseudonym (participant ID). The key table must be stored in a secure, encrypted database, separate from the pseudonymised dataset.
Key table
| Name, surname | Student ID | Participant ID |
|---|---|---|
| John Berzins | St-2024-051 | ID-001 |
| Līga Ozola | St-2024-302 | ID-002 |
| Karlis Priedītis | St-2024-568 | ID-003 |
The pseudonymised dataset does not contain the direct identifiers (names and student ID numbers). This is the dataset to be used in the study.
Pseudonymised dataset
| Participant ID | Faculty | Level of physical activity |
|---|---|---|
| ID-001 | Computer Science | Low |
| ID-002 | Medical | Medium |
| ID-003 | Social Sciences | Medium |
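The split shown in the tables above could be reproduced with a short script. The sketch below assumes the original data is available as a list of dictionaries; the field names (`name`, `student_id`, `faculty`, `activity`) and the function `pseudonymise` are hypothetical and exist only for this example.

```python
original = [
    {"name": "John Berzins", "student_id": "St-2024-051",
     "faculty": "Computer Science", "activity": "Low"},
    {"name": "Līga Ozola", "student_id": "St-2024-302",
     "faculty": "Medical", "activity": "Medium"},
    {"name": "Karlis Priedītis", "student_id": "St-2024-568",
     "faculty": "Social Sciences", "activity": "Medium"},
]

# Columns that identify a person directly and must leave the analysis dataset.
DIRECT_IDENTIFIERS = ("name", "student_id")


def pseudonymise(records, direct_identifiers=DIRECT_IDENTIFIERS):
    """Split records into a key table and a pseudonymised dataset."""
    key_table = []
    dataset = []
    for number, record in enumerate(records, start=1):
        participant_id = f"ID-{number:03d}"

        # Key table: direct identifiers plus the pseudonym.
        key_row = {field: record[field] for field in direct_identifiers}
        key_row["participant_id"] = participant_id
        key_table.append(key_row)

        # Pseudonymised dataset: pseudonym plus the remaining columns.
        data_row = {"participant_id": participant_id}
        data_row.update(
            {k: v for k, v in record.items() if k not in direct_identifiers}
        )
        dataset.append(data_row)
    return key_table, dataset


key_table, dataset = pseudonymise(original)
```

In practice the two results would be written to different locations, with the key table going to the encrypted, access-controlled storage described above.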
A key table is simple to understand and can be implemented without complex algorithms. The researcher has full control over the format and content of the pseudonyms. The key table makes it easy to update, supplement or delete the original data if necessary. If a data subject asks for their data to be deleted, the corresponding entries must be deleted both from the pseudonymised dataset and from the key table.
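The deletion case can be illustrated as follows. This is a minimal sketch under the same assumptions as the example tables: both tables are held as lists of dictionaries, and the helper `withdraw` and its field names are hypothetical.

```python
key_table = [
    {"name": "John Berzins", "student_id": "St-2024-051", "participant_id": "ID-001"},
    {"name": "Līga Ozola", "student_id": "St-2024-302", "participant_id": "ID-002"},
    {"name": "Karlis Priedītis", "student_id": "St-2024-568", "participant_id": "ID-003"},
]
dataset = [
    {"participant_id": "ID-001", "faculty": "Computer Science", "activity": "Low"},
    {"participant_id": "ID-002", "faculty": "Medical", "activity": "Medium"},
    {"participant_id": "ID-003", "faculty": "Social Sciences", "activity": "Medium"},
]


def withdraw(student_id, key_table, dataset):
    """Delete a participant from both the key table and the dataset."""
    # Look up the pseudonym through the key table first.
    pseudonyms = {
        row["participant_id"]
        for row in key_table
        if row["student_id"] == student_id
    }
    if not pseudonyms:
        raise KeyError(f"no participant with student ID {student_id}")

    # Remove the matching rows in place from both tables.
    key_table[:] = [r for r in key_table if r["participant_id"] not in pseudonyms]
    dataset[:] = [r for r in dataset if r["participant_id"] not in pseudonyms]
    return pseudonyms


removed = withdraw("St-2024-302", key_table, dataset)
```

The lookup has to happen before the key table row is removed, because the key table is the only place where the student ID and the pseudonym are connected.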
However, a number of risks need to be taken into account:
- If the key table becomes known to third parties, the whole pseudonymisation system breaks down. It is therefore also necessary to store pseudonymised data securely, as described in the Sensitive Data section of Chapter 3 of this guide. The key table should be kept separate from the research datasets, access should be restricted to specifically authorised persons, and each access should be logged.
- For large datasets, the key table can become difficult to manage and there is a risk of manual errors. When large amounts of information are to be analysed, algorithm-based pseudonymisation is recommended.