Key tables – DataverseLV

nauris_b23k, 2026-02-16T13:12:03+02:00

Key tables

Key or code tables are one of the simplest and most intuitive methods of pseudonymisation, in which original personal identifiers (such as names or personal codes) are replaced by pseudonyms, so that direct identifiers no longer appear in the dataset used for analysis.

Pseudonyms are most often generated as either consecutive sequential numbers (e.g. P001, P002, […] P999 or ID-01, ID-02, […] ID-99) or randomly generated numbers.
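The two ways of generating pseudonyms mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names, the "ID-" prefix and the number of digits are assumptions chosen for the example. Sequential numbers may reveal the order in which records were entered, which random numbers avoid.

```python
import secrets


def sequential_pseudonyms(count, prefix="ID-", width=3):
    """Return `count` pseudonyms numbered 1..count, zero-padded to `width` digits."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


def random_pseudonyms(count, prefix="ID-", digits=6):
    """Return `count` distinct pseudonyms with randomly drawn numbers."""
    if count > 10 ** digits:
        raise ValueError("not enough distinct numbers for this many pseudonyms")
    seen = set()
    result = []
    while len(result) < count:
        # secrets draws from the operating system's secure random source
        candidate = f"{prefix}{secrets.randbelow(10 ** digits):0{digits}d}"
        if candidate not in seen:  # skip duplicates so each pseudonym is unique
            seen.add(candidate)
            result.append(candidate)
    return result


print(sequential_pseudonyms(3))  # ['ID-001', 'ID-002', 'ID-003']
print(random_pseudonyms(3))      # three distinct random pseudonyms
```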

A separate key file or key table is created to allow future editing, updating or linking of the dataset to other datasets. In the key or code table, pseudonyms are linked to the corresponding personal data. This table should be kept separate from the pseudonymised research data, in a place where access control, encryption and security measures for sensitive data are ensured. It should be accessible only to the researcher who needs to know the real identities of individuals (e.g. the principal investigator) and not to others.

Example

The original data contains personal data and direct identifiers (names and student ID numbers) that are not necessary for the analysis. However, the dataset will need to be supplemented in the future, and participants may withdraw from the study and ask for their data to be deleted.

Original data

Name, surname	Student ID	Faculty	Level of physical activity
John Berzins	St-2024-051	Computer Science	Low
Līga Ozola	St-2024-302	Medical	Medium
Karlis Priedītis	St-2024-568	Social Sciences	Medium

The pseudonymisation process creates a key table in which personal data is linked to the newly created pseudonym (participant ID). The key table must be stored in a secure, encrypted database, separate from the pseudonymised dataset.

Key table

Name, surname	Student ID	Participant ID
John Berzins	St-2024-051	ID-001
Līga Ozola	St-2024-302	ID-002
Karlis Priedītis	St-2024-568	ID-003

The pseudonymised dataset no longer contains the direct identifiers (names and student ID numbers). This is the dataset to be used in the study.

Pseudonymised dataset

Participant ID	Faculty	Level of physical activity
ID-001	Computer Science	Low
ID-002	Medical	Medium
ID-003	Social Sciences	Medium
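The example above can be reproduced with a short script. This is a sketch under stated assumptions: the function name pseudonymise and the use of plain Python dictionaries are illustrative choices, and in practice the key table would be written to separate, encrypted storage rather than kept in the same program as the dataset.

```python
def pseudonymise(records, identifier_fields, id_field="Participant ID"):
    """Split records into a key table and a pseudonymised dataset.

    identifier_fields names the direct identifiers that move to the key table.
    """
    key_table = []
    dataset = []
    for number, record in enumerate(records, start=1):
        pseudonym = f"ID-{number:03d}"  # sequential pseudonym, as in the example

        # Key table row: direct identifiers plus the pseudonym
        key_row = {field: record[field] for field in identifier_fields}
        key_row[id_field] = pseudonym
        key_table.append(key_row)

        # Dataset row: the pseudonym plus every non-identifying field
        data_row = {id_field: pseudonym}
        data_row.update(
            {k: v for k, v in record.items() if k not in identifier_fields}
        )
        dataset.append(data_row)
    return key_table, dataset


original = [
    {"Name, surname": "John Berzins", "Student ID": "St-2024-051",
     "Faculty": "Computer Science", "Level of physical activity": "Low"},
    {"Name, surname": "Līga Ozola", "Student ID": "St-2024-302",
     "Faculty": "Medical", "Level of physical activity": "Medium"},
    {"Name, surname": "Karlis Priedītis", "Student ID": "St-2024-568",
     "Faculty": "Social Sciences", "Level of physical activity": "Medium"},
]

key_table, dataset = pseudonymise(original, ["Name, surname", "Student ID"])
for row in dataset:
    print(row)
```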

The key table is simple, easy to understand and can be implemented without complex algorithms. The researcher has full control over the format and content of the pseudonyms. The key table makes it easy to update, add to or delete original data if necessary. If a data subject asks for their data to be deleted, the corresponding entries must be deleted both in the pseudonymised dataset and in the key table.
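A withdrawal request could be handled along the following lines. This is a minimal sketch: the function name delete_participant and the in-memory lists are assumptions made for illustration, and real data would also need to be removed from backups and derived files.

```python
def delete_participant(student_id, key_table, dataset):
    """Remove one participant from both tables; return True if they were found."""
    # Look up the pseudonym(s) through the key table first
    pseudonyms = {
        row["Participant ID"] for row in key_table
        if row["Student ID"] == student_id
    }
    if not pseudonyms:
        return False
    # Delete in both places, otherwise orphaned research data remains
    key_table[:] = [r for r in key_table if r["Participant ID"] not in pseudonyms]
    dataset[:] = [r for r in dataset if r["Participant ID"] not in pseudonyms]
    return True


key_table = [
    {"Name, surname": "John Berzins", "Student ID": "St-2024-051", "Participant ID": "ID-001"},
    {"Name, surname": "Līga Ozola", "Student ID": "St-2024-302", "Participant ID": "ID-002"},
]
dataset = [
    {"Participant ID": "ID-001", "Faculty": "Computer Science", "Level of physical activity": "Low"},
    {"Participant ID": "ID-002", "Faculty": "Medical", "Level of physical activity": "Medium"},
]

removed = delete_participant("St-2024-302", key_table, dataset)
print(removed, len(key_table), len(dataset))
```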

However, a number of risks need to be taken into account:

If the key table becomes known to third parties, the whole pseudonymisation system breaks down. It is therefore also necessary to store pseudonymised data securely, as described in the Sensitive Data section of Chapter 3 of this guide. The key table should be kept separate from the research datasets, access should be restricted to specifically authorised persons, and each access should be logged.

For large datasets, the key table can become difficult to manage and there is a risk of manual errors. When large amounts of information are to be analysed, algorithm-based pseudonymisation is recommended.
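The requirement in the first risk above, that access to the key table is restricted and every access is logged, might look like this in its simplest form. The names AUTHORISED_USERS, KEY_TABLE and look_up_identity are hypothetical, and this sketch only illustrates the idea: it does not replace access control and encryption at the level of the operating system or database.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
access_log = logging.getLogger("key_table_access")

# Hypothetical list of persons allowed to re-identify participants
AUTHORISED_USERS = {"principal_investigator"}

# Key table indexed by pseudonym (in practice kept in encrypted storage)
KEY_TABLE = {
    "ID-001": {"Name, surname": "John Berzins", "Student ID": "St-2024-051"},
}


def look_up_identity(pseudonym, user, key_table):
    """Return the identity behind a pseudonym, logging every attempt."""
    if user not in AUTHORISED_USERS:
        access_log.warning("DENIED user=%s pseudonym=%s", user, pseudonym)
        raise PermissionError(f"{user} may not read the key table")
    access_log.info("GRANTED user=%s pseudonym=%s", user, pseudonym)
    return key_table.get(pseudonym)  # None if the pseudonym is unknown


print(look_up_identity("ID-001", "principal_investigator", KEY_TABLE))
```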


Funding

The website was developed within the framework of the Project No 2.1.3.1.i.0/2/23/I/CFLA/002 “Support for implementing Open Science, developing solutions for shared use of research data, and participation in the EOSC” with financial support from the European Union Recovery and Resilience Facility and the Latvian state.
