Data deletion

Data deletion is one of the simplest anonymisation methods: sensitive data is removed from the dataset. It is appropriate when the sensitive data is no longer needed, does not need to be updated and will not be used in data analysis. The method is well suited to direct identifiers; indirect identifiers may require other anonymisation methods.

Example

The original data contains direct identifiers (name, surname, personal identification number, telephone number and student ID number) and indirect identifiers (age, city, diagnosis). In this example, only the direct identifiers are deleted, as they will not be used in the data analysis; deleting the indirect identifiers as well would remove almost the whole dataset.
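
Original data

| ID  | Name, surname | Age | City    | Personal code | Phone    | Diagnosis          |
|-----|---------------|-----|---------|---------------|----------|--------------------|
| 101 | John Berzins  | 35  | Sigulda | 120390-*****  | 29123456 | Hypertension       |
| 102 | Anna Kalnina  | 28  | Ape     | 040795-*****  | 26789012 | Diabetes           |
| 103 | Peteris Ozols | 40  | Dobele  | 3150882-***** | 22334455 | Migraine           |
| 104 | Laura Linden  | 32  | Suntaži | 080188-*****  | 26543218 | Multiple sclerosis |
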
The columns containing direct personal identifiers are deleted.

Anonymised data (after deletion of direct identifiers)

| ID  | Age | City    | Diagnosis          |
|-----|-----|---------|--------------------|
| 101 | 35  | Sigulda | Hypertension       |
| 102 | 28  | Ape     | Diabetes           |
| 103 | 40  | Dobele  | Migraine           |
| 104 | 32  | Suntaži | Multiple sclerosis |

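
The deletion step in this example can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the column names are copied from the example table, while the function name `delete_columns` and the choice of which columns count as direct identifiers are assumptions made for the illustration.

```python
# Columns treated as direct identifiers in this sketch (an assumption
# based on the example table; adjust to the actual dataset).
DIRECT_IDENTIFIERS = {"Name, surname", "Personal code", "Phone"}

# The original data from the example, one dictionary per row.
original = [
    {"ID": 101, "Name, surname": "John Berzins", "Age": 35, "City": "Sigulda",
     "Personal code": "120390-*****", "Phone": "29123456",
     "Diagnosis": "Hypertension"},
    {"ID": 102, "Name, surname": "Anna Kalnina", "Age": 28, "City": "Ape",
     "Personal code": "040795-*****", "Phone": "26789012",
     "Diagnosis": "Diabetes"},
    {"ID": 103, "Name, surname": "Peteris Ozols", "Age": 40, "City": "Dobele",
     "Personal code": "3150882-*****", "Phone": "22334455",
     "Diagnosis": "Migraine"},
    {"ID": 104, "Name, surname": "Laura Linden", "Age": 32, "City": "Suntaži",
     "Personal code": "080188-*****", "Phone": "26543218",
     "Diagnosis": "Multiple sclerosis"},
]


def delete_columns(records, columns_to_delete):
    """Return new records without the listed columns (hypothetical helper).

    The input records are left unchanged.
    """
    return [
        {key: value for key, value in record.items()
         if key not in columns_to_delete}
        for record in records
    ]


anonymised = delete_columns(original, DIRECT_IDENTIFIERS)
for record in anonymised:
    print(record)
```

The function builds new records instead of modifying the originals, so the source data stays intact until it is deliberately destroyed. In practice the original file must also be removed or access to it restricted, otherwise the deleted values can simply be looked up again.
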
Sometimes a person can still be identified after the direct identifiers have been deleted, for example through another unique feature or a combination of features. In such cases, consider applying another anonymisation method or deleting the unique value of the variable.
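
One possible way to check for this risk is to count how often each combination of indirect identifiers occurs and to flag the records whose combination appears only once. The sketch below is an illustration under assumptions, not a complete risk assessment: the helper names `unique_records` and `delete_value`, and the choice of age, city and diagnosis as the columns to check, are hypothetical.

```python
from collections import Counter

# Indirect identifiers checked in this sketch (an assumption for the example).
INDIRECT_IDENTIFIERS = ("Age", "City", "Diagnosis")

# The anonymised data from the example, one dictionary per row.
records = [
    {"ID": 101, "Age": 35, "City": "Sigulda", "Diagnosis": "Hypertension"},
    {"ID": 102, "Age": 28, "City": "Ape", "Diagnosis": "Diabetes"},
    {"ID": 103, "Age": 40, "City": "Dobele", "Diagnosis": "Migraine"},
    {"ID": 104, "Age": 32, "City": "Suntaži", "Diagnosis": "Multiple sclerosis"},
]


def unique_records(rows, columns):
    """Return the rows whose combination of values in `columns` occurs once."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [row for row in rows
            if counts[tuple(row[c] for c in columns)] == 1]


def delete_value(rows, column, value):
    """Return copies of the rows with `value` in `column` replaced by None."""
    return [
        {**row, column: None} if row[column] == value else dict(row)
        for row in rows
    ]


# Every combination in this small example is unique, so all rows are flagged.
at_risk = unique_records(records, INDIRECT_IDENTIFIERS)
print([row["ID"] for row in at_risk])

# Deleting one unique value, as suggested in the text above.
cleaned = delete_value(records, "Diagnosis", "Multiple sclerosis")
print(cleaned[3])
```

With only four rows, every record is unique on these three columns, which shows why deleting direct identifiers alone may not be enough for small datasets.
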
