Generalisation of data
Data generalisation replaces specific values with broader categories, e.g. date of birth → age group, address → city.
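As a rough illustration of the "date of birth → age group" case, the sketch below first computes a person's age on a reference date and then maps it to a band. The function names, the choice of reference date and the default band width of 10 years are illustrative assumptions, not part of any standard procedure.

```python
from datetime import date


def age_on(date_of_birth: date, reference: date) -> int:
    """Completed years of age on the reference date."""
    years = reference.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year
    if (reference.month, reference.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def age_group(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39' (width is an assumption)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalise_dob(date_of_birth: date, reference: date, width: int = 10) -> str:
    """Replace a date of birth with the age group it falls into on the reference date."""
    return age_group(age_on(date_of_birth, reference), width)


print(generalise_dob(date(1989, 6, 15), date(2024, 1, 1)))  # age 34 -> 30-39
```

Wider bands hide more detail and usually lower the re-identification risk; narrower bands keep more detail for analysis. The `width` parameter makes that trade-off explicit.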
Example
Look again at the data you collected earlier, from which direct identifiers have already been removed. Even though names and surnames are no longer included, individuals can still be identified when a disease is rare or a locality is small. In this example, a patient of a known age with a relatively rare disease (multiple sclerosis) lives in a small locality, so the risk of identity reconstruction (de-anonymisation) is very high. Data generalisation can be used to transform all the remaining columns that contain implicit identifiers of a person (here age, city and diagnosis).
Original data
| ID | Age | City | Diagnosis |
|---|---|---|---|
| 101 | 35 | Sigulda | Hypertension |
| 102 | 28 | Ape | Diabetes |
| 103 | 40 | Dobele | Migraine |
| 104 | 32 | Suntaži | Multiple sclerosis |
Anonymised data after generalisation
| ID | Age group | Region | Disease group |
|---|---|---|---|
| 101 | 30-39 | Sigulda region | Diseases of the circulatory system |
| 102 | 20-29 | Smiltene region | Endocrine, nutritional and metabolic diseases |
| 103 | 40-49 | Dobele region | Diseases of the nervous system |
| 104 | 30-39 | Ogre region | Diseases of the nervous system |
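One way the transformation from the original to the generalised columns could be scripted is with explicit lookup tables. In this minimal sketch, the dictionaries `CITY_TO_REGION` and `DIAGNOSIS_TO_GROUP` are hypothetical and cover only the four example records; a real project would build them from an official list of municipalities and a diagnosis classification such as ICD-10.

```python
# Hypothetical lookup tables covering only the example records.
CITY_TO_REGION = {
    "Sigulda": "Sigulda region",
    "Ape": "Smiltene region",
    "Dobele": "Dobele region",
    "Suntaži": "Ogre region",
}
DIAGNOSIS_TO_GROUP = {
    "Hypertension": "Diseases of the circulatory system",
    "Diabetes": "Endocrine, nutritional and metabolic diseases",
    "Migraine": "Diseases of the nervous system",
    "Multiple sclerosis": "Diseases of the nervous system",
}


def age_group(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalise(record: dict) -> dict:
    """Return a new record in which implicit identifiers are generalised.

    A city or diagnosis missing from the lookup tables raises KeyError,
    so no specific value can slip through unchanged.
    """
    return {
        "ID": record["ID"],
        "Age group": age_group(record["Age"]),
        "Region": CITY_TO_REGION[record["City"]],
        "Disease group": DIAGNOSIS_TO_GROUP[record["Diagnosis"]],
    }


original = [
    {"ID": 101, "Age": 35, "City": "Sigulda", "Diagnosis": "Hypertension"},
    {"ID": 102, "Age": 28, "City": "Ape", "Diagnosis": "Diabetes"},
    {"ID": 103, "Age": 40, "City": "Dobele", "Diagnosis": "Migraine"},
    {"ID": 104, "Age": 32, "City": "Suntaži", "Diagnosis": "Multiple sclerosis"},
]

generalised = [generalise(r) for r in original]
for row in generalised:
    print(row)
```

Failing loudly on an unknown value is a deliberate choice here: silently passing an unmapped city or diagnosis through would leave exactly the kind of rare value that generalisation is meant to hide.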
The resulting data no longer contain the implicit personal identifiers themselves, only more general descriptions of them: an age group instead of the exact age, a region instead of the city, and a disease group instead of the specific diagnosis.
This can significantly reduce the risk of re-identification (de-anonymisation) of research participants. However, before using a generalisation method, consider carefully whether the generalised data will still allow the intended data analysis.
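To make the trade-off concrete, the sketch below compares one simple analysis, the mean age, computed from the exact ages and estimated from the age groups. Using the midpoint of each band as a stand-in for the unknown exact age is an assumption chosen for illustration; other estimation approaches exist.

```python
from statistics import mean

ages = [35, 28, 40, 32]                            # original data
age_groups = ["30-39", "20-29", "40-49", "30-39"]  # after generalisation


def band_midpoint(band: str) -> float:
    """Midpoint of an age band written as 'lower-upper', e.g. '30-39' -> 34.5."""
    lower, upper = (int(part) for part in band.split("-"))
    return (lower + upper) / 2


exact_mean = mean(ages)
# The exact ages are gone, so each person is assumed to sit at the band midpoint
estimated_mean = mean(band_midpoint(g) for g in age_groups)

print(f"exact mean age: {exact_mean:.2f}")
print(f"estimated from age groups: {estimated_mean:.2f}")
```

Here the estimate (34.50) differs from the exact value (33.75). Some questions cannot be answered at all after generalisation: for example, the number of patients with multiple sclerosis cannot be recovered once that diagnosis has been merged into "Diseases of the nervous system".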