Generalisation of data

Data generalisation replaces specific values with broader categories, e.g. date of birth → age group; address → city.

Example

Look again at the data you have previously collected, from which direct identifiers have already been deleted. Although names and surnames are no longer included, individuals can still be identified in the case of rare diseases or small localities. In this example, a patient of a particular age with a rare disease (multiple sclerosis) lives in a small locality, so there is a very high risk of identity reconstruction (de-anonymisation). Data generalisation can be used to transform the remaining columns that contain implicit identifiers of the person (e.g. age, city, diagnosis).
Original data

ID   Age  City     Diagnosis
101  35   Sigulda  Hypertension
102  28   Ape      Diabetes
103  40   Dobele   Migraine
104  32   Suntaži  Multiple sclerosis
Anonymised data after generalisation

ID   Age group  Region           Disease group
101  30-39      Sigulda region   Diseases of the circulatory system
102  20-29      Smiltene region  Endocrine, nutritional and metabolic diseases
103  40-49      Dobele region    Diseases of the nervous system
104  30-39      Ogre region      Diseases of the nervous system
The resulting data no longer contain the exact values of the implicit identifiers, only a more general description of them, e.g. age group instead of age, region instead of city and disease group instead of diagnosis.
This can significantly reduce the risk of re-identification (de-anonymisation) of research participants. However, before using a generalisation method, careful consideration should be given to whether the generalised data will still allow the intended data analysis.
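
The generalisation step in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables CITY_TO_REGION and DIAGNOSIS_TO_GROUP and the helper names age_group and generalise are hypothetical, the age bands are assumed to be fixed ten-year intervals, and the disease groups are assumed to follow ICD-10 chapter titles. A real project would take these mappings from an official territorial register and disease classification.

```python
# Hypothetical lookup tables for illustration only; real mappings should
# come from an official territorial register and a disease classification.
CITY_TO_REGION = {
    "Sigulda": "Sigulda region",
    "Ape": "Smiltene region",
    "Dobele": "Dobele region",
    "Suntaži": "Ogre region",
}
DIAGNOSIS_TO_GROUP = {
    "Hypertension": "Diseases of the circulatory system",
    "Diabetes": "Endocrine, nutritional and metabolic diseases",
    "Migraine": "Diseases of the nervous system",
    "Multiple sclerosis": "Diseases of the nervous system",
}


def age_group(age, width=10):
    """Return the fixed-width band that contains age, e.g. 35 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalise(record):
    """Replace each implicit identifier with its broader category."""
    return {
        "ID": record["ID"],
        "Age group": age_group(record["Age"]),
        "Region": CITY_TO_REGION[record["City"]],
        "Disease group": DIAGNOSIS_TO_GROUP[record["Diagnosis"]],
    }


original = [
    {"ID": 101, "Age": 35, "City": "Sigulda", "Diagnosis": "Hypertension"},
    {"ID": 102, "Age": 28, "City": "Ape", "Diagnosis": "Diabetes"},
    {"ID": 103, "Age": 40, "City": "Dobele", "Diagnosis": "Migraine"},
    {"ID": 104, "Age": 32, "City": "Suntaži", "Diagnosis": "Multiple sclerosis"},
]

generalised = [generalise(record) for record in original]
for row in generalised:
    print(row)
```

Using a single width parameter keeps all age bands aligned on the same boundaries, so a 32-year-old falls into 30-39 rather than a band that starts at the person's own age. Wider bands or larger regions lower the re-identification risk further, at the cost of analytical detail.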
