Generalisation of data – DataverseLV

nauris_b23k, 2026-02-16T13:12:13+02:00

Generalisation of data

Data generalisation replaces specific values with broader categories, e.g. date of birth → age group; address → city.
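Here is a minimal Python sketch of the date of birth → age group step. The function names are hypothetical. It assumes a reference date, such as the date the data were collected, and ten-year bands like those in the example table. Both are illustrative choices, not fixed rules.

```python
from datetime import date


def age_on(birth_date: date, reference_date: date) -> int:
    """Return the age in full years on reference_date."""
    years = reference_date.year - birth_date.year
    # One year less if the birthday has not yet come round in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age: int, width: int = 10) -> str:
    """Replace an exact age with a band of `width` years, e.g. 35 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalise_birth_date(birth_date: date, reference_date: date, width: int = 10) -> str:
    """Date of birth -> age group; the exact date is not kept."""
    return age_group(age_on(birth_date, reference_date), width)


# Hypothetical input: born on 15 June 1990, data collected on 16 February 2026.
print(generalise_birth_date(date(1990, 6, 15), date(2026, 2, 16)))  # 30-39
```

The `width` parameter sets the trade-off. Wider bands lower the re-identification risk further but also discard more detail.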

Example

Look again at the data you collected earlier, from which the direct identifiers have already been deleted. Names and surnames are no longer included, yet individuals can still be identified when a disease is rare or a locality is small. Take this example: a patient of a given age with a rare disease (multiple sclerosis) lives in a small locality, so the risk of identity reconstruction (de-anonymisation) is very high. Data generalisation can transform the remaining columns that contain implicit (indirect) identifiers of a person, e.g. age, city and diagnosis, into broader categories.

Original data

ID	Age	City	Diagnosis
101	35	Sigulda	Hypertension
102	28	Ape	Diabetes
103	40	Dobele	Migraine
104	32	Suntaži	Multiple sclerosis
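To see why these rows are risky, count how often each combination of implicit identifiers occurs. The sketch below flags every record whose combination of age, city and diagnosis appears only once. Such a record points to a single person. It is written in Python, and the helper `unique_records` is hypothetical.

```python
from collections import Counter

# The de-identified rows from the table above (direct identifiers already removed).
records = [
    {"ID": 101, "Age": 35, "City": "Sigulda", "Diagnosis": "Hypertension"},
    {"ID": 102, "Age": 28, "City": "Ape", "Diagnosis": "Diabetes"},
    {"ID": 103, "Age": 40, "City": "Dobele", "Diagnosis": "Migraine"},
    {"ID": 104, "Age": 32, "City": "Suntaži", "Diagnosis": "Multiple sclerosis"},
]


def unique_records(rows, quasi_identifiers):
    """Return the IDs of rows whose combination of implicit identifiers occurs only once."""

    def key(row):
        # The values of the chosen implicit-identifier columns, as a hashable tuple.
        return tuple(row[col] for col in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row["ID"] for row in rows if counts[key(row)] == 1]


print(unique_records(records, ["Age", "City", "Diagnosis"]))  # [101, 102, 103, 104]
```

On a full dataset, run the same check on the generalised columns. It shows how many records are still unique after generalisation.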

Anonymised data after generalisation

ID	Age group	Region	Disease group
101	30-39	Sigulda region	Diseases of the circulatory system
102	20-29	Smiltene region	Endocrine, nutritional and metabolic diseases
103	40-49	Dobele region	Diseases of the nervous system
104	30-39	Ogre region	Diseases of the nervous system
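The move from the original table to the generalised one can be sketched in Python as lookup tables applied row by row. The mappings below are hypothetical and cover only the four example values. In practice they would come from an official classification, e.g. the ICD-10 chapters for diagnoses and an official list of administrative territories for localities. Diabetes is mapped here to its ICD-10 chapter, Endocrine, nutritional and metabolic diseases.

```python
# Hypothetical lookup tables covering only the example values.
CITY_TO_REGION = {
    "Sigulda": "Sigulda region",
    "Ape": "Smiltene region",
    "Dobele": "Dobele region",
    "Suntaži": "Ogre region",
}
DIAGNOSIS_TO_GROUP = {
    "Hypertension": "Diseases of the circulatory system",         # ICD-10 chapter IX
    "Diabetes": "Endocrine, nutritional and metabolic diseases",  # ICD-10 chapter IV
    "Migraine": "Diseases of the nervous system",                 # ICD-10 chapter VI
    "Multiple sclerosis": "Diseases of the nervous system",       # ICD-10 chapter VI
}


def age_group(age, width=10):
    """Replace an exact age with a band of `width` years, e.g. 35 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalise(row):
    """Return a new row in which each implicit identifier is replaced by a broader category.

    An unmapped city or diagnosis raises KeyError instead of passing through unchanged,
    so a rare value cannot slip into the released data by accident.
    """
    return {
        "ID": row["ID"],
        "Age group": age_group(row["Age"]),
        "Region": CITY_TO_REGION[row["City"]],
        "Disease group": DIAGNOSIS_TO_GROUP[row["Diagnosis"]],
    }


records = [
    {"ID": 101, "Age": 35, "City": "Sigulda", "Diagnosis": "Hypertension"},
    {"ID": 102, "Age": 28, "City": "Ape", "Diagnosis": "Diabetes"},
    {"ID": 103, "Age": 40, "City": "Dobele", "Diagnosis": "Migraine"},
    {"ID": 104, "Age": 32, "City": "Suntaži", "Diagnosis": "Multiple sclerosis"},
]

for row in map(generalise, records):
    print(row["ID"], row["Age group"], row["Region"], row["Disease group"], sep="\t")
```

The function raises an error on unknown values instead of copying them through. That way every value released has been deliberately assigned to a category.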

The resulting data no longer contain the exact values of the implicit identifiers, only broader descriptions of them: age group instead of age, region instead of city, and disease group instead of diagnosis.

This can significantly reduce the risk of re-identification (de-anonymisation) of research participants. Before applying generalisation, however, consider carefully whether the generalised data will still allow the intended analysis. For example, ten-year age groups no longer support an analysis by exact age.


Funding

The website was developed within the framework of the Project No 2.1.3.1.i.0/2/23/I/CFLA/002 “Support for implementing Open Science, developing solutions for shared use of research data, and participation in the EOSC” with financial support from the European Union Recovery and Resilience Facility and the Latvian state.


Data Deposit Terms

Accessibility statement

Privacy policy