Microaggregation – DataverseLV

nauris_b23k, 2026-02-16T13:12:12+02:00

Microaggregation

Microaggregation groups similar records and replaces each original value with the average of its group. The method is most commonly used for numerical variables, but there are also variants that can be applied to categorical data.
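To make the idea concrete, here is a minimal sketch of one common way to form the groups: sort the values, cut them into consecutive groups of at least k members, and replace each value with its group mean. The function name `microaggregate` and the parameter `k` are illustrative assumptions, not part of any particular tool, and the sample ages are taken from the example further down this page.

```python
def microaggregate(values, k=2):
    """Return the group mean for each value, keeping the original order.

    Illustrative sketch: groups are runs of at least k neighbours in sorted order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort positions by value so that similar values sit next to each other.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Cut the sorted positions into consecutive groups of k.
    groups = [order[start:start + k] for start in range(0, len(order), k)]
    # Merge a short final group into the previous one so every group has at least k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    result = [None] * len(values)
    for group in groups:
        group_mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = group_mean
    return result


ages = [35, 28, 40, 32, 22, 44]
print(microaggregate(ages, k=2))
```

A larger k generally means stronger protection but coarser data, so the choice depends on the intended analysis.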

Example

Starting from the table obtained after data deletion, microaggregation is used this time to transform the data. In this example, it has been applied to the variable "Age".

Original data

ID	Age	City	Diagnosis
101	35	Sigulda	Hypertension
102	28	Ape	Diabetes
103	40	Dobele	Migraine
104	32	Suntaži	Multiple sclerosis
105	22	Riga	Asthma
106	44	Liepaja	Hypertension

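The ages are grouped into the ranges 20–29, 30–39 and 40–49. The average is calculated for each group, rounded to a whole number, and used in place of each original value. For example, the mean of 35 and 32 is 33.5, which appears as 34.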

Dataset anonymised by microaggregating age

ID	Age	City	Diagnosis
101	34	Sigulda	Hypertension
102	25	Ape	Diabetes
103	42	Dobele	Migraine
104	34	Suntaži	Multiple sclerosis
105	25	Riga	Asthma
106	42	Liepaja	Hypertension
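The transformation between the two tables above can be reproduced with a short sketch. It assumes ten-year age bands and rounding half up (the table shows 34 for a mean of 33.5). The function name `microaggregate_age` and the `width` parameter are hypothetical and chosen only for this illustration.

```python
import math
from collections import defaultdict

# Rows from the "Original data" table: (ID, Age, City, Diagnosis)
records = [
    (101, 35, "Sigulda", "Hypertension"),
    (102, 28, "Ape", "Diabetes"),
    (103, 40, "Dobele", "Migraine"),
    (104, 32, "Suntaži", "Multiple sclerosis"),
    (105, 22, "Riga", "Asthma"),
    (106, 44, "Liepaja", "Hypertension"),
]


def microaggregate_age(rows, width=10):
    """Replace each age with the rounded mean age of its `width`-year band."""
    # Collect the ages that fall into each band (20-29 -> band 20, and so on).
    bands = defaultdict(list)
    for _, age, _, _ in rows:
        bands[age // width * width].append(age)

    # Mean per band, rounded half up (assumption: 33.5 is shown as 34).
    band_mean = {
        band: math.floor(sum(ages) / len(ages) + 0.5)
        for band, ages in bands.items()
    }

    # Rebuild the rows with the band mean in place of the original age.
    return [
        (rid, band_mean[age // width * width], city, diagnosis)
        for rid, age, city, diagnosis in rows
    ]


anonymised = microaggregate_age(records)
for row in anonymised:
    print(*row, sep="\t")
```

Only the "Age" column changes. The other columns are passed through untouched, so they may still need their own anonymisation.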

As with other anonymisation methods, consider carefully before microaggregating the data whether the processed data will still support the intended analysis.
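One simple way to approach this question is to compare summary statistics before and after the transformation. The sketch below uses the two "Age" columns from the example above. The choice of statistics (mean, population standard deviation, number of distinct values) is an assumption made for illustration, not a prescribed checklist.

```python
from statistics import mean, pstdev

original = [35, 28, 40, 32, 22, 44]     # "Age" column before microaggregation
anonymised = [34, 25, 42, 34, 25, 42]   # "Age" column after microaggregation

# The overall mean is nearly preserved; the small shift comes from rounding.
print("mean:", round(mean(original), 2), "->", round(mean(anonymised), 2))

# The spread shrinks, because variation inside each group is removed.
print("std dev:", round(pstdev(original), 2), "->", round(pstdev(anonymised), 2))

# Fewer distinct values remain, which limits fine-grained analysis by age.
print("distinct ages:", len(set(original)), "->", len(set(anonymised)))
```

If the planned analysis depends on variation within an age group, microaggregated values may not be sufficient.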


Funding

The website was developed within the framework of the Project No 2.1.3.1.i.0/2/23/I/CFLA/002 “Support for implementing Open Science, developing solutions for shared use of research data, and participation in the EOSC” with financial support from the European Union Recovery and Resilience Facility and the Latvian state.

Data Deposit Terms
