Microdata anonymization

Statistical agencies and other data producers are increasingly publishing microdata obtained from sample surveys, censuses, and administrative data collection systems. The dissemination of microdata is made necessary by a high demand from the research community, a push for transparency, and sometimes by legal or contractual obligations. This must be done in such a way that the confidentiality of the information provided by respondents is preserved.

In this section we present:

The main principles associated with microdata anonymization
Various techniques used for measuring the disclosure risk
Methods available for reducing the disclosure risk
Methods for assessing the resulting information loss

Links to available tools are also provided, as well as a compilation of practices.
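As a rough illustration of the measure, reduce, and assess cycle outlined above, the following sketch counts how many records are unique on a set of key variables (a simple k-anonymity check), applies global recoding to one variable, and recomputes the count. It is plain Python and does not use sdcMicro; the toy records, the choice of key variables, and the helper names (sample_frequencies, violates_k_anonymity, recode_age) are all hypothetical and serve only to make the idea concrete.

```python
from collections import Counter


def sample_frequencies(records, keys):
    """Count how many records share each combination of key variables."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def violates_k_anonymity(records, keys, k):
    """Return the records whose key combination occurs fewer than k times."""
    freq = sample_frequencies(records, keys)
    return [r for r in records if freq[tuple(r[key] for key in keys)] < k]


def recode_age(record, width=10):
    """Global recoding: replace exact age by the lower bound of its age band."""
    recoded = dict(record)
    recoded["age"] = (record["age"] // width) * width
    return recoded


# Hypothetical toy microdata; "age" and "region" are assumed to be the
# key variables an intruder could match against external information.
records = [
    {"age": 23, "region": "North", "income": 1200},
    {"age": 27, "region": "North", "income": 1500},
    {"age": 25, "region": "North", "income": 900},
    {"age": 41, "region": "South", "income": 2100},
    {"age": 48, "region": "South", "income": 1800},
    {"age": 67, "region": "South", "income": 700},
]
keys = ["age", "region"]

# Step 1: measure risk on the raw data.
at_risk_before = violates_k_anonymity(records, keys, k=2)

# Step 2: reduce risk by recoding age into 10-year bands.
recoded = [recode_age(r) for r in records]
at_risk_after = violates_k_anonymity(recoded, keys, k=2)

# Step 3: a crude information-loss indicator, the drop in distinct age values.
distinct_before = len({r["age"] for r in records})
distinct_after = len({r["age"] for r in recoded})

print("records violating 2-anonymity before recoding:", len(at_risk_before))
print("records violating 2-anonymity after recoding:", len(at_risk_after))
print("distinct age values before/after:", distinct_before, distinct_after)
```

In this toy example the recoding removes most, but not all, of the unique combinations, at the cost of coarser age information. The guides listed below cover the risk measures, anonymization methods, and utility measures used in practice.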

Anonymization is typically required for the production of public use files, and to a lesser extent, for generating licensed files. But anonymization is only one of many solutions to minimize the risk of disclosure when distributing microdata. Other legal and organizational measures contribute to this endeavor as well. For datasets provided to selected bona fide users, the legal agreement may include a higher level of security than anonymization alone (see the section on formulating a data dissemination policy).

Three guides have been produced:

Theory Guide - provides an overview of common methods and of the SDC process
Practice Guide - describes how to apply the methods using the command-line interface of the R package sdcMicro
Manual for sdcApp - documents a graphical user interface for sdcMicro, intended for users not comfortable using R from the command line

Statistical Disclosure Control for Microdata: Theory

Thijs Benschop and Matthew Welch - 11/12/2019

This guide provides an introduction to the theory of Statistical Disclosure Control (SDC) for microdata. It includes an overview of the most commonly applied methods in SDC, a step-by-step overview of the complete SDC process, and many examples from practice in National Statistics Offices (NSOs).
For guidance on the technical implementation of the theory mentioned in the guide, please refer to our guides:
- Statistical Disclosure Control for Microdata: A Practice Guide for guidance on the application of methods and on using sdcMicro from the command-line
- sdcApp manual for guidance on the application of methods and on using the GUI sdcApp available for sdcMicro

Download

Statistical Disclosure Control for Microdata: A Practice Guide

Thijs Benschop, Matthew Welch - June 2016

Releasing data safely is required to protect the integrity of the statistical system, because it ensures that agencies honor their commitment to respondents to protect their identity. Agencies rarely share with one another, in substantial detail, their knowledge of and experience with SDC and the processes for creating safe data. This makes it difficult for agencies new to the process to implement solutions. We consolidated knowledge from the literature and from our own experience to inform the discussion of the processes and methods presented in this guide. The guide focuses on the implementation of methods and uses the free R-based package sdcMicro for its examples. If you are interested in reading in detail about the theory behind the methods used, we suggest our accompanying guide: Statistical Disclosure Control for Microdata: Theory.

Download

Statistical Disclosure Control for Microdata: A Practice Guide - Case Study Data and R Script

June 2016

Download (1 MB)

sdcApp Reference Manual

Thijs Benschop, Matthew Welch - 2019-11-12

This manual provides documentation and guidance for using sdcApp, a graphical user interface for the sdcMicro R package. sdcMicro provides tools for Statistical Disclosure Control (SDC) for microdata, also known as microdata anonymization. For an overview of the theory of SDC for microdata we suggest reading: Statistical Disclosure Control for Microdata: A Theory Guide.

Download

An IHSN working paper is also available:

Introduction to Statistical Disclosure Control (SDC)

Matthias Templ, Bernhard Meindl, Alexander Kowarik and Shuang Chen - August 2014

This guide, Introduction to Statistical Disclosure Control (SDC), discusses common SDC methods for microdata obtained from sample surveys, censuses and administrative sources.

Download

Related Resources

Documents

Anonymisation: managing data protection risk code of practice

Download

Author(s)

UK Information Commissioner's Office

Description

The code explains the issues surrounding the anonymisation of personal data, and the disclosure of data once it has been anonymised. It explains the relevant legal concepts and tests in the UK Data Protection Act 1998 (DPA). The code provides good practice advice that will be relevant to all organisations that need to convert personal data into a form in which individuals are no longer identifiable.

Date

November 2012

URL

http://ico.org.uk/for_organisations/data_protection/topic_guides/~/media/documents/library/Data_Protection/Practical_application/anonymisation-codev2.pdf

Handbook on Statistical Disclosure Control (Version 1.2)

Download

Author(s)

ESSNet SDC

Date

January 2010

URL

http://neon.vb.cbs.nl/casc/SDC_Handbook.pdf

Introduction to Statistical Disclosure Control (SDC)

(752.9 KB)

Author(s)

Matthias Templ, Bernhard Meindl, Alexander Kowarik and Shuang Chen

Description

This guide, Introduction to Statistical Disclosure Control (SDC), discusses common SDC methods for microdata obtained from sample surveys, censuses and administrative sources.

Date

August 2014

URL

https://ihsn.org/sites/default/files/resources/ihsn-working-paper-007-Oct27.pdf

Managing Statistical Confidentiality and Microdata Access - Principles and guidelines of Good Practice

Download

Author(s)

Conference of European Statisticians (CES) and United Nations Economic Commission for Europe (UNECE)

Description

These guidelines have been prepared at the request of the Conference of European Statisticians (CES) by a task force chaired by Dennis Trewin, the Australian statistician. The guidelines and core principles of confidentiality and microdata access were adopted by the CES plenary session in June 2006 and the CES Bureau in October 2006.

Date

2007

URL

http://www.unece.org/fileadmin/DAM/stats/publications/Managing.statistical.confidentiality.and.microdata.access.pdf

National Statistics Code of Practice - Protocol on Data Access and Confidentiality (UK)

Download

Author(s)

UK National Statistics

Description

This protocol sets out how the National Statistician, departmental Heads of Profession for Statistics, Chief Statisticians in the devolved administrations and, with their authority, other members of the Government Statistical Service will meet their commitment to guarantee to protect the confidentiality of statistical data within their care. Statistical data include data collected specifically through censuses and surveys for statistical purposes, as well as data derived from administrative systems where those data then form part of a statistical product. The Protocol establishes policies for protecting confidentiality when processing statistical data and publishing outputs. It sets out the conditions and procedures which govern access to data, including access to data for research purposes, together with appropriate action in the event of unauthorised data disclosure. It covers all statistical data that are required to be kept confidential, including those collected from persons, households, businesses and other organisations, whether from surveys, censuses or administrative sources.

Date

2004

URL

http://www.ons.gov.uk/ons/guide-method/the-national-statistics-standard/code-of-practice/protocols/data-access-and-confidentiality.pdf

sdcApp Reference Manual

Download

Author(s)

Thijs Benschop, Matthew Welch

Date

2019-11-12

URL

https://sdcappdocs.readthedocs.io/en/latest/

Statistical Disclosure Control for Microdata: A Practice Guide

Download

Author(s)

Thijs Benschop, Matthew Welch

Date

June 2016

URL

https://sdcpractice.readthedocs.io/en/latest/

Statistical Disclosure Control for Microdata: A Practice Guide - Case Study Data and R Script

(856.35 KB)

Date

June 2016

URL

https://ihsn.org/sites/default/files/resources/case_studies_code_and_data.zip

File size

1 MB

Statistical Disclosure Control for Microdata: Theory

Download

Author(s)

Thijs Benschop and Matthew Welch

Description

For guidance on the technical implementation of the theory mentioned in the guide, please refer to our guides:

- Statistical Disclosure Control for Microdata: A Practice Guide for guidance on the application of methods and on using sdcMicro from the command-line
- sdcApp manual for guidance on the application of methods and on using the GUI sdcApp available for sdcMicro

Date

11/12/2019

URL

https://sdctheory.readthedocs.io/en/latest/

Tools

Package sdcMicro

Download

Author(s)

Matthias Templ, Alexander Kowarik, Bernhard Meindl, Bernd Prantner

Description

Data from statistical agencies and other institutions are mostly confidential. This package can be used for the generation of anonymized (micro)data, i.e. for the generation of public- and scientific-use files. The package sdcMicroGUI includes a graphical user interface for the methods in this package.

URL

http://cran.r-project.org/web/packages/sdcMicro/index.html
