Statistical agencies and other data producers are increasingly publishing microdata obtained from sample surveys, censuses, and administrative data collection systems. This dissemination is driven by strong demand from the research community, a push for transparency, and in some cases legal or contractual obligations. It must be carried out in a way that preserves the confidentiality of the information provided by respondents.
In this section we present:
- The main principles associated with microdata anonymization
- Various techniques used for measuring the disclosure risk
- Methods available for reducing the disclosure risk
- Methods for assessing the resulting information loss
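One widely used way to measure disclosure risk is k-anonymity. A record is considered at risk when its combination of values on a set of key variables (quasi-identifiers such as age group, sex, and region) is shared by fewer than k records in the file, because rare combinations make re-identification easier. The following is a minimal Python sketch of that frequency-count check. It assumes records are stored as dictionaries, and the key variables, threshold `k=3`, and toy data are hypothetical choices for illustration. It is not the sdcMicro API.

```python
from collections import Counter


def sample_frequencies(records, key_vars):
    """Count how many records share each combination of key-variable values."""
    return Counter(tuple(r[v] for v in key_vars) for r in records)


def k_anonymity_violations(records, key_vars, k=3):
    """Return the records whose key-variable combination occurs fewer than k times."""
    counts = sample_frequencies(records, key_vars)
    return [r for r in records if counts[tuple(r[v] for v in key_vars)] < k]


# Hypothetical toy microdata: each dict is one respondent.
records = [
    {"age_group": "30-39", "sex": "F", "region": "North", "income": 41000},
    {"age_group": "30-39", "sex": "F", "region": "North", "income": 38000},
    {"age_group": "30-39", "sex": "F", "region": "North", "income": 52000},
    {"age_group": "30-39", "sex": "M", "region": "North", "income": 45000},
    {"age_group": "40-49", "sex": "F", "region": "South", "income": 61000},
]
key_vars = ["age_group", "sex", "region"]  # hypothetical choice of quasi-identifiers

risky = k_anonymity_violations(records, key_vars, k=3)
print(len(risky))  # the two records with unique key combinations
```

Records flagged this way are the usual candidates for risk-reduction methods such as recoding or suppression. How much those methods distort the data is what information-loss measures then assess.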
Anonymization is typically required for producing public use files and, to a lesser extent, licensed files. However, anonymization is only one of several ways to minimize the risk of disclosure when distributing microdata; legal and organizational measures also contribute. For datasets provided to selected bona fide users, a legal agreement can provide a higher level of protection than anonymization alone (see the section on formulating a data dissemination policy).
A practice guide has been produced that relies on the open source R package sdcMicro.