Long-term preservation of data and metadata
The IHSN has contracted the Inter-university Consortium for Political and Social Research (ICPSR, University of Michigan) to develop guidelines for statistical agencies interested in establishing good preservation practices.
See also the on-line tutorial available at the ICPSR website, from which much of the content below was taken.
What do we mean by data preservation?
Microdata preservation refers to the management of digital data and related metadata over time to guarantee their long-term usability. It requires the establishment and implementation of a preservation policy and procedures to ensure that data and all related metadata are preserved against:
- Hardware or software obsolescence
- Media failure, and
- Other physical threats.
Unlike the preservation of information on paper, the preservation of digital information demands constant attention. In most developing countries, statistical agencies and other data producers pay insufficient attention to the issue, and few have formal preservation policies and satisfactory practices. Common issues include:
- Loss of data and metadata
- Data available, but in unreadable formats or on unreadable media
- Data available, but undocumented
- Documentation only available in hard copy
- Multiple versions of datasets available, with no “versioning” information
The solution is to establish formal preservation policies and procedures, rather than rely on ad hoc action, in order to:
- Back up data regularly, and store data in different locations
- Ensure suitable data storage
- Refresh media periodically (copy digital information from one medium to another)
- Migrate data periodically (convert data from one technology to another, whether hardware or software)
- Enforce security and controlled access to the data
- Develop a disaster recovery plan
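One way to make the backup and media-refresh steps above verifiable is to record a checksum for every file and compare it after each copy. The following is a minimal sketch in Python using only the standard library. The function names (sha256_of, build_manifest, verify_copy) and the idea of a "master" and a "copy" directory are illustrative assumptions, not part of the ICPSR guidelines.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest, copy_root):
    """Return the relative paths that are missing or altered in the copy."""
    copy_root = Path(copy_root)
    problems = []
    for rel_path, expected in manifest.items():
        candidate = copy_root / rel_path
        # A file is a problem if it is absent or its checksum has changed.
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

In such a scheme the manifest would be built once from the master copy, stored alongside the documentation, and re-checked against each backup location after every refresh or migration. An empty list from verify_copy suggests the copy is intact.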
Why is it so important?
Reasons to preserve your survey and census data and metadata (and their readability) include:
- Allow users in the near and distant future to exploit them
- Allow replication of data collection and analysis
- Build time series of data
- Build institutional memory
- Satisfy a legal obligation
What are the main issues?
Problem 1 - Hardware obsolescence
Storage media are rapidly superseded by smaller, denser, faster media. The device needed to read an “old” medium may no longer be manufactured.
Problem 2 - Software obsolescence
A file format may be superseded by newer versions and no longer be supported. Various factors contribute to software obsolescence:
- New computing hardware opens the door to new and improved software. Software upgrades may fail to support legacy files, leading to software and file format obsolescence
- Software supporting the format fails in the marketplace or is bought by a competitor and withdrawn
- The format is superseded by another
- Take-up of the format is low, or industry fails to create compatible software
- The format is no longer compatible with the current environment
The most vulnerable files are those in proprietary formats with closed specifications (example: SAS). Files in proprietary formats with open specifications carry a lower risk because the specification has been publicly released, allowing others to produce software that can read them (example: PDF). The least vulnerable files are those in non-proprietary formats with open specifications. In terms of guaranteed long-term availability, published specifications produced by international standards bodies are the safest (examples: XML, ASCII, JPEG).
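As a rough illustration of moving data towards a non-proprietary, open format, the sketch below writes records to a plain ASCII comma-separated file and then checks that a file contains only ASCII bytes. It assumes the records have already been read out of the proprietary file with software that still supports it. The function names and the sample variables are hypothetical.

```python
import csv


def export_to_ascii_csv(records, fieldnames, path):
    """Write a list of dicts to a comma-separated ASCII text file."""
    # encoding="ascii" raises UnicodeEncodeError on any non-ASCII character,
    # so encoding problems surface at export time rather than years later.
    with open(path, "w", newline="", encoding="ascii") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def is_plain_ascii(path, chunk_size=1 << 20):
    """Return True if every byte in the file is a 7-bit ASCII character."""
    with open(path, "rb") as handle:
        return all(
            chunk.isascii()
            for chunk in iter(lambda: handle.read(chunk_size), b"")
        )
```

A plain-text export of this kind drops variable and value labels, so it would need to be accompanied by documentation (for example, a codebook in an open format) for the data to remain usable.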
Problem 3 - Physical threats
Physical damage can occur to hardware and media due to:
- Material instability
- Improper storage environment (temperature, humidity, light, dust)
- Overuse (mainly for physical contact media)
- Natural disaster (fire, flood, earthquake)
- Infrastructure failure (plumbing, electrical, climate control)
- Inadequate hardware maintenance
- Human error (including improper handling)
- Sabotage (theft, vandalism)
References
Two key documents have emerged from the digital preservation community.
- Reference Model for an Open Archival Information System (OAIS)
- Trusted Digital Repositories: Attributes and Responsibilities
See also:
- UK Data Archive Preservation Policy
- UK Office for National Statistics, National Statistics Code of Practice - Protocol on Data Management, Documentation and Preservation

- Guidance on Data Management, by the Rural Economy and Land Use Programme Data Support Service (RELU-DSS, United Kingdom). This document provides information and examples of good practice applying more broadly to all kinds of research data.
