Outliers in consumption surveys

Poverty and inequality are measured using household consumption or expenditure data, or in some cases household income data. The same survey datasets serve many other purposes, such as analyzing and monitoring changes in household consumption patterns.

These surveys are complex, and they are implemented by national statistical agencies constrained by limited financial and technical capacity. When analyzing the data, particular attention should thus be paid to data quality, and techniques must be applied not only to identify issues, but also to resolve them to the extent possible.

The detection and treatment of outliers in consumption or income data has received insufficient attention, in part due to a lack of clear guidance or tools. Under the umbrella of the International Household Survey Network (IHSN) work program, the World Bank Development Data Group is implementing a research project consisting of a large-scale assessment of the issue.

Project status: Open
Sponsor(s): DFID Trust Fund No TF011722 administered by the World Bank, Development Data Group (WB-DECDG), and World Bank budget
Implemented by: Vienna University of Technology, Austria
Type of output: The output of the project will consist of:
  • A detailed assessment of outlier detection algorithms (long report, with full findings).
  • A shorter document providing practical guidelines for detecting and fixing outliers in expenditure survey datasets.
  • Programs and scripts (R package or scripts based on existing packages, and Stata scripts) used for outlier detection and imputation.


Long version of the report

To assess the relevance and impact of various outlier detection and imputation methods on household expenditure data from sample surveys, the World Bank and the IHSN commissioned a comprehensive review of existing algorithms, including an assessment of their implementation on actual survey datasets. This report presents the main findings of that work. A shorter version of the report, with practical guidelines and instructions for implementation in R and Stata, will also be produced.
Download (1.1 Mb)

Technical guidelines and scripts

Under preparation