Sampling - Concepts and methods
Sample design
Practical guidelines are available from United Nations Statistics Division (2005), Household Sample Surveys in Developing and Transition Countries. See in particular the following chapters:
Chapter II: Overview of sample design issues for household surveys in developing and transition countries, by Ibrahim S. Yansaneh
Abstract: The chapter discusses the key issues involved in the design of national samples, primarily for household surveys, in developing and transition countries. It covers such topics as sampling frames, sample size, stratified multistage sampling, domain estimation, and survey analysis.
Chapter V: Design of master sampling frames and master samples for household surveys in developing countries, by Hans Pettersson
Abstract: The chapter addresses issues concerning the design of master sampling frames and master samples. The introduction is followed by several sections. Section B gives a brief account of the reasons for developing and utilizing master sampling frames and master samples; section C contains a discussion of the main issues in the design of a master sampling frame; and section D covers master samples and addresses the important decisions to be taken during the design stage (choice of PSUs, number of sampling stages, stratification, allocation of sample over strata, etc.).
Chapter VI: Estimating components of design effects for use in sample design, by Graham Kalton, J. Michael Brick and Thanh Lê
Abstract: The design effect - the ratio of the variance of a statistic with a complex sample design to the variance of that statistic with a simple random sample or an unrestricted sample of the same size - is a valuable tool for sample design. However, a design effect found in one survey should not be automatically adopted for use in the design of another survey. A design effect represents the combined effect of a number of components such as stratification, clustering, unequal selection probabilities, and weighting adjustments for non-response and non-coverage. Rather than simply importing an overall design effect from a previous survey, careful consideration should be given to the various components involved. The present chapter reviews the design effects due to individual components, and then describes models that may be used to combine these component design effects into an overall design effect. From the components, the sample designer can construct estimates of overall design effects for alternative sample designs and then use these estimates to guide the choice of an efficient sample design for the survey being planned.
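As a rough illustration of how component design effects might be combined, the sketch below uses two widely cited Kish approximations: 1 + (m - 1) * roh for clustering, and n * sum(w^2) / (sum(w))^2 for unequal weights. Multiplying the two components is only one simple model and is an assumption here, not necessarily the model the chapter recommends. The function names and input values are hypothetical.

```python
def deff_clustering(avg_cluster_size, roh):
    """Kish approximation for clustering: 1 + (m - 1) * roh."""
    return 1.0 + (avg_cluster_size - 1.0) * roh


def deff_weighting(weights):
    """Kish approximation for unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(n, deff):
    """Size of a simple random sample with the same variance."""
    return n / deff


# Hypothetical inputs: 20 interviews per cluster, roh = 0.05,
# and half of the sample carrying twice the weight of the other half.
d_cluster = deff_clustering(avg_cluster_size=20, roh=0.05)
d_weight = deff_weighting([1.0] * 50 + [2.0] * 50)

# Assumed multiplicative model for the overall design effect.
d_overall = d_cluster * d_weight

print(f"clustering deff: {d_cluster:.3f}")
print(f"weighting deff:  {d_weight:.3f}")
print(f"overall deff:    {d_overall:.3f}")
print(f"effective n for n = 2000: {effective_sample_size(2000, d_overall):.0f}")
```

Recomputing the components for each candidate design (for example, a different cluster size or weighting scheme) gives a quick way to compare alternatives before committing to one.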
Chapter VII: Analysis of design effects for surveys in developing countries, by Hans Pettersson and Pedro Luis do Nascimento Silva
Abstract: The chapter presents design effects for 11 household surveys from 7 countries and, for 3 surveys that are rather similar in design, compares design effects and rates of homogeneity (roh) for estimates of household consumption and possession of durables. It concludes with a discussion of the portability of estimates of roh across surveys.
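The portability question can be made concrete with a small sketch. Assuming the common approximation deff = 1 + (m - 1) * roh, where m is the average number of sampled units per cluster, roh can be backed out of a published design effect and reused to predict the design effect for a different cluster size. The numbers and function names below are hypothetical and are not taken from the surveys analysed in the chapter.

```python
def roh_from_deff(deff, avg_cluster_size):
    """Invert deff = 1 + (m - 1) * roh to recover roh."""
    if avg_cluster_size <= 1:
        raise ValueError("average cluster size must exceed 1")
    return (deff - 1.0) / (avg_cluster_size - 1.0)


def deff_for_cluster_size(roh, avg_cluster_size):
    """Predicted design effect for a planned cluster size."""
    return 1.0 + (avg_cluster_size - 1.0) * roh


# Hypothetical earlier survey: deff of 3.0 with 21 households per cluster.
roh = roh_from_deff(deff=3.0, avg_cluster_size=21)

# Planned survey: 11 households per cluster, same variable and population.
planned_deff = deff_for_cluster_size(roh, avg_cluster_size=11)

print(f"estimated roh: {roh:.3f}")
print(f"predicted deff at m = 11: {planned_deff:.2f}")
```

The prediction is only as good as the assumption that roh carries over, which depends on the variable measured and on how clusters are defined in each survey.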
Chapter XXIV: Survey design and sample design in household budget surveys, by Hans Pettersson
Abstract: The chapter addresses some issues on survey design and sample design for household budget surveys. The focus is on surveys in developing countries. Problems of measuring consumption and income are discussed in some detail in section B. Section C contains a discussion on some crucial sample design issues, for example, stratification, and sample allocation in space (geographical) and in time (over the full season). Section D provides a description of the Lao Expenditure and Consumption Survey 1997/98 (LECS-2). In section E, some of the experiences from LECS-2 are discussed.
Computation of sampling errors
United Nations Statistics Division (2005), Household Sample Surveys in Developing and Transition Countries
Chapter XXI: Sampling error estimation for survey data, by Donna Brogan
Abstract: Complex sample survey designs deviate from simple random sampling, including aspects such as unequal probability sampling, multistage sampling and stratification. Weighted analyses are necessary for unbiased (or nearly unbiased) estimates of population parameters. Variance estimation for estimators depends upon the sampling plan specifics and requires approximate methods, generally Taylor series linearization or replication techniques. Standard statistical software packages generally cannot be used to analyse sample survey data since they typically assume simple random sampling of elements. These packages yield biased point estimates of population parameters (in an unweighted analysis) and/or underestimation of standard errors for point estimates. Using the sampling weight variable with standard packages yields appropriate point estimates of population parameters. However, estimated standard errors usually are still incorrect because the variance estimation procedure typically does not take into account the clustering and/or stratification of the sampling plan. The present chapter gives an overview of eight software packages with capability for sample survey data analysis, including approximate cost, variance estimation methods, analysis options, user interface, and advantages/disadvantages. Four of the packages are free, hence possibly of interest to developing countries that have a limited budget for software acquisition.
A complex sample survey data set from Burundi illustrates that incorrect analyses are obtained from standard statistical software. Annotated descriptive analyses with the Burundi survey for five of the eight reviewed packages (Stata, SAS, SUDAAN, WesVar and Epi-Info) show how to use these packages. Finally, numerical results from the five software packages are compared for common analytical objectives with the Burundi survey data. All five packages give equivalent variance estimation results whether Taylor series linearization or balanced repeated replication (BRR) is used.
Software
The following software can be used to compute sampling errors:
- CENVAR, a free variance calculation package that produces reliability measures for estimates from stratified multistage sample surveys or simpler survey designs. CENVAR is a module of the Integrated Microcomputer Processing System (IMPS) developed by the U.S. Census Bureau.
- Epi-Info, by the Centers for Disease Control and Prevention
- SAS
- SPSS, in particular its Complex Samples module
- Stata, which provides special survey analysis commands
- SUDAAN, by RTI International
- WesVar, developed and distributed by Westat
