August 7, 2015

Simulation-Extrapolation for Estimating Means and Causal Effects with Mismeasured Covariates

By J.R. Lockwood and Daniel McCaffrey


Regression, weighting and related approaches to estimating a population mean from a sample with nonrandom missing data often rely on the assumption that, conditional on covariates, the observed units can be treated as a random sample. Standard methods that rely on this assumption generally fail to yield consistent estimators when the covariates are measured with error. We review approaches to consistent estimation of the population mean of an incompletely observed variable using error-prone covariates, noting difficulties with applying these methods. We consider Simulation-Extrapolation (SIMEX) as a simple and effective alternative. We provide technical conditions under which SIMEX yields a consistent estimator of a population mean and argue why it may work well in common settings. We use a simulation study to demonstrate its potential for removing nearly all of the bias in regression, weighting and doubly robust estimators of a population mean while maintaining precision competitive with what would be obtained without measurement error. We also discuss and evaluate options for estimating the standard error of the SIMEX mean estimator. Finally, we present an empirical example of estimating middle school effects on student achievement.
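The abstract does not spell out the SIMEX procedure itself. The Python sketch below illustrates the generic SIMEX idea for one of the estimators mentioned, a regression (imputation) estimator of a population mean. Its assumptions are made for this sketch and are not taken from the paper: a single covariate with additive normal measurement error of known variance, a linear outcome model, response that depends only on the true covariate, the grid λ ∈ {0, 0.5, 1, 1.5, 2}, 50 replicates per λ, and a quadratic extrapolant. The helper names `regression_mean` and `simex_mean` are hypothetical, and the data are simulated for illustration only. For each λ, extra noise with variance λσ_u² is added to the error-prone covariate, so the total error variance becomes (1 + λ)σ_u². The averaged estimates are then fit as a function of λ and extrapolated back to λ = -1, which corresponds to zero measurement error.

```python
import numpy as np


def regression_mean(w, y, r):
    """Hypothetical helper: regression (imputation) estimator of E[Y].

    Fits Y ~ 1 + W by least squares on respondents (r is True), predicts Y for
    every unit, and returns the mean of the predictions.
    """
    design = np.column_stack([np.ones_like(w), w])
    coef, *_ = np.linalg.lstsq(design[r], y[r], rcond=None)
    return float(np.mean(design @ coef))


def simex_mean(w, y, r, sigma_u2, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0),
               n_rep=50, rng=None):
    """Hypothetical helper: SIMEX version of regression_mean.

    Assumes W = X + U with U ~ N(0, sigma_u2) and sigma_u2 known.
    """
    rng = np.random.default_rng() if rng is None else rng
    estimates = []
    for lam in lambdas:
        # Simulation step: add noise with variance lam * sigma_u2, so the total
        # measurement error variance becomes (1 + lam) * sigma_u2.
        reps = n_rep if lam > 0 else 1  # no extra noise at lam = 0
        vals = [
            regression_mean(w + np.sqrt(lam * sigma_u2) * rng.standard_normal(w.size), y, r)
            for _ in range(reps)
        ]
        estimates.append(np.mean(vals))
    # Extrapolation step: fit a quadratic in lambda and evaluate it at
    # lambda = -1, the value that corresponds to no measurement error.
    coefs = np.polyfit(lambdas, estimates, deg=2)
    return float(np.polyval(coefs, -1.0))


# Simulated data (illustrative only).
rng = np.random.default_rng(2015)
n = 50_000
x = rng.standard_normal(n)                           # true covariate
y = 1.0 + 2.0 * x + rng.standard_normal(n)           # outcome, population mean E[Y] = 1
r = rng.random(n) < 1.0 / (1.0 + np.exp(-x))         # response depends on X only
sigma_u2 = 0.25                                      # known measurement error variance
w = x + np.sqrt(sigma_u2) * rng.standard_normal(n)   # error-prone covariate
y_obs = np.where(r, y, np.nan)                       # Y unobserved for nonrespondents

oracle = regression_mean(x, y_obs, r)   # uses the true covariate
naive = regression_mean(w, y_obs, r)    # plugs in the error-prone covariate
simex = simex_mean(w, y_obs, r, sigma_u2, rng=rng)
print(f"oracle {oracle:.3f}  naive {naive:.3f}  SIMEX {simex:.3f}")
```

In this setup the naive estimator is biased upward. Respondents tend to have higher true covariate values, and the slope attenuated by measurement error under-corrects for that. The quadratic extrapolant only approximates how the estimate changes with λ, so some bias can remain. The choice of extrapolant and λ grid is a tuning assumption of this sketch.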