Designing RCTs and Observational Studies to Account for Missing Data Not Missing at Random
Mark McGovern is a Lecturer in Economics at Queen’s Management School, Queen’s University Belfast, and the UKCRC Centre of Excellence for Public Health (Northern Ireland). Prior to joining Queen’s in September 2015, he was a Program on the Global Demography of Aging Postdoctoral Fellow at Harvard University. He received his PhD in economics from University College Dublin in 2013. His main research interests are in health and development, including a variety of topics in ageing, HIV, and maternal/child health. His work has involved the application of causal inference methods for observational data to research questions in these areas, such as evaluating the impact of early life conditions on child and adult outcomes. His work has been featured in journals such as Economics and Human Biology, Journal of Population Economics, Journal of Health Economics, Journal of the Economics of Ageing, Journal of the International AIDS Society, Epidemiology, and American Journal of Epidemiology. Most recently, he has been working on developing methods for dealing with non-ignorable missing data.
Missing data is a common feature of both survey data and RCTs, and it has the potential to substantially affect the policy recommendations we derive from empirical studies. Non-response can lead to biased estimates if respondents differ systematically from those who decline to participate. In practice, when adjustments for missing data are made at all, they tend to be based on either multiple imputation or inverse probability weighting. These conventional methods share a key assumption: data must be missing at random, either unconditionally or conditional on observed covariates. This is a strong and generally untestable assumption, and it is unrealistic in many settings, especially where some respondents have an incentive not to participate. In this presentation, I show how an alternative approach, Heckman-type selection models, can be used to deal with missing data. This method can provide unbiased estimates even when the missing at random assumption does not hold and respondents systematically opt out of survey participation on the basis of unobserved confounders. Using examples from research on HIV, I illustrate the consequences of imposing an unrealistic missing at random assumption on survey data. I conclude by discussing how to design RCTs and observational studies to facilitate the implementation of this selection model approach.
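To make the non-response problem described above concrete, here is a minimal simulation sketch in Python. It is not taken from the presentation. Every number in it (a true prevalence of 20%, and participation rates of 50% for HIV-positive and 90% for HIV-negative individuals) is a hypothetical value chosen only for illustration, and the function name `simulate_survey` is likewise an assumption. The sketch shows only the bias that arises when participation depends on the outcome itself; it does not implement a Heckman-type selection model.

```python
import random


def simulate_survey(n=100_000, true_prevalence=0.20,
                    p_respond_pos=0.50, p_respond_neg=0.90, seed=0):
    """Simulate an HIV survey in which participation depends on HIV status.

    All parameter values are hypothetical and chosen for illustration only.
    Returns (population prevalence, prevalence among respondents, response rate).
    """
    rng = random.Random(seed)
    n_pos = n_resp = n_resp_pos = 0
    for _ in range(n):
        positive = rng.random() < true_prevalence
        # Participation depends on the outcome itself, so the data are
        # missing not at random.
        p_respond = p_respond_pos if positive else p_respond_neg
        responds = rng.random() < p_respond
        n_pos += positive
        if responds:
            n_resp += 1
            n_resp_pos += positive
    return n_pos / n, n_resp_pos / n_resp, n_resp / n


true_prev, complete_case_prev, response_rate = simulate_survey()
print(f"Prevalence in full population:  {true_prev:.3f}")
print(f"Prevalence among respondents:   {complete_case_prev:.3f}")
print(f"Response rate:                  {response_rate:.3f}")
```

Under these assumed participation rates, the prevalence estimated from respondents alone should fall well below the population prevalence. No observed covariate appears in the simulation, so weighting or imputing on observed characteristics would have nothing with which to correct the gap.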