Michael Elliott is professor of biostatistics at the
University of Michigan School of Public Health and research professor of survey
methodology at the Survey Research Center at the Institute for Social
Research. He has been at Michigan since 2005, returning after
serving as an assistant professor in the Department of Biostatistics and
Epidemiology at the University of Pennsylvania from 2000 to 2005.
Combining Probability and Non-probability Samples
Although
probability sample designs remain a “gold standard” in survey research, demand
for use of non-probability samples is increasing, due to, among other reasons,
rising costs and falling response rates in probability samples and the
availability of “big data” from administrative databases, social media users,
and other sources. Design-based
inference, in which the distribution for inference is generated by the random
mechanism used by the sampler, cannot be used for non-probability samples. When
probability and non-probability samples that target the same population are both
available, the probability sample can be used to adjust for possible selection
bias in the non-probability sample, provided the two samples share sufficient
covariates, even if the outcome is not measured in the probability sample. One approach is “quasi-randomization,” in
which pseudo-inclusion probabilities are estimated based on covariates
available for sampled and nonsampled units. An extension of this uses a model to
predict values for the outcome in the probability sample, yielding a “doubly
robust” estimator that consistently estimates target population quantities if
either the pseudo-inclusion probability model or the outcome model is correctly
specified. I will give an overview of these approaches, with a focus on using
Bayesian additive regression trees to reduce model misspecification, and apply
the results to “naturalistic” driving
studies that use volunteer samples to follow long-term driving behavior.
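
The two ideas in the abstract can be illustrated with a small simulation. The sketch below is not the speaker's implementation: the population, the selection mechanism, the sample sizes, and all function and variable names are hypothetical. It uses one common quasi-randomization variant (a weighted logistic regression on the stacked samples, with pseudo-weights taken as the inverse fitted odds) and plain parametric models in place of Bayesian additive regression trees. The outcome is treated as observed only in the non-probability sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def design(x):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones_like(x), x])

def fit_weighted_logistic(X, z, w, n_iter=100, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (z - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical finite population: one covariate x, outcome y.
N = 20_000
x = rng.normal(size=N)
y = 2.0 + x + rng.normal(size=N)

# Non-probability sample A: inclusion depends on x (unknown to the analyst).
pi_A = np.clip(np.exp(-3.5 + 0.8 * x), 0.0, 1.0)
in_A = rng.random(N) < pi_A
xA, yA = x[in_A], y[in_A]

# Probability sample B: simple random sample with known design weights.
# The outcome y is deliberately not used for B.
n_B = 1_000
idx_B = rng.choice(N, size=n_B, replace=False)
xB = x[idx_B]
dB = np.full(n_B, N / n_B)

# Quasi-randomization: stack A (z=1, weight 1) on B (z=0, design weight),
# so the weighted B units stand in for the population. The fitted odds then
# estimate the pseudo-inclusion probability, and the pseudo-weight is 1/odds.
X_stack = design(np.concatenate([xA, xB]))
z = np.concatenate([np.ones(len(xA)), np.zeros(n_B)])
w = np.concatenate([np.ones(len(xA)), dB])
beta = fit_weighted_logistic(X_stack, z, w)
wA = 1.0 / np.exp(design(xA) @ beta)

# Pseudo-weighted (Hajek-type) estimate of the population mean.
mu_ipw = np.sum(wA * yA) / np.sum(wA)

# Outcome model fitted on A, then used to predict y for the units in B.
gamma = np.linalg.lstsq(design(xA), yA, rcond=None)[0]
mA = design(xA) @ gamma
mB = design(xB) @ gamma

# Doubly robust estimate: weighted residuals from A plus predictions on B.
mu_dr = np.sum(wA * (yA - mA)) / np.sum(wA) + np.sum(dB * mB) / np.sum(dB)

naive = yA.mean()
print(f"true mean      : {y.mean():.3f}")
print(f"naive A mean   : {naive:.3f}")
print(f"pseudo-weighted: {mu_ipw:.3f}")
print(f"doubly robust  : {mu_dr:.3f}")
```

In this toy setup both working models happen to be correctly specified, so the pseudo-weighted and doubly robust estimates should land close to the population mean while the unweighted mean of the volunteer sample does not. Replacing either parametric fit with a more flexible learner is the kind of step the abstract describes for reducing model misspecification.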