Ali Rafei, PhD candidate, Survey Methodology Program, University of Michigan
Robust and Efficient Methods of Inference for Non-probability Samples

Big Data can be viewed as large-scale non-probability samples. Not only is the selection mechanism often unknown and beyond the control of researchers, but larger data volumes also increase the relative contribution of selection bias to the total squared or absolute error. Existing approaches to bias adjustment rely heavily on correct specification of the underlying models. To weaken this assumption, Chen et al. (2019) extend the idea of double robustness, based on augmented inverse propensity weighting, to the non-probability sample setting. However, their proposed propensity model, which takes a modified pseudo-maximum-likelihood approach to account for the sampling weights in the reference survey, is limited to parametric models. Under a Bayesian framework, handling these weights is an even bigger hurdle. To further protect against model misspecification, we expand this idea so that more flexible non-parametric methods, as well as Bayesian models, can be used to predict both the pseudo-weights and the outcome variable. We assess the asymptotic properties of our proposed method under generalized linear models. When the true underlying models are unknown, we employ Bayesian additive regression trees (BART), which not only capture non-linear associations automatically but also permit direct estimation of the variance through posterior predictive draws. Using the 2017 National Household Travel Survey as a benchmark, we apply our method to the sensor-based naturalistic driving data from the second phase of the Strategic Highway Research Program.
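
To make the estimator concrete, the sketch below illustrates one way a doubly robust (AIPW-type) point estimate of the kind described above can be assembled from a non-probability sample and a weighted reference survey. It is a minimal illustration, not the paper's implementation: gradient boosting stands in for BART, the names (dr_estimate, X_np, w_ref, etc.) are hypothetical, and the weighted classifier is only a rough stand-in for the modified pseudo-maximum-likelihood propensity model.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def dr_estimate(X_np, y_np, X_ref, w_ref):
    """Doubly robust mean estimate: pseudo-weighted residuals from the
    non-probability sample plus survey-weighted predictions on the
    reference sample."""
    # Stack the two samples; z = 1 flags membership in the non-probability sample.
    X_all = np.vstack([X_np, X_ref])
    z = np.concatenate([np.ones(len(X_np)), np.zeros(len(X_ref))])
    # Reference units carry their survey weights so they represent the population;
    # this loosely mimics accounting for the design weights in the propensity model.
    sw = np.concatenate([np.ones(len(X_np)), w_ref])

    # Propensity model (stand-in for the parametric or BART propensity model).
    prop = GradientBoostingClassifier().fit(X_all, z, sample_weight=sw)
    p = prop.predict_proba(X_np)[:, 1]
    pseudo_w = (1.0 - p) / p          # pseudo-weights, roughly 1 / selection propensity

    # Outcome model fit on the non-probability sample (stand-in for BART).
    outcome = GradientBoostingRegressor().fit(X_np, y_np)
    resid_term = np.sum(pseudo_w * (y_np - outcome.predict(X_np)))
    pred_term = np.sum(w_ref * outcome.predict(X_ref))

    # One common normalization: the estimated population size from the reference survey.
    N_hat = w_ref.sum()
    return (resid_term + pred_term) / N_hat

In the full approach, the two models would be BART fits rather than boosting, and repeating the calculation over their posterior draws is what yields the direct variance estimate mentioned above.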