STA 290 Seminar: Guillaume Basse

STA 290 Seminar Series

DATE:           Friday, November 17th, 10:30am

LOCATION:   MSB 1147, Colloquium Room

SPEAKER:      Guillaume Basse

TITLE:           “Model-assisted design of experiments in the presence of network correlated outcomes”

ABSTRACT: Conventional wisdom when designing randomized experiments has been famously summarized by the British statistician George Box, who said: “block what you can, randomize what you cannot”. The rationale behind this assertion is that units with similar covariates are likely to exhibit similar responses, so ensuring covariate balance between treatment arms can be expected to improve the efficiency of standard estimators. However, in many modern experimental settings, it is not clear what quantities should be “blocked” on. These modern settings include, for instance, A/B tests carried out by IT companies and social media platforms, as well as large field experiments designed by economists and social scientists in the developing world. Crucially, both of these settings often involve networks that inform the notion of similarity between units, at least in part, via a phenomenon known as homophily that induces correlations among the outcomes. In this talk, we introduce the concept of “model-assisted design of experiments” as a way to leverage models to construct restricted randomization schemes for allocating treatment in experiments. In the specific setting we consider, we posit a working model capturing some aspects of homophily, and compute the mean square error of the average treatment effect estimator conditional on each assignment. The induced randomization distribution then assigns equal probability to all assignments with a small conditional mean square error. An analytical decomposition of this conditional mean square error makes explicit a new notion of “blocking”.
We present some early theoretical results and robustness guarantees in the general case; in particular, model-assisted designs reduce the variance of the difference-in-means estimator to the extent that the working model captures useful features of reality, but will maintain unbiasedness with respect to the restricted randomization distribution (i.e., the design) regardless of the validity of the working model.
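
ILLUSTRATION: The restricted randomization scheme described in the abstract can be sketched roughly as follows. This is a minimal sketch under assumptions that are not taken from the talk: the working model is a hypothetical one, Y = mu + tau * z + eps with eps = (I + rho * A) u for i.i.d. unit-variance noise u and adjacency matrix A, so that Cov(eps) = (I + rho * A)(I + rho * A)^T. Under that model the difference-in-means estimator equals tau + w^T eps, where w_i = z_i / n1 - (1 - z_i) / n0, so its conditional mean square error given z is w^T Cov(eps) w. The function names, the parameters rho, n_candidates and keep_frac, and the Monte Carlo shortcut of sampling candidate assignments instead of enumerating them are all illustrative choices.

```python
import numpy as np


def conditional_mse(z, sigma):
    """Conditional MSE of the difference-in-means estimator given assignment z,
    under the assumed working model Y = mu + tau * z + eps, Cov(eps) = sigma."""
    n1 = z.sum()
    n0 = len(z) - n1
    # Difference in means is w @ Y; w sums to zero and w @ z == 1,
    # so the estimation error is w @ eps and its variance is w @ sigma @ w.
    w = z / n1 - (1 - z) / n0
    return float(w @ sigma @ w)


def model_assisted_design(adj, rho=0.5, n_candidates=2000, keep_frac=0.1, seed=0):
    """Approximate the restricted design: keep the sampled balanced assignments
    whose conditional MSE falls in the lowest keep_frac fraction."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    # Hypothetical homophily working model: noise is smoothed over neighbours.
    b = np.eye(n) + rho * adj
    sigma = b @ b.T
    # Candidate assignments come from complete randomization (half treated).
    base = np.zeros(n)
    base[: n // 2] = 1.0
    candidates = np.array([rng.permutation(base) for _ in range(n_candidates)])
    mses = np.array([conditional_mse(z, sigma) for z in candidates])
    cutoff = np.quantile(mses, keep_frac)
    accepted = candidates[mses <= cutoff]
    return accepted, mses, cutoff


if __name__ == "__main__":
    # Toy network: a ring of 10 units, each linked to its two neighbours.
    n = 10
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    accepted, mses, cutoff = model_assisted_design(adj)
    # Draw the final assignment uniformly from the accepted candidates.
    rng = np.random.default_rng(1)
    z = accepted[rng.integers(len(accepted))]
    print("chosen assignment:", z.astype(int))
    print("its conditional MSE:", conditional_mse(z, (np.eye(n) + 0.5 * adj) @ (np.eye(n) + 0.5 * adj).T))
    print("average MSE under complete randomization:", mses.mean())
```

Because the candidates are themselves drawn uniformly, picking uniformly among the accepted ones approximates assigning equal probability to all assignments below the cutoff; with a small population the assignments could instead be enumerated exactly.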