Coppock, Alexander, Thomas J. Leeper, and Kevin J. Mullinix. “The Generalizability of Heterogeneous Treatment Effect Estimates Across Samples.” Working Paper.


Abstract: The extent to which studies conducted with non-representative convenience samples are generalizable to broader populations depends critically on the level of treatment effect heterogeneity. Recent inquiries (e.g., Mullinix et al. 2015; Coppock forthcoming) have found a strong correspondence between average treatment effects estimated in nationally representative experiments and in replication studies conducted with convenience samples. In this paper, we consider three possible explanations: low levels of effect heterogeneity, high levels of effect heterogeneity that are unrelated to selection into the convenience sample, or just good luck. We reanalyze 34 original-replication study pairs (encompassing over 100,000 individual survey responses) to assess the extent to which a model of heterogeneity in treatment response estimated on the original dataset predicts the heterogeneity in the replication experiment, and vice versa. We isolate specific situations in which heterogeneous effect findings do and do not replicate in non-representative samples.
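
As a rough illustration of the comparison the abstract describes, the sketch below estimates conditional average treatment effects (CATEs) by subgroup in two samples and correlates the estimates across samples. It is not the authors' procedure: the data are synthetic, the subgroup labels, effect sizes, sample sizes, and function names (`simulate_sample`, `cate_by_subgroup`, `pearson`) are all assumptions made for illustration, and a simple difference in means stands in for whatever heterogeneity model the paper uses.

```python
import random
from statistics import mean


def simulate_sample(n, subgroup_effects, seed):
    """Synthetic experiment (illustrative only): each unit gets a subgroup,
    a randomly assigned binary treatment, and a noisy outcome."""
    rng = random.Random(seed)
    groups = list(subgroup_effects)
    rows = []
    for _ in range(n):
        g = rng.choice(groups)
        z = rng.randint(0, 1)  # 1 = treated, 0 = control
        y = subgroup_effects[g] * z + rng.gauss(0, 1)
        rows.append((g, z, y))
    return rows


def cate_by_subgroup(rows):
    """Treated-minus-control difference in mean outcomes within each subgroup."""
    estimates = {}
    for g in sorted({row[0] for row in rows}):
        treated = [y for (gg, z, y) in rows if gg == g and z == 1]
        control = [y for (gg, z, y) in rows if gg == g and z == 0]
        estimates[g] = mean(treated) - mean(control)
    return estimates


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical subgroup effects shared by both samples (an assumption).
true_effects = {"a": 0.0, "b": 0.5, "c": 1.0, "d": 1.5}

original = cate_by_subgroup(simulate_sample(4000, true_effects, seed=1))
replication = cate_by_subgroup(simulate_sample(4000, true_effects, seed=2))

groups = sorted(true_effects)
r = pearson([original[g] for g in groups], [replication[g] for g in groups])
print(f"cross-sample correlation of CATE estimates: {r:.2f}")
```

Because both synthetic samples share the same subgroup effects, the estimates should line up closely. Giving the two samples different effects would lower the correlation, which is the kind of contrast the original-replication comparison is meant to detect.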


Figure 1 from the paper, showing correlations of conditional average treatment effect estimates between the original studies and their MTurk replications.

Coppock, Leeper, and Mullinix 2017 Figure 1

