“Design and Analysis of Two-Phase Studies, With Applications to Genetic Association Studies”
Abstract: In modern epidemiological and clinical studies, the covariates of interest may involve genome sequencing, biomarker assays, or medical imaging, and thus are prohibitively expensive to measure on a large number of subjects. A cost-effective strategy is the two-phase design, under which the outcome variable and inexpensive covariates are observed for all subjects during the first phase, and the first-phase information is used to select subjects for measurement of the expensive covariates during the second phase. For example, in the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP), subjects with extreme values of quantitative traits (or of their residuals) were selected for whole-exome sequencing. Because the sample selection depends on the outcome variable, two-phase studies cannot be properly analyzed by standard statistical methods. Moreover, it is not obvious which subjects should be selected in the second phase. In this talk, we present a semiparametric regression approach that yields valid and efficient statistical inference for two-phase studies. In addition, we describe optimal two-phase designs, which can be substantially more efficient than existing designs. Finally, we demonstrate the usefulness of the proposed methods through simulation studies and an application to the NHLBI ESP.
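To make the residual-based extreme selection mentioned above concrete, here is a minimal sketch of a second-phase selection rule. It assumes a single inexpensive covariate, a simple least-squares fit of the trait on that covariate, and equal numbers of subjects taken from each tail. The function names `ols_residuals` and `select_extremes` are hypothetical illustrations. This is neither the speaker's optimal design nor the ESP's actual sampling protocol, and it does not cover the semiparametric analysis.

```python
from typing import List, Sequence


def ols_residuals(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals from a simple least-squares regression of trait y on covariate x.

    Assumption: one inexpensive first-phase covariate and a linear model.
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def select_extremes(residuals: Sequence[float], n_each: int) -> List[int]:
    """Indices of the n_each smallest and n_each largest residuals.

    These are the hypothetical subjects sent for the expensive
    second-phase measurement (e.g., sequencing).
    """
    if n_each < 1 or 2 * n_each > len(residuals):
        raise ValueError("need 1 <= n_each <= len(residuals) // 2")
    # Rank subjects by residual, then take both tails.
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    return sorted(order[:n_each] + order[-n_each:])
```

For example, with residuals `[0.3, -2.0, 0.1, 1.5, -0.4, 2.2]` and `n_each=1`, the rule selects subjects `[1, 5]`, the lowest and highest residuals. Because selection depends on the outcome, a naive regression fitted only to the selected subjects would generally be biased, which is why the analysis must account for the design.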
For inquiries, contact Chengjie Xiong.