A comparison of multiple imputation methods for the analysis of survival data with outcome related missing covariate values



Missing covariates, Cox regression, multiple imputation, simulation study, censoring-ignorable MAR


The Cox proportional hazards model is commonly used in medical research to investigate the association between survival time and covariates. However, such analyses often involve missing covariate values. A reasonable assumption is that the data are censoring-ignorable MAR (missing at random), in the sense that missingness does not depend on the censoring time but may depend on the failure time. In this case, a complete-case analysis produces biased regression coefficient estimates. Through a simulation study, we compare three multiple imputation approaches for a missing covariate when missingness is survival time-dependent: (i) the method proposed by White & Royston (2009), which uses the cumulative hazard in an approximation to the imputation model; (ii) the method described by Bartlett et al. (2015), which incorporates the Cox model in the imputation process; and (iii) the classification and regression trees (CART) approach, a method known to handle skewed distributions, interactions, and nonlinear relations. Simulation results show that the method of White & Royston (2009) may produce severely biased estimates, while the CART approach underestimates the imputation uncertainty, resulting in low coverage rates. The method of Bartlett et al. (2015) had the best overall performance, with small finite-sample bias and coverage rates close to nominal values. We apply the imputation approaches to a Chagas disease dataset.
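As a rough illustration of approach (i), the sketch below computes a cumulative hazard value for each subject, which could then be included alongside the event indicator as a predictor in the imputation model for the missing covariate. It assumes the Nelson-Aalen estimator is used for the cumulative hazard; the abstract itself says only "cumulative hazard". The function name `nelson_aalen` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard evaluated at each subject's own time.

    times  : observed follow-up times (failure or censoring)
    events : 1 if the failure was observed, 0 if censored
    """
    # Number of observed failures at each distinct time.
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    # Number of subjects leaving the risk set (failed or censored) at each time.
    exits = Counter(times)

    at_risk = len(times)
    hazard = 0.0
    cumulative = {}
    for t in sorted(exits):
        # Increment: failures at t divided by the number still at risk at t.
        hazard += deaths.get(t, 0) / at_risk
        cumulative[t] = hazard
        at_risk -= exits[t]
    return [cumulative[t] for t in times]


# Toy data (hypothetical): four subjects, one censored at time 3.
times = [2, 3, 3, 5]
events = [1, 0, 1, 1]
H = nelson_aalen(times, events)

# Predictors added to the imputation model under this assumption:
# the event indicator and the cumulative hazard at the observed time.
imputation_predictors = list(zip(events, H))
```

In practice these two columns would be combined with the fully observed covariates and passed to whatever imputation routine is in use; that step is omitted here because it depends on the software chosen.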






Author Biography

José Luiz Padilha da Silva, Federal University of Paraná




Bartlett, Jonathan W, Seaman, Shaun R, White, Ian R, Carpenter, James R, & Alzheimer’s Disease Neuroimaging Initiative. 2015. Multiple imputation of covariates by fully conditional specification: accommodating the substantive model. Statistical methods in medical research, 24(4), 462–487.

Breiman, Leo, Friedman, Jerome H, Olshen, Richard A, & Stone, Charles J. 2017. Classification and regression trees. Routledge.

Carpenter, James, & Kenward, Michael. 2012. Multiple imputation and its application. John Wiley & Sons.

Carpenter, James R, Kenward, Michael G, & Vansteelandt, Stijn. 2006. A comparison of multiple imputation and doubly robust estimation for analyses with missing data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(3), 571–584.

Chen, Ming-Hui, Ibrahim, Joseph G, & Shao, Qi-Man. 2009. Maximum likelihood inference for the Cox regression model with applications to missing covariates. Journal of multivariate analysis, 100(9), 2018–2030.

Collins, Linda M, Schafer, Joseph L, & Kam, Chi-Ming. 2001. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological methods, 6(4), 330.

Cox, David R. 1972. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–202.

Cox, David R. 1975. Partial likelihood. Biometrika, 62(2), 269–276.

Hsu, Chiu-Hsieh, & Yu, Mandi. 2019. Cox regression analysis with missing covariates via nonparametric multiple imputation. Statistical methods in medical research, 28(6), 1676–1688.

Kalbfleisch, John D, & Prentice, Ross L. 2011. The statistical analysis of failure time data. Vol. 360. John Wiley & Sons.

Little, Roderick JA, & Rubin, Donald B. 2019. Statistical analysis with missing data. Vol. 793. John Wiley & Sons.

Nunes, Maria Carmo Pereira, Beaton, Andrea, Acquatella, Harry, Bern, Caryn, Bolger, Ann F, Echeverria, Luis E, Dutra, Walderez O, Gascon, Joaquim, Morillo, Carlos A, Oliveira-Filho, Jamary, et al. 2018. Chagas cardiomyopathy: an update of current clinical knowledge and management: a scientific statement from the American Heart Association. Circulation, 138(12), e169–e209.

Nunes, Maria do Carmo Pereira, Rocha, Manoel Otávio C, Ribeiro, Antônio Luiz P, Colosimo, Enrico A, Rezende, Renato A, Carmo, Guilherme Augusto A, & Barbosa, Marcia M. 2008. Right ventricular dysfunction is an independent predictor of survival in patients with dilated chronic Chagas’ cardiomyopathy. International journal of cardiology, 127(3), 372–379.

Paik, Myunghee Cho, & Tsai, Wei-Yann. 1997. On using the Cox proportional hazards model with missing covariates. Biometrika, 84(3), 579–593.

Qi, Lihong, Wang, Ying-Fang, & He, Yulei. 2010. A comparison of multiple imputation and fully augmented weighted estimators for Cox regression with missing covariates. Statistics in medicine, 29(25), 2592–2604.

Rathouz, Paul J. 2007. Identifiability assumptions for missing covariate data in failure time regression models. Biostatistics, 8(2), 345–356.

Robins, James M, Rotnitzky, Andrea, & Zhao, Lue Ping. 1994. Estimation of regression coefficients when some regressors are not always observed. Journal of the American statistical Association, 89(427), 846–866.

Rubin, Donald B. 1987. Multiple imputation for nonresponse in surveys. John Wiley & Sons.

Rubin, Donald B. 1996. Multiple imputation after 18+ years. Journal of the American statistical Association, 91(434), 473–489.

Seaman, Shaun R, & White, Ian R. 2013. Review of inverse probability weighting for dealing with missing data. Statistical methods in medical research, 22(3), 278–295.

Seaman, Shaun R, White, Ian R, Copas, Andrew J, & Li, Leah. 2012. Combining multiple imputation and inverse-probability weighting. Biometrics, 68(1), 129–137.

Steyerberg, Ewout W, et al. 2019. Clinical prediction models. Springer.

Tsiatis, Anastasios A. 1981. A large sample study of Cox’s regression model. The Annals of Statistics, 9(1), 93–108.

Van Buuren, Stef. 2018. Flexible imputation of missing data. CRC press.

White, Ian R, & Royston, Patrick. 2009. Imputing missing covariate values for the Cox model. Statistics in medicine, 28(15), 1982–1998.

White, Ian R, Royston, Patrick, & Wood, Angela M. 2011. Multiple imputation using chained equations: issues and guidance for practice. Statistics in medicine, 30(4), 377–399.

Yi, Yanyao, Ye, Ting, Yu, Menggang, & Shao, Jun. 2020. Cox regression with survival-time-dependent missing covariate values. Biometrics, 76(2), 460–471.




How to Cite

Padilha da Silva, J. L. (2023). A comparison of multiple imputation methods for the analysis of survival data with outcome related missing covariate values. Sigmae, 12(1), 76–89. Retrieved from http://publicacoes.unifal-mg.edu.br/revistas/index.php/sigmae/article/view/2014