"Electronic health records (EHR) often contain varying levels of missing data. This study aimed to compare different imputation strategies to identify the most appropriate missing data handling approach for predicting central line-associated bloodstream infection (CLABSI) in the presence of competing risks using EHR data" Gao et al (2026).
Dynamic risk prediction of central-line associated bloodstream infection

Abstract:

Background: Electronic health records (EHR) often contain varying levels of missing data. This study aimed to compare different imputation strategies to identify the most appropriate missing data handling approach for predicting central line-associated bloodstream infection (CLABSI) in the presence of competing risks using EHR data.

Methods: We analyzed data from 30,862 catheter episodes at University Hospitals Leuven from 2012 to 2013 to predict 7-day risk of CLABSI using the landmark cause-specific supermodel, accounting for competing risks of hospital discharge and death. Imputation strategies varied from simple methods (median/mode and last observation carried forward) to advanced techniques such as multiple imputation and mixed-effects models that leveraged the longitudinal nature of the data. Random forest imputation, which preserves interactions and non-linear relationships, was also assessed. We also considered the use of missing indicators combined with all other imputation approaches. Model performance was evaluated dynamically at daily landmarks up to 14 days after catheter placement.
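
To make the simpler strategies named above concrete, here is a minimal sketch of median imputation, last observation carried forward (LOCF), and missing indicators for one continuous variable. It is not the authors' pipeline: the toy values, the episode layout, and the helper names (`missing_indicator`, `impute_median`, `impute_locf`) are all hypothetical, and using the overall median as the fallback when an episode has no earlier observation is an assumption made here for illustration.

```python
from statistics import median

# Hypothetical toy data: daily values of one lab variable for two catheter
# episodes; None marks a day on which the value was not recorded.
episodes = {
    1: [12.0, None, 30.0],
    2: [None, 8.0, None],
}

# Overall median of the recorded values (assumed fallback; in practice this
# would be estimated on training data only).
observed = [v for values in episodes.values() for v in values if v is not None]
fallback = median(observed)


def missing_indicator(values):
    """Return 1 where the value is missing and 0 where it was recorded."""
    return [int(v is None) for v in values]


def impute_median(values, fill):
    """Replace every missing value with one fixed number."""
    return [fill if v is None else v for v in values]


def impute_locf(values, fill):
    """Carry the last recorded value forward; use `fill` before the first one."""
    out, last = [], fill
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


# Indicators are derived from the raw values, so they can be combined with
# either imputed version as extra predictors.
indicators = {k: missing_indicator(v) for k, v in episodes.items()}
median_imputed = {k: impute_median(v, fallback) for k, v in episodes.items()}
locf_imputed = {k: impute_locf(v, fallback) for k, v in episodes.items()}
```

The indicator columns carry the "was this measured?" signal separately from the filled-in value, which is what allows them to be paired with any of the imputation approaches.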

Results: All approaches achieved similar discrimination, calibration and overall performance, with the missing indicator approach yielding the highest mean area under the receiver operating characteristic curve (AUROC 0.782). Combining missing indicators with other methods marginally improved performance over standalone approaches, with the mixed model approach combined with missing indicators achieving the highest AUROC (0.783) at landmark day 4, and the missForestPredict approach combined with missing indicators yielding the best scaled Brier scores at earlier landmarks.
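
For readers unfamiliar with the two metrics reported above, the following sketch computes AUROC and a scaled Brier score for a plain binary outcome. It is a simplification under stated assumptions: it ignores the landmark structure and competing risks handled in the study, the labels and predicted risks are made up, and the scaled Brier score is taken in its common form as one minus the ratio of the model's Brier score to that of a reference model predicting the observed event rate for everyone.

```python
def brier(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier(y, p):
    """1 - Brier / Brier of a reference model that always predicts the event rate."""
    prevalence = sum(y) / len(y)
    reference = brier(y, [prevalence] * len(y))
    return 1.0 - brier(y, p) / reference


def auroc(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count as half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))


# Hypothetical outcomes (1 = event) and predicted risks for four episodes.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]

print(auroc(y, p))         # 0.75: three of the four event/non-event pairs are ordered correctly
print(scaled_brier(y, p))  # about 0.3675
```

A scaled Brier score of 0 means the model does no better than predicting the event rate for every episode, and values closer to 1 indicate better overall performance.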

Conclusions: The imputation method had minimal impact on predictive performance. Missing indicator methods ranked first among the methods, suggesting that in EHR data, the presence or absence of information may hold valuable insights for patient risk prediction. On the other hand, the use of missing indicators requires caution, as shifts in EHR data over time can alter missing data patterns, potentially impacting model transportability.

Reference:

Gao S, Albu E, Stijnen P, Rademakers F, Cossey V, Debaveye Y, Janssens C, Van Calster B, Wynants L. Comparing methods for handling missing data in electronic health records for dynamic risk prediction of central-line associated bloodstream infection. BMC Med Res Methodol. 2026 Apr 21. doi: 10.1186/s12874-026-02819-y. Epub ahead of print. PMID: 42015033.