Broadstreet presents a new method for reconstructing individual patient data from Kaplan-Meier curves


A Kaplan-Meier curve with censoring times marked

Broadstreet team member Basia Rogula (Senior Statistician) was at the 40th Annual North American Meeting of the Society for Medical Decision Making in Montreal this week to present a novel method we have developed for reconstructing individual patient data (IPD) from Kaplan-Meier (KM) survival curves. The technique is innovative because it incorporates exact censoring times. We developed it because our economic models frequently rely on patient-level survival data, yet often only aggregate data are available, in the form of published KM curves. When this is the case, the IPD must be reconstructed from the curve image.

In the context of KM curves, censoring occurs when information about a patient is no longer available, either because the patient leaves the study during the period of interest or because the period ends before the event of interest has occurred. For example, in a study estimating the occurrence of heart attacks in a cohort of people over a one-year period, a patient who is lost to follow-up during that period is censored at the time they were last observed. Patients who do not have a heart attack during the study period are censored at the end of the study. Using censoring information when reconstructing IPD is important because it maximizes the accuracy of the resulting data and avoids bias. Although censoring times are frequently marked on KM curves, existing algorithms tend to ignore them and instead use numbers at risk or event counts to account for censoring. The problem with these existing methods is that event counts and numbers at risk are not always available and, when they are, they are typically reported with less precision than the individual censoring marks. As a result, it has not always been possible to account for censoring times accurately, even though the information is present on the KM curve.
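To make the role of censoring concrete, here is a minimal Python sketch of the standard Kaplan-Meier product-limit estimator. It is for illustration only: the function name `kaplan_meier` and the six-patient cohort are hypothetical, and the code assumes the common convention that a patient censored at time t still counts as at risk for an event occurring at that same time.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # The curve steps down only at event times.
            surv *= 1 - n_events / n_at_risk
            curve.append((t, surv))
        # Censored patients leave the risk set without causing a step.
        n_at_risk -= n_leaving
    return curve


# Hypothetical cohort followed for 12 months (illustrative data only).
times = [2, 5, 5, 8, 12, 12]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
# 2 0.833
# 5 0.667
# 8 0.444
```

The patient censored at month 5 produces no step of their own, but their departure shrinks the risk set, so the later event at month 8 causes a larger drop than it otherwise would. This is why the timing of censoring matters when working backwards from a curve.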

Our newly proposed method extracts the censoring times from the curve, and the algorithm incorporates them to produce accurate pseudo-IPD. When these pseudo-IPD are used to redraw the KM curve, the result closely matches the original. The new method was validated by simulating two survival curves from three survival distributions (loglogistic, lognormal, and Weibull) and reconstructing the IPD. After estimating and comparing the hazard ratios for the original and reconstructed IPD and visually inspecting the reconstructed curves, we were satisfied that there were no systematic differences in hazard ratios and that the reconstructed IPD provided a close fit to the original curve.
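The general idea of using censoring marks during reconstruction can be sketched in a few lines of Python. This is not the Broadstreet algorithm, whose details are not given here; it is a simplified illustration that assumes the total sample size, the survival value after each step, and the censoring times have all been digitized without error. The function name `reconstruct_ipd` and the input values are hypothetical. At each step, the number of events is backed out from the Kaplan-Meier relationship S_new = S_old × (1 − d / n), where n is the number at risk.

```python
def reconstruct_ipd(n_total, km_steps, censor_times):
    """Rebuild pseudo-IPD as a list of (time, event) pairs.

    n_total      -- number of patients at time zero (assumed known)
    km_steps     -- [(t, S)] with S the survival value just after each drop
    censor_times -- one entry per censoring mark read off the curve
    """
    # Kind 0 = step, kind 1 = censoring mark; on tied times the step is
    # processed first, so censored patients still count as at risk for it.
    timeline = [(t, 0, s) for t, s in km_steps]
    timeline += [(t, 1, None) for t in censor_times]
    timeline.sort(key=lambda item: (item[0], item[1]))

    n_at_risk = n_total
    s_prev = 1.0
    ipd = []
    for t, kind, s in timeline:
        if n_at_risk <= 0:
            break
        if kind == 0:
            # Invert S_new = S_old * (1 - d / n) and round to whole patients.
            d = round(n_at_risk * (1 - s / s_prev))
            d = max(1, min(d, n_at_risk))  # a visible drop implies >= 1 event
            ipd.extend([(t, 1)] * d)
            n_at_risk -= d
            s_prev = s
        else:
            ipd.append((t, 0))
            n_at_risk -= 1
    return ipd


# Hypothetical digitized curve: three steps and three censoring marks.
steps = [(2, 5 / 6), (5, 4 / 6), (8, 4 / 9)]
marks = [5, 12, 12]
ipd = reconstruct_ipd(6, steps, marks)
print(ipd)
# [(2, 1), (5, 1), (5, 0), (8, 1), (12, 0), (12, 0)]
```

Because each censoring mark removes exactly one patient from the risk set at the right moment, the size of every later step can be converted into a whole number of events. With real digitized curves the coordinates are noisy, so a practical implementation would need to handle rounding conflicts and patients unaccounted for at the end of follow-up, which this sketch does not attempt.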

Rather than being merely an alternative to existing methods for cases where numbers at risk or total censoring event counts are not available, this algorithm represents a step forward in the reconstruction of IPD. When exact censoring times are marked on the Kaplan-Meier curve, it makes sense to incorporate this information: doing so should yield more accurate IPD, reduce bias in the estimates, and thereby enhance the validity and robustness of the analyses into which these data are incorporated.