Patrick Royston
Abstract. The Cox proportional hazards model has been used extensively in medicine over the last 40 years. A popular application is to develop a multivariable prediction model, often a prognostic model to predict the clinical outcome of patients with a particular disorder from “baseline” factors measured at some initial time point. For such a model to be useful in practice, it must be “validated”; that is, it must perform satisfactorily in an external sample of patients independent of the sample on which the model was originally developed. One key aspect of performance is calibration, which is the accuracy of prediction, particularly of survival (or equivalently, failure or event) probabilities at any time after the time origin. We believe that systematic evaluation of the calibration of a Cox model has been largely ignored in the literature. In this article, we suggest an approach to assessing calibration using individual event probabilities estimated at different time points. We exemplify the method by detailed analysis of two datasets in primary biliary cirrhosis, one used for derivation and the other for validation. We describe a new command, stcoxcal, that performs the necessary calculations. Results from stcoxcal can be displayed graphically, which makes it easier for users to picture calibration (or lack thereof) according to follow-up time.