In our latest publication, we developed a new methodological approach to obtain fair comparisons of survival probabilities (and differences thereof) across study clusters such as hospitals, regions or other hierarchical units. The proposed approach combines posterior prediction of the random effects with regression standardisation to account for differences in the case-mix distribution between the clusters. We also developed an accompanying Stata command, available on our company’s GitHub page.
Three Questions to Alessandro
What was the most interesting part of this project for you?
The most interesting part of this project, for me, was starting from a substantive research question and developing the methodology from the ground up to answer it. In fact, we are currently working on this application, so stay tuned for that!
We were inspired by similar work in the setting of standard linear mixed-effects models (i.e., for continuous outcomes) and extended the methodology to accommodate survival outcomes. Finally, the software development was very rewarding: we developed a user-friendly Stata command that other researchers can use to apply the methodology in practice.
Can you give us a short summary of the paper, including the main takeaways?
The paper combines regression standardisation with posterior predictions of the random effects in multilevel (hierarchical) survival models to produce standardised survival probabilities that allow for fair and interpretable comparisons between hierarchical units (e.g., different surgeons or hospitals). These standardised predictions quantify how the entire study population would have fared under the performance of a specific cluster, e.g., "what if the entire study population had been exposed to the level of care of a certain hospital?". Clusters typically treat patients with different characteristics (i.e., a different case mix), so raw comparisons between them are confounded. Standardising over a common case mix removes these differences, so that predictions for different clusters can be compared fairly.
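To make the idea concrete, here is a minimal Python sketch of cluster-specific standardised survival. It is not the authors' Stata command or their exact estimator; it assumes a hypothetical Weibull proportional-hazards model with a single cluster-level random intercept, made-up parameter values, a made-up study population, and hypothetical point predictions of the random effects (e.g. empirical Bayes means) that are simply plugged in. For each cluster, the random effect is fixed at that cluster's predicted value and the conditional survival is averaged over every patient in the pooled study population.

```python
import math

# Hypothetical Weibull proportional-hazards model with a cluster random intercept:
#   S(t | x, b_j) = exp(-LAM * t**GAMMA * exp(x . BETA + b_j))
# All parameter values below are illustrative, not estimates from the paper.
LAM, GAMMA = 0.05, 1.2
BETA = [0.03, 0.5]  # coefficients for centred age and a binary comorbidity flag


def conditional_survival(t, x, b):
    """Survival at time t for a patient with covariates x in a cluster with effect b."""
    lin_pred = sum(bk * xk for bk, xk in zip(BETA, x)) + b
    return math.exp(-LAM * t ** GAMMA * math.exp(lin_pred))


def standardised_survival(t, case_mix, b):
    """Average the conditional survival over the whole study population
    (the common case mix), holding the cluster effect fixed at b."""
    return sum(conditional_survival(t, x, b) for x in case_mix) / len(case_mix)


# Hypothetical study population pooled across all clusters: (centred age, comorbidity)
case_mix = [(-10, 0), (-5, 1), (0, 0), (4, 1), (8, 0), (12, 1)]

# Hypothetical posterior predictions of the cluster random effects
b_hat = {"hospital_A": -0.30, "hospital_B": 0.0, "hospital_C": 0.25}

for name, b in b_hat.items():
    print(name, round(standardised_survival(5.0, case_mix, b), 3))
```

Because every cluster is evaluated on the same patients, any difference between the printed probabilities reflects only the predicted cluster effects, not differences in who each hospital treated. The sketch omits uncertainty (e.g. confidence intervals), which a real analysis would need.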
The method is demonstrated using a three-level dataset (with patients nested within surgeons nested within centres), and we show how it can be used to benchmark the best, worst and average providers, compare surgeons within a centre, compare centres directly, and compute contrasts between units (which can be interpreted as risk differences), all from a single, unified model.
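The comparisons listed above can be sketched in the same spirit for a three-level setting. This is again a hypothetical illustration, not the paper's implementation: it assumes a Weibull proportional-hazards model whose log-hazard includes a centre effect u_c plus a nested surgeon effect v_s, with made-up parameters and made-up predicted random effects. Surgeon-level predictions combine both effects; centre-level predictions here set the surgeon effect to zero (one possible choice, representing an "average" surgeon in that centre); contrasts are differences in standardised survival.

```python
import math

# Hypothetical three-level Weibull PH model: patients in surgeons in centres.
# Log-hazard shift for a patient of surgeon s in centre c: x . BETA + u_c + v_s.
# Parameter values and predicted random effects are made up for illustration.
LAM, GAMMA = 0.05, 1.2
BETA = [0.03, 0.5]

# Hypothetical pooled study population: (centred age, comorbidity flag)
case_mix = [(-10, 0), (-5, 1), (0, 0), (4, 1), (8, 0), (12, 1)]

u_hat = {"centre_1": -0.20, "centre_2": 0.15}  # centre-level predictions
v_hat = {                                      # surgeon-level predictions
    ("centre_1", "surgeon_a"): -0.10,
    ("centre_1", "surgeon_b"): 0.20,
    ("centre_2", "surgeon_c"): 0.05,
}


def standardised_survival(t, b):
    """Mean survival over the common case mix with total cluster effect b."""
    total = 0.0
    for x in case_mix:
        lin_pred = sum(bk * xk for bk, xk in zip(BETA, x)) + b
        total += math.exp(-LAM * t ** GAMMA * math.exp(lin_pred))
    return total / len(case_mix)


t = 5.0
# Surgeon level: combine the centre and surgeon effects.
surgeon_surv = {
    key: standardised_survival(t, u_hat[key[0]] + v) for key, v in v_hat.items()
}
# Centre level: surgeon effect set to zero (an "average" surgeon in that centre).
centre_surv = {c: standardised_survival(t, u) for c, u in u_hat.items()}
# Reference: average centre and average surgeon (all random effects at zero).
average = standardised_survival(t, 0.0)

# Contrast (difference in survival) between two surgeons in the same centre.
diff_ab = (surgeon_surv[("centre_1", "surgeon_a")]
           - surgeon_surv[("centre_1", "surgeon_b")])
# Direct comparison between centres.
diff_centres = centre_surv["centre_1"] - centre_surv["centre_2"]

# Benchmark: best and worst surgeons by standardised survival.
best = max(surgeon_surv, key=surgeon_surv.get)
worst = min(surgeon_surv, key=surgeon_surv.get)
print(f"best: {best}, worst: {worst}, average: {average:.3f}")
print(f"surgeon a - b: {diff_ab:.3f}; centre 1 - 2: {diff_centres:.3f}")
```

All of these quantities come from the same fitted model and the same case mix, which is what keeps them mutually comparable. As before, no uncertainty intervals are computed in this sketch.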
Can you share any thoughts about other potential applications in which this approach might be especially useful?
This approach could be valuable wherever fair, risk-adjusted comparisons of higher-level units are needed while accounting for individual case mix and censoring. Examples include benchmarking hospitals or surgeons on time to readmission or survival after surgery; comparing schools or teachers on time to dropout; evaluating regional programmes on time to employment; assessing manufacturing plants or machines on time to failure; and analysing multi-centre studies or registries in which centres and/or clinicians vary in performance. It also suits comparative-effectiveness and quality-improvement work (e.g., comparing transplant centres), policy evaluations with time-to-event outcomes, and studies of neighbourhood or institutional effects on survival. All of these settings benefit from the method's ability to fix the cluster effect at a given unit's predicted value while standardising over the observed case mix.
Multilevel survival models are already used in many fields (such as medicine, public health, and education), so extending them with standardised survival probabilities at the cluster level naturally broadens their practical impact.
Publication details
Gasparini A, Crowther MJ, Schaffer JM. Standardized survival probabilities and contrasts between hierarchical units in multilevel survival models. BMC Medical Research Methodology 2026.