Joint frailty models for recurrent and terminal events


In this post we’re going to take a look at joint frailty models, and how to fit them with our merlin command. Importantly, we’ll also discuss how to interpret the results.

Joint frailty models

Joint frailty models, the name now commonly used for joint models of a recurrent event and a terminal event, have been an area of intense research in recent years. We’re going to take a look at the most popular approach (Liu et al., 2004), and how to implement it in Stata.

In essence, we have a survival model for the recurrent event process, a survival model for the terminal event process, and we link them through a shared random effect. In other words, a random effect accounts for the correlation between recurrent events within the same patient, and this same random effect is then included in the linear predictor of the terminal event model, with an accompanying coefficient to be estimated that directly quantifies the strength of the association between the two processes. A nice property of this formulation is that if there is no association, the joint model reduces to the two separate models. Let’s formalise it.

For the recurrent event model, the hazard function for the $i$th patient and $j$th event is

$$h_{ij}(t) = h_{0}(t) \exp (X_{1ij}\beta_{1} + b_{i})$$

where $h_{0}(t)$ is the baseline hazard rate, $X_{1ij}$ is a vector of baseline covariates with associated log hazard ratios $\beta_{1}$, and $b_{i} \sim N(0, \sigma^{2})$ is a random intercept. So far, this is a standard frailty survival model (we’re going to use frailty and random effect interchangeably in this post), in which all events from the same patient share the same unobserved effect $b_{i}$, which accounts for the correlation between those events.
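To make the conditional (frailty-specific) hazard concrete, here is a minimal Python sketch. It assumes a Weibull baseline in proportional hazards form, $h_{0}(t) = \lambda \gamma t^{\gamma - 1}$; the function names and parameter values are hypothetical and chosen only for illustration. It shows that the frailty acts multiplicatively: a patient with $b_{i} = 1$ has $e \approx 2.72$ times the recurrent event rate of an otherwise identical patient with $b_{i} = 0$, at every time point.

```python
import math


def weibull_baseline_hazard(t, log_lambda, log_gamma):
    """Weibull baseline hazard h0(t) = lambda * gamma * t**(gamma - 1)."""
    lam = math.exp(log_lambda)
    gamma = math.exp(log_gamma)
    return lam * gamma * t ** (gamma - 1.0)


def recurrent_hazard(t, x, beta, b, log_lambda, log_gamma):
    """Conditional hazard h_ij(t) = h0(t) * exp(x * beta + b) at gap time t.

    x and beta are single numbers (one covariate) to keep the sketch short;
    b is the patient's frailty, shared by all of that patient's events.
    """
    return weibull_baseline_hazard(t, log_lambda, log_gamma) * math.exp(x * beta + b)


# Hypothetical parameter values, for illustration only
params = dict(log_lambda=-5.0, log_gamma=-0.4)
h_b0 = recurrent_hazard(100.0, x=1, beta=0.4, b=0.0, **params)
h_b1 = recurrent_hazard(100.0, x=1, beta=0.4, b=1.0, **params)
print(h_b1 / h_b0)  # equals exp(1) up to rounding, whatever the value of t
```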

To bring in the terminal event process, we define the mortality rate for the $i$th patient,

$$\lambda_{i}(u) = \lambda_{0}(u) \exp (X_{2i}\beta_{2} + \alpha b_{i})$$

where $\lambda_{0}(u)$ is the baseline mortality rate, $X_{2i}$ is a vector of baseline covariates with associated log hazard ratios $\beta_{2}$, and $\alpha$ directly quantifies the association between the recurrent and terminal event processes. To be explicit, $\exp (\alpha)$ is the mortality hazard ratio for a one unit increase in $b_{i}$, which is not a simple quantity to interpret, since $b_{i}$ is an unobserved effect on the log hazard scale of the recurrent event process. The more useful way to look at it is through its sign: if $\alpha > 0$, then those with a higher frailty (i.e. a higher underlying recurrent event rate) have an increased mortality rate. The other way around, if $\alpha < 0$, then those with a higher frailty have a reduced mortality rate.
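A small Python sketch of the terminal event hazard makes the role of $\alpha$ explicit. Again we assume a Weibull baseline mortality rate, and all names and parameter values are hypothetical. Setting $\alpha = 0$ removes the frailty from the mortality model entirely, which is the sense in which the two models reduce to separate ones. Because $b_{i}$ has standard deviation $\sigma$, a perhaps more tangible summary than $\exp(\alpha)$ is $\exp(\alpha\sigma)$, the mortality hazard ratio comparing two patients whose frailties differ by one standard deviation.

```python
import math


def terminal_hazard(u, x, beta, b, alpha, log_lambda, log_gamma):
    """Conditional mortality rate lambda_i(u) = lambda0(u) * exp(x * beta + alpha * b),
    with a Weibull baseline lambda0(u) = lam * gamma * u**(gamma - 1)."""
    lam = math.exp(log_lambda)
    gamma = math.exp(log_gamma)
    return lam * gamma * u ** (gamma - 1.0) * math.exp(x * beta + alpha * b)


def hr_per_sd_frailty(alpha, sigma):
    """Mortality hazard ratio for two frailties one standard deviation apart."""
    return math.exp(alpha * sigma)


# Hypothetical values: compare a high-frailty (b = 1) and an average (b = 0) patient
base = dict(u=365.0, x=0, beta=0.2, log_lambda=-9.0, log_gamma=0.0)
for alpha in (0.5, 0.0, -0.5):
    ratio = (terminal_hazard(b=1.0, alpha=alpha, **base)
             / terminal_hazard(b=0.0, alpha=alpha, **base))
    print(f"alpha = {alpha:+.1f}: hazard ratio (b = 1 vs b = 0) = {ratio:.3f}")
# alpha = +0.5: hazard ratio (b = 1 vs b = 0) = 1.649
# alpha = +0.0: hazard ratio (b = 1 vs b = 0) = 1.000
# alpha = -0.5: hazard ratio (b = 1 vs b = 0) = 0.607
```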

Example

We illustrate these models using the readmission dataset that comes with the extensive frailtypack package in R (Król et al., 2017), developed by Virginie Rondeau and her group. The dataset contains re-hospitalisation times after surgery in patients diagnosed with colorectal cancer. Covariates of interest include gender, Dukes’ tumour stage and the Charlson comorbidity index, but to keep things as simple as possible, we’re just going to include gender in our model below. Let’s load the dataset and generate a binary indicator variable for male.

. use "https://www.mjcrowther.co.uk/data/jointfrailty_example",clear

. gen male = sex=="Male"

Let’s take a look at our dataset:

. list id time event stime death male if inlist(id,1,2)

     +------------------------------------------+
     | id   time   event   stime   death   male |
     |------------------------------------------|
  1. |  1     24       1       .       .      0 |
  2. |  1    433       1       .       .      0 |
  3. |  1    580       0    1037       0      0 |
  4. |  2    489       1       .       .      1 |
  5. |  2    693       0    1182       0      1 |
     +------------------------------------------+

The times to re-hospitalisation are stored in time, with corresponding event indicator event. These are on the clock-reset timescale, i.e. each time a patient is re-hospitalised the clock is reset to zero, so we will be fitting a semi-Markov model in this example. Overall survival time is stored in stime, with corresponding event indicator death; both are recorded only on each patient’s final row, and in the rows shown the gap times sum to stime (for patient 1, 24 + 433 + 580 = 1037). Both time to re-hospitalisation and time to death are recorded in days.
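To see how data with this layout can arise from the model, the following Python sketch simulates from a joint frailty model with Weibull baselines on the clock-reset (gap time) scale. All parameter values, the administrative censoring time and the function names are hypothetical choices for illustration; this is not how the readmission data were generated. Each gap time is drawn by inverting the Weibull survival function $S(t) = \exp(-\lambda e^{\eta} t^{\gamma})$, and the recurrent process is stopped at death or censoring, with the final gap recorded as censored (event = 0), as in the listing above.

```python
import math
import random


def draw_weibull(rng, log_lambda, log_gamma, eta):
    """Draw T with survival S(t) = exp(-lambda * exp(eta) * t**gamma) by inversion."""
    lam = math.exp(log_lambda) * math.exp(eta)
    gamma = math.exp(log_gamma)
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return (-math.log(u) / lam) ** (1.0 / gamma)


def simulate_patient(pid, rng, sd=1.0, alpha=1.0, beta1=0.4, beta2=0.2,
                     rec=(-5.0, -0.4), term=(-9.0, 0.0), admin_cens=2000.0):
    """Return rows (id, time, event, stime, death, male) for one patient.

    time/event are clock-reset gap times for the recurrent event; stime/death
    are filled in on the last row only (None elsewhere, like '.' in Stata).
    """
    male = int(rng.random() < 0.5)
    b = rng.gauss(0.0, sd)  # shared frailty b_i ~ N(0, sd^2)
    # Terminal event: the frailty enters with coefficient alpha
    death_time = draw_weibull(rng, *term, eta=beta2 * male + alpha * b)
    stime = min(death_time, admin_cens)
    death = int(death_time <= admin_cens)

    rows, elapsed = [], 0.0
    while True:
        # Recurrent event: the frailty enters with coefficient fixed to 1
        gap = draw_weibull(rng, *rec, eta=beta1 * male + b)
        if elapsed + gap < stime:
            rows.append((pid, gap, 1, None, None, male))
            elapsed += gap  # clock resets: the next gap starts from zero
        else:
            # Follow-up ends before the next recurrent event: censored gap
            rows.append((pid, stime - elapsed, 0, stime, death, male))
            return rows


rng = random.Random(2024)
data = [row for pid in range(1, 4) for row in simulate_patient(pid, rng)]
for row in data:
    print(row)
```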

We can fit such a model with merlin, adjusting for male,

. merlin (time                             /// rehosp. times
>              male                        /// male
>              M1[id]@1                    /// random intercept
>              , family(weibull,           /// distribution
>                         failure(event))) ///
>        (stime                            /// survival time
>              male                        /// male
>              M1[id]                      /// random effect & association
>              , family(weibull,           /// distribution
>                         failure(death))) //

Fitting fixed effects model:

Fitting full model:

Iteration 0:   log likelihood = -4330.2103  (not concave)
Iteration 1:   log likelihood = -4282.7654  
Iteration 2:   log likelihood = -4258.6868  
Iteration 3:   log likelihood = -4250.7419  
Iteration 4:   log likelihood = -4249.5375  
Iteration 5:   log likelihood = -4249.5313  
Iteration 6:   log likelihood = -4249.5306  
Iteration 7:   log likelihood = -4249.5306  

Mixed effects regression model                             Number of obs = 861
Log likelihood = -4249.5306
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
time:        |            
        male |   .3909186   .1615081     2.42   0.016     .0743686    .7074685
      M1[id] |          1          .        .       .            .           .
       _cons |  -5.139183    .232351   -22.12   0.000    -5.594583   -4.683784
  log(gamma) |  -.4162457   .0403026   -10.33   0.000    -.4952373   -.3372542
-------------+----------------------------------------------------------------
stime:       |            
        male |   .2266276     .27375     0.83   0.408    -.3099125    .7631677
      M1[id] |   1.430339   .2144692     6.67   0.000     1.009987    1.850691
       _cons |  -9.160412   .8457084   -10.83   0.000    -10.81797   -7.502854
  log(gamma) |   .0344113   .1007777     0.34   0.733    -.1631094    .2319321
-------------+----------------------------------------------------------------
id:          |            
      sd(M1) |   1.113295   .0950527                      .9417484     1.31609
------------------------------------------------------------------------------

The first submodel is our model for the time to re-hospitalisation, where we’ve assumed a Weibull baseline and a constant effect of male. The term M1[id] is the merlin syntax for a normally distributed random effect with mean 0. The name must begin with a capital letter, and we recommend adding a number to make it unique (in case you want to add more). In square brackets we define the level at which the random effect applies; in our case, the id level. Finally, the @1 notation constrains the coefficient on the random effect to be 1. Without it, merlin would by default estimate a coefficient for the random effect, which is exactly what we want in the second submodel.
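Fixing this coefficient at 1 is what makes the model identifiable. Suppose, purely for illustration, that the recurrent event model contained $c\, b_{i}$ with $c$ free. For any $k > 0$, rescaling the frailty as $b_{i}^{*} = k b_{i}$ gives

$$c\, b_{i} = \frac{c}{k}\, b_{i}^{*}, \qquad \alpha\, b_{i} = \frac{\alpha}{k}\, b_{i}^{*}, \qquad b_{i}^{*} \sim N(0, k^{2}\sigma^{2})$$

so the parameter values $(c, \alpha, \sigma)$ and $(c/k, \alpha/k, k\sigma)$ describe exactly the same model. Setting $c = 1$ removes this indeterminacy: $\sigma$ is then the standard deviation of the frailty on the log hazard scale of the recurrent event model, and $\alpha$ is measured relative to it.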

The second submodel is our model for overall survival. We again adjust for male, but this time include the same random effect, M1, without any constraint on its coefficient, so merlin estimates it. This coefficient corresponds to $\alpha$ in the above formulation. Our syntax (based on Stata’s gsem command) provides a highly convenient way of linking random effects between outcome models. Let’s get to the results.
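Because $b_{i}$ is unobserved, estimation is based on each patient’s likelihood with the frailty integrated out. For patient $i$ with gap times $t_{ij}$ and event indicators $d_{ij}$, survival or censoring time $T_{i}$ and death indicator $\delta_{i}$, this is

$$L_{i} = \int_{-\infty}^{\infty} \left[ \prod_{j} h_{ij}(t_{ij})^{d_{ij}} \exp\{-H_{ij}(t_{ij})\} \right] \lambda_{i}(T_{i})^{\delta_{i}} \exp\{-\Lambda_{i}(T_{i})\} \, \phi(b_{i}; 0, \sigma^{2}) \, db_{i}$$

where $H_{ij}$ and $\Lambda_{i}$ are the cumulative hazards (both conditional on $b_{i}$) and $\phi$ is the normal density. The integral has no closed form, so it has to be approximated numerically. The Python sketch below uses ordinary (non-adaptive) Gauss-Hermite quadrature, $\int f(b)\,\phi(b; 0, \sigma^{2})\,db \approx \pi^{-1/2} \sum_{k} w_{k} f(\sqrt{2}\sigma x_{k})$, with Weibull baselines. It is only meant to illustrate the calculation, not to reproduce merlin’s own integration routine; the function names, the parameter dictionary and the number of nodes are our own choices. Non-adaptive quadrature can need many nodes when the integrand is narrow, which is why we use a generous default. As input it takes patient 1 from the listing and the Weibull point estimates reported above.

```python
import math

import numpy as np


def log_weibull_contrib(t, d, eta, log_gamma):
    """log{h(t)^d * S(t)} for a Weibull PH model, h(t) = exp(eta) * gamma * t**(gamma - 1);
    eta includes the intercept, covariate effects and frailty term."""
    gamma = math.exp(log_gamma)
    log_h = eta + log_gamma + (gamma - 1.0) * math.log(t)
    return d * log_h - math.exp(eta) * t ** gamma


def patient_loglik(gaps, events, stime, death, male, p, n_nodes=80):
    """Marginal log-likelihood of one patient, integrating over b ~ N(0, sd^2)
    with non-adaptive Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    log_terms = []
    for x, w in zip(nodes, weights):
        b = math.sqrt(2.0) * p["sd"] * x  # change of variable for N(0, sd^2)
        # Recurrent event gaps: frailty coefficient fixed to 1
        ll = sum(
            log_weibull_contrib(t, d, p["cons1"] + p["beta1"] * male + b, p["log_gamma1"])
            for t, d in zip(gaps, events)
        )
        # Terminal event: frailty enters with coefficient alpha
        ll += log_weibull_contrib(
            stime, death, p["cons2"] + p["beta2"] * male + p["alpha"] * b, p["log_gamma2"]
        )
        log_terms.append(math.log(w / math.sqrt(math.pi)) + ll)
    m = max(log_terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in log_terms))


# Point estimates from the Weibull joint frailty model output above
est = dict(beta1=0.3909186, cons1=-5.139183, log_gamma1=-0.4162457,
           beta2=0.2266276, cons2=-9.160412, log_gamma2=0.0344113,
           alpha=1.430339, sd=1.113295)
# Patient 1: gaps 24, 433, 580 (last one censored), stime 1037, alive at end of follow-up
print(patient_loglik([24, 433, 580], [1, 1, 0], 1037, 0, male=0, p=est))
```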

Our results show a large positive estimate of the association, 1.430 (95% CI: 1.010, 1.851), indicating a substantial association between the two event processes, i.e. patients with a higher underlying rate of re-hospitalisation also have a higher mortality rate. In our example of re-hospitalisation and mortality, such a direction of association is hardly surprising. As a side note, we must remember that the covariate effects are conditional on the frailty; for example, the log hazard ratio of 0.391 for male in the re-hospitalisation model compares a male and a female patient with the same value of $b_{i}$.
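Since $\alpha$ is on the log hazard scale, it can help to transform it. The short Python sketch below is plain arithmetic on the estimates reported above, nothing merlin-specific: it exponentiates the point estimate and the Wald confidence limits of $\alpha$ to give the mortality hazard ratio per unit increase in the frailty, and also computes the plug-in value $\exp(\hat\alpha\hat\sigma)$, the hazard ratio comparing patients whose frailties differ by one standard deviation. A confidence interval for the latter would need further work (for example the delta method), which we don’t attempt here.

```python
import math

# Estimates from the Weibull joint frailty model output above
alpha_hat, alpha_lo, alpha_hi = 1.430339, 1.009987, 1.850691  # M1[id] in stime
sd_hat = 1.113295                                              # sd(M1)

hr_unit = math.exp(alpha_hat)                     # per one-unit increase in b_i
hr_ci = (math.exp(alpha_lo), math.exp(alpha_hi))  # exponentiated Wald limits
hr_sd = math.exp(alpha_hat * sd_hat)              # per one-SD increase in b_i (plug-in)

print(f"HR per unit of frailty: {hr_unit:.2f} (95% CI {hr_ci[0]:.2f}, {hr_ci[1]:.2f})")
print(f"HR per SD of frailty:   {hr_sd:.2f}")
# HR per unit of frailty: 4.18 (95% CI 2.75, 6.36)
# HR per SD of frailty:   4.92
```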

Extensions

So what should we be thinking about next? Well, adjusting for more covariates, assessing proportional hazards, modelling non-linear covariate effects and assessing the appropriateness of the Weibull baselines can all be incorporated very simply with merlin. For example, let’s replace the Weibull baselines with Royston-Parmar models, using 3 degrees of freedom for the re-hospitalisation model and 2 for the survival model,

. merlin (time                             /// rehosp. times
>              male                        /// male
>              M1[id]@1                    /// random intercept
>              , family(rp, df(3)          /// distribution
>                         failure(event))) ///
>        (stime                            /// survival time
>              male                        /// male
>              M1[id]                      /// random effect & association
>              , family(rp, df(2)          /// distribution
>                         failure(death))) //
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_2

Fitting fixed effects model:

Fitting full model:

Iteration 0:   log likelihood = -4302.7589  (not concave)
Iteration 1:   log likelihood =  -4288.631  (not concave)
Iteration 2:   log likelihood = -4256.0445  
Iteration 3:   log likelihood = -4235.0114  
Iteration 4:   log likelihood = -4233.6742  
Iteration 5:   log likelihood = -4233.6546  
Iteration 6:   log likelihood =  -4233.655  

Mixed effects regression model                             Number of obs = 861
Log likelihood = -4233.655
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
time:        |            
        male |   .3968829   .1588259     2.50   0.012       .08559    .7081759
      M1[id] |          1          .        .       .            .           .
       _cons |  -2.122345   .1509459   -14.06   0.000    -2.418193   -1.826496
-------------+----------------------------------------------------------------
stime:       |            
        male |   .2080517   .2573849     0.81   0.419    -.2964135    .7125168
      M1[id] |   1.296172   .1986155     6.53   0.000     .9068925    1.685451
       _cons |  -2.229829   .2496362    -8.93   0.000    -2.719107   -1.740552
-------------+----------------------------------------------------------------
id:          |            
      sd(M1) |   1.065379    .096208                      .8925597     1.27166
------------------------------------------------------------------------------
    Warning: Baseline spline coefficients not shown - use ml display

Note that each submodel can be as different or as similar as you like. The estimated association is similar to before, 1.296 (95% CI: 0.907, 1.685). Importantly, now that we know how to formulate the crucial association structure in a joint frailty model using a shared random intercept (frailty), all of these extensions can be built on top of it.
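For readers less familiar with Royston-Parmar models: rather than the hazard, they model the log cumulative hazard as a restricted cubic spline function of log time, $\log H(t) = s(\log t) + X\beta$ (plus the frailty term here), and the hazard follows by differentiation, $h(t) = \frac{1}{t}\, s'(\log t) \exp\{s(\log t) + X\beta\}$. The Python sketch below builds the standard, untransformed restricted cubic spline basis and evaluates the resulting cumulative hazard and hazard. The knot locations and spline coefficients are hypothetical, and merlin chooses its own knots and may transform the basis, so the numbers do not correspond to the baseline spline coefficients from the fit above (which are not shown in the output).

```python
import math


def _pos(u):
    """Positive part (u)_+."""
    return max(u, 0.0)


def rcs_basis(x, knots):
    """Restricted cubic spline basis z(x) and derivative dz/dx (Royston-Parmar form).

    knots: sorted [k_min, internal knots..., k_max]; returns len(knots) - 1
    basis functions, the first of which is x itself.
    """
    k_min, k_max = knots[0], knots[-1]
    z, dz = [x], [1.0]
    for k in knots[1:-1]:
        lam = (k_max - k) / (k_max - k_min)
        z.append(_pos(x - k) ** 3 - lam * _pos(x - k_min) ** 3
                 - (1 - lam) * _pos(x - k_max) ** 3)
        dz.append(3 * (_pos(x - k) ** 2 - lam * _pos(x - k_min) ** 2
                       - (1 - lam) * _pos(x - k_max) ** 2))
    return z, dz


def rp_cumhaz_and_hazard(t, gammas, knots, xb=0.0):
    """log H(t) = s(log t) + xb with s(v) = gammas[0] + sum_j gammas[j] * z_j(v);
    the hazard is h(t) = s'(log t) / t * H(t)."""
    z, dz = rcs_basis(math.log(t), knots)
    s = gammas[0] + sum(g * zj for g, zj in zip(gammas[1:], z))
    ds = sum(g * dzj for g, dzj in zip(gammas[1:], dz))
    cumhaz = math.exp(s + xb)
    return cumhaz, ds / t * cumhaz


# Hypothetical knots on the log time scale (4 knots give 3 basis functions)
knots = [math.log(10), math.log(100), math.log(300), math.log(1500)]
gammas = [-6.0, 0.8, 0.01, -0.005]  # hypothetical intercept and spline coefficients
for t in (30.0, 365.0, 1000.0):
    H, h = rp_cumhaz_and_hazard(t, gammas, knots)
    print(f"t = {t:6.0f}: H(t) = {H:.4f}, h(t) = {h:.6f}")
```

The spline is linear in log time beyond the boundary knots, which is what keeps extrapolation of the baseline well behaved. Nothing in this sketch forces $s' > 0$, so a sensible set of coefficients is needed for the hazard to stay positive.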

In the above we have assumed a clock-reset timescale for the recurrent event process; a clock-forward timescale, which requires allowing for delayed entry, is not currently supported. We have more to come on joint frailty models, so do subscribe or follow us on Twitter.
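If your recurrent event data are instead recorded on the clock-forward (calendar) timescale, for example as days since surgery at each re-hospitalisation, the gap times needed for a clock-reset model are simply the differences between successive event times, with the final, censored gap running from the last event to the end of follow-up. Note that this changes the timescale of the model rather than fitting a clock-forward model. A minimal Python sketch, with a hypothetical function name, applied to patient 1 from the listing above:

```python
def to_gap_times(event_times, end_of_followup):
    """Convert one patient's clock-forward recurrent event times into
    clock-reset gap times plus event indicators.

    event_times: sorted times of each recurrent event since the time origin.
    end_of_followup: death or censoring time (after the last event).
    """
    gaps, events, previous = [], [], 0
    for t in event_times:
        gaps.append(t - previous)  # time since the previous event (or origin)
        events.append(1)
        previous = t
    gaps.append(end_of_followup - previous)  # final gap, censored
    events.append(0)
    return gaps, events


# Patient 1: re-hospitalised at days 24 and 457, followed up to day 1037
print(to_gap_times([24, 457], 1037))  # ([24, 433, 580], [1, 1, 0])
```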
