Joint frailty models for recurrent and terminal events

In this post we’re going to take a look at joint frailty models, and how to fit them with our merlin command. Importantly, we’ll also discuss how to interpret the results.

Joint frailty models

An area of intense research in recent years is in the field of joint frailty models, which has become the commonly used name for a joint model for a recurrent event and a terminal event. We’re going to take a look at the most popular approach (Liu et al., 2004), and how to implement it in Stata.

In essence, we have a survival model for the recurrent event process, a survival model for the terminal event process, and we link them through a shared random effect. In other words, we have a random effect which accounts for the correlation between recurrent events, which is then included in the linear predictor for the terminal event model, with an accompanying coefficient to be estimated, which directly quantifies the strength of the association between the two processes. The nice property of this formulation, is that if there is no association, then they models reduce to their separate versions. Let’s formalise it.

We have for the recurrent event model, the hazard function for the ith patient and the jth event,

$$h_{ij}(t) = h_{0}(t) \exp (X_{1ij}\beta_{1} + b_{i})$$

where \(h_{0}(t)\) is the baseline hazard rate, \(X_{1ij}\) is a vector of baseline covariates with associated log hazard ratios, \(\beta_{1}\), and finally a random intercept, \(b_{i} \sim N(0, \sigma^{2})\). So far, this is a standard frailty survival model (we’re going to use frailty and random effect interchangeably in this post), where patients share the same unobserved effect \(b_{i}\), which accounts for the correlation between events occurring in the same patient.

To bring in the terminal event process, we define the mortality rate for the ith patient,

$$\lambda_{i}(u) = \lambda_{0}(u) \exp (X_{2i}\beta_{2} + \alpha b_{i})$$

where \(\lambda_{0}(u)\) is the baseline mortality rate, \(X_{2i}\) is a vector of baseline covariates with associated log hazard ratios, \(\beta_{2}\), and \(\alpha\) directly quantifies the association between the recurrent and terminal event processes. To be explicit, \(\exp (\alpha)\) represents the hazard ratio for a one unit increase in \(b_{i}\) … which is not that simple. The important way to look at it is if \(\alpha > 0\), then those with a higher frailty (i.e. higher underlying recurrent event rate) have an increased mortality rate. The other way around, if \(\alpha < 0\) then those with a higher frailty have a reduced mortality rate.

Example

We illustrate these models using the readmission dataset that comes with the extensive frailtypack in R (Król et al., 2017), developed by Virginie Rondeau and her group. The dataset has information on re-hospitalisation times after surgery in patients diagnosed with colorectal cancer. Covariates of interest include gender, Dukes’ tumour stage and comorbidity Charlson index, but to keep things as simple as possible, we’re just going to include gender in our model below. Let’s load the dataset and generate a male binary indicator variables.

. use "https://www.mjcrowther.co.uk/data/jointfrailty_example",clear

. gen male = sex=="Male"

Let’s take a look at our dataset:

. list id time event stime death male if inlist(id,1,2)

     +------------------------------------------+
     | id   time   event   stime   death   male |
     |------------------------------------------|
  1. |  1     24       1       .       .      0 |
  2. |  1    433       1       .       .      0 |
  3. |  1    580       0    1037       0      0 |
  4. |  2    489       1       .       .      1 |
  5. |  2    693       0    1182       0      1 |
     +------------------------------------------+

The times of re-hospitalisation is stored in time, with corresponding event indicator event. This is in clock-reset formulation, i.e. each time a patient has a cancer recurrence the clock is reset to zero, so we will be fitting a semi-Markov model in this example. Our overall survival time is stored in stime, with corresponding event indicator stored in death. Both time of re-hospitalisation and time to death are recorded in days.

We can fit such a model with merlin, adjusting for male,

. merlin (time                             /// rehosp. times
>              male                        /// male
>              M1[id]@1                    /// random intercept
>              , family(weibull,           /// distribution
>                         failure(event))) ///
>        (stime                            /// survival time
>              male                        /// male
>              M1[id]                      /// random effect & association
>              , family(weibull,           /// distribution
>                         failure(death))) //

Fitting fixed effects model:

Fitting full model:

Iteration 0:   log likelihood = -4330.2103  (not concave)
Iteration 1:   log likelihood = -4282.7654  
Iteration 2:   log likelihood = -4258.6868  
Iteration 3:   log likelihood = -4250.7419  
Iteration 4:   log likelihood = -4249.5375  
Iteration 5:   log likelihood = -4249.5313  
Iteration 6:   log likelihood = -4249.5306  
Iteration 7:   log likelihood = -4249.5306  

Mixed effects regression model                             Number of obs = 861
Log likelihood = -4249.5306
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
time:        |            
        male |   .3909186   .1615081     2.42   0.016     .0743686    .7074685
      M1[id] |          1          .        .       .            .           .
       _cons |  -5.139183    .232351   -22.12   0.000    -5.594583   -4.683784
  log(gamma) |  -.4162457   .0403026   -10.33   0.000    -.4952373   -.3372542
-------------+----------------------------------------------------------------
stime:       |            
        male |   .2266276     .27375     0.83   0.408    -.3099125    .7631677
      M1[id] |   1.430339   .2144692     6.67   0.000     1.009987    1.850691
       _cons |  -9.160412   .8457084   -10.83   0.000    -10.81797   -7.502854
  log(gamma) |   .0344113   .1007777     0.34   0.733    -.1631094    .2319321
-------------+----------------------------------------------------------------
id:          |            
      sd(M1) |   1.113295   .0950527                      .9417484     1.31609
------------------------------------------------------------------------------

The first submodel is our model for the time to rehospitalisation, where we’ve assumed a Weibull baseline, and a constant effect of male. The term M1[id] is the syntax required to specify a normally distributed random effect, with mean 0, in merlin. The name must begin with a capital letter, and we then recommend adding a number to make it unique (in case you want to add more). In square brackets we have to define the level at which the random effect applies – in our case we want to specify a random effect at the id level. Finally, the @1 notation constrains the coefficient on the random effect to be 1. If we didn’t specify this, it would by default have a coefficient which would be estimated. Keep reading.

The second submodel specification is our model for overall survival. We again adjust for male, but this time include the same random effect, M1, but without any constraint on its coefficient, and hence it will estimate one. This corresponds to \(\alpha\) in the above formulation. Our syntax (based on Stata’s gsem command) provides a highly convenient way of linking random effects between outcome models. Let’s get to the results.

Our results show a highly positive estimate of association of 1.4130 (95% CI: 1.010, 1.851) showing substantial association between the two event processes, i.e. a higher rate of re-hospitalisation also indicates a higher the mortality rate. In our example of re-hospitalisation and mortality, such a direction of association is hardly surprising. As side note, we must remember to interpret our covariates effects conditional on the frailty.

Extensions

So what should we be thinking about next. Well, adjusting for more covariates, assessing proportional hazards, non-linear covariate effects, assessing the appropriateness of the Weibull baselines – all of these things can be incorporated very simply with merlin. Let’s fit Royston-Parmar models instead,

. merlin (time                             /// rehosp. times
>              male                        /// male
>              M1[id]@1                    /// random intercept
>              , family(rp, df(3)          /// distribution
>                         failure(event))) ///
>        (stime                            /// survival time
>              male                        /// male
>              M1[id]                      /// random effect & association
>              , family(rp, df(2)          /// distribution
>                         failure(death))) //
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_2

Fitting fixed effects model:

Fitting full model:

Iteration 0:   log likelihood = -4302.7589  (not concave)
Iteration 1:   log likelihood =  -4288.631  (not concave)
Iteration 2:   log likelihood = -4256.0445  
Iteration 3:   log likelihood = -4235.0114  
Iteration 4:   log likelihood = -4233.6742  
Iteration 5:   log likelihood = -4233.6546  
Iteration 6:   log likelihood =  -4233.655  

Mixed effects regression model                             Number of obs = 861
Log likelihood = -4233.655
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
time:        |            
        male |   .3968829   .1588259     2.50   0.012       .08559    .7081759
      M1[id] |          1          .        .       .            .           .
       _cons |  -2.122345   .1509459   -14.06   0.000    -2.418193   -1.826496
-------------+----------------------------------------------------------------
stime:       |            
        male |   .2080517   .2573849     0.81   0.419    -.2964135    .7125168
      M1[id] |   1.296172   .1986155     6.53   0.000     .9068925    1.685451
       _cons |  -2.229829   .2496362    -8.93   0.000    -2.719107   -1.740552
-------------+----------------------------------------------------------------
id:          |            
      sd(M1) |   1.065379    .096208                      .8925597     1.27166
------------------------------------------------------------------------------
    Warning: Baseline spline coefficients not shown - use ml display

Note that each submodel can as different or similar as you like. Importantly, we know how how to formulate the crucial association structure in a joint frailty model, using a random intercept/frailty effect, so all of these extensions can be built on to our model.

In the above we are assuming a clock-reset timescale for the recurrent event process – clock-forward, i.e. allowing for delayed entry is not currently supported. We have more to come on joint frailty models, so do subscribe or follow us on Twitter.

Videos

State-of-the-art statistical models for modern HTA

At @RedDoorAnalytics, we develop methodology and software for efficient modelling of biomarkers, measured repeatedly over time, jointly with survival outcomes, which are being increasingly used in cancer settings. We have also developed methods and software for general non-Markov multi-state survival analysis, allowing for the development of more plausible natural history models, where patient history can […]
Learn more

Videos

Multilevel (hierarchical) survival models: Estimation, prediction, interpretation

Hierarchical time-to-event data is common across various research domains. In the medical field, for instance, patients are often nested within hospitals and regions, while in education, students are nested within schools. In these settings, the outcome is typically measured at the individual level, with covariates recorded at any level of the hierarchy. This hierarchical structure […]
Learn more

Statistical Primers

What are competing risks?

Competing risks In survival analysis, competing risks refer to the situation when an individual is at risk of experiencing an event that precludes the event under study to occur. Competing risks commonly occur in studies of cause-specific mortality, as all other causes of death than the one under study might happen before the individuals “have […]
Learn more