Joint frailty models for recurrent and terminal events
In this post we’re going to take a look at joint frailty models, and how to fit them with our merlin
command. Importantly, we’ll also discuss how to interpret the results.
Joint frailty models
An area of intense research in recent years is in the field of joint frailty models, which has become the commonly used name for a joint model for a recurrent event and a terminal event. We’re going to take a look at the most popular approach (Liu et al., 2004), and how to implement it in Stata.
In essence, we have a survival model for the recurrent event process, a survival model for the terminal event process, and we link them through a shared random effect. In other words, we have a random effect which accounts for the correlation between recurrent events, which is then included in the linear predictor for the terminal event model, with an accompanying coefficient to be estimated, which directly quantifies the strength of the association between the two processes. The nice property of this formulation, is that if there is no association, then they models reduce to their separate versions. Let’s formalise it.
We have for the recurrent event model, the hazard function for the ith patient and the jth event,
$$h_{ij}(t) = h_{0}(t) \exp (X_{1ij}\beta_{1} + b_{i})$$
where \(h_{0}(t)\) is the baseline hazard rate, \(X_{1ij}\) is a vector of baseline covariates with associated log hazard ratios, \(\beta_{1}\), and finally a random intercept, \(b_{i} \sim N(0, \sigma^{2})\). So far, this is a standard frailty survival model (we’re going to use frailty and random effect interchangeably in this post), where patients share the same unobserved effect \(b_{i}\), which accounts for the correlation between events occurring in the same patient.
To bring in the terminal event process, we define the mortality rate for the ith patient,
$$\lambda_{i}(u) = \lambda_{0}(u) \exp (X_{2i}\beta_{2} + \alpha b_{i})$$
where \(\lambda_{0}(u)\) is the baseline mortality rate, \(X_{2i}\) is a vector of baseline covariates with associated log hazard ratios, \(\beta_{2}\), and \(\alpha\) directly quantifies the association between the recurrent and terminal event processes. To be explicit, \(\exp (\alpha)\) represents the hazard ratio for a one unit increase in \(b_{i}\) … which is not that simple. The important way to look at it is if \(\alpha > 0\), then those with a higher frailty (i.e. higher underlying recurrent event rate) have an increased mortality rate. The other way around, if \(\alpha < 0\) then those with a higher frailty have a reduced mortality rate.
Example
We illustrate these models using the readmission
dataset that comes with the extensive frailtypack
in R
(Król et al., 2017), developed by Virginie Rondeau and her group. The dataset has information on re-hospitalisation times after surgery in patients diagnosed with colorectal cancer. Covariates of interest include gender, Dukes’ tumour stage and comorbidity Charlson index, but to keep things as simple as possible, we’re just going to include gender in our model below. Let’s load the dataset and generate a male
binary indicator variables.
. use "https://www.mjcrowther.co.uk/data/jointfrailty_example",clear
. gen male = sex=="Male"
Let’s take a look at our dataset:
. list id time event stime death male if inlist(id,1,2)
+------------------------------------------+
| id time event stime death male |
|------------------------------------------|
1. | 1 24 1 . . 0 |
2. | 1 433 1 . . 0 |
3. | 1 580 0 1037 0 0 |
4. | 2 489 1 . . 1 |
5. | 2 693 0 1182 0 1 |
+------------------------------------------+
The times of re-hospitalisation is stored in time
, with corresponding event indicator event
. This is in clock-reset formulation, i.e. each time a patient has a cancer recurrence the clock is reset to zero, so we will be fitting a semi-Markov model in this example. Our overall survival time is stored in stime
, with corresponding event indicator stored in death
. Both time of re-hospitalisation and time to death are recorded in days.
We can fit such a model with merlin
, adjusting for male
,
. merlin (time /// rehosp. times
> male /// male
> M1[id]@1 /// random intercept
> , family(weibull, /// distribution
> failure(event))) ///
> (stime /// survival time
> male /// male
> M1[id] /// random effect & association
> , family(weibull, /// distribution
> failure(death))) //
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -4330.2103 (not concave)
Iteration 1: log likelihood = -4282.7654
Iteration 2: log likelihood = -4258.6868
Iteration 3: log likelihood = -4250.7419
Iteration 4: log likelihood = -4249.5375
Iteration 5: log likelihood = -4249.5313
Iteration 6: log likelihood = -4249.5306
Iteration 7: log likelihood = -4249.5306
Mixed effects regression model Number of obs = 861
Log likelihood = -4249.5306
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
time: |
male | .3909186 .1615081 2.42 0.016 .0743686 .7074685
M1[id] | 1 . . . . .
_cons | -5.139183 .232351 -22.12 0.000 -5.594583 -4.683784
log(gamma) | -.4162457 .0403026 -10.33 0.000 -.4952373 -.3372542
-------------+----------------------------------------------------------------
stime: |
male | .2266276 .27375 0.83 0.408 -.3099125 .7631677
M1[id] | 1.430339 .2144692 6.67 0.000 1.009987 1.850691
_cons | -9.160412 .8457084 -10.83 0.000 -10.81797 -7.502854
log(gamma) | .0344113 .1007777 0.34 0.733 -.1631094 .2319321
-------------+----------------------------------------------------------------
id: |
sd(M1) | 1.113295 .0950527 .9417484 1.31609
------------------------------------------------------------------------------
The first submodel is our model for the time to rehospitalisation, where we’ve assumed a Weibull baseline, and a constant effect of male
. The term M1[id]
is the syntax required to specify a normally distributed random effect, with mean 0, in merlin
. The name must begin with a capital letter, and we then recommend adding a number to make it unique (in case you want to add more). In square brackets we have to define the level at which the random effect applies – in our case we want to specify a random effect at the id
level. Finally, the @1
notation constrains the coefficient on the random effect to be 1. If we didn’t specify this, it would by default have a coefficient which would be estimated. Keep reading.
The second submodel specification is our model for overall survival. We again adjust for male
, but this time include the same random effect, M1
, but without any constraint on its coefficient, and hence it will estimate one. This corresponds to \(\alpha\) in the above formulation. Our syntax (based on Stata’s gsem
command) provides a highly convenient way of linking random effects between outcome models. Let’s get to the results.
Our results show a highly positive estimate of association of 1.4130 (95% CI: 1.010, 1.851) showing substantial association between the two event processes, i.e. a higher rate of re-hospitalisation also indicates a higher the mortality rate. In our example of re-hospitalisation and mortality, such a direction of association is hardly surprising. As side note, we must remember to interpret our covariates effects conditional on the frailty.
Extensions
So what should we be thinking about next. Well, adjusting for more covariates, assessing proportional hazards, non-linear covariate effects, assessing the appropriateness of the Weibull baselines – all of these things can be incorporated very simply with merlin
. Let’s fit Royston-Parmar models instead,
. merlin (time /// rehosp. times
> male /// male
> M1[id]@1 /// random intercept
> , family(rp, df(3) /// distribution
> failure(event))) ///
> (stime /// survival time
> male /// male
> M1[id] /// random effect & association
> , family(rp, df(2) /// distribution
> failure(death))) //
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_2
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -4302.7589 (not concave)
Iteration 1: log likelihood = -4288.631 (not concave)
Iteration 2: log likelihood = -4256.0445
Iteration 3: log likelihood = -4235.0114
Iteration 4: log likelihood = -4233.6742
Iteration 5: log likelihood = -4233.6546
Iteration 6: log likelihood = -4233.655
Mixed effects regression model Number of obs = 861
Log likelihood = -4233.655
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
time: |
male | .3968829 .1588259 2.50 0.012 .08559 .7081759
M1[id] | 1 . . . . .
_cons | -2.122345 .1509459 -14.06 0.000 -2.418193 -1.826496
-------------+----------------------------------------------------------------
stime: |
male | .2080517 .2573849 0.81 0.419 -.2964135 .7125168
M1[id] | 1.296172 .1986155 6.53 0.000 .9068925 1.685451
_cons | -2.229829 .2496362 -8.93 0.000 -2.719107 -1.740552
-------------+----------------------------------------------------------------
id: |
sd(M1) | 1.065379 .096208 .8925597 1.27166
------------------------------------------------------------------------------
Warning: Baseline spline coefficients not shown - use ml display
Note that each submodel can as different or similar as you like. Importantly, we know how how to formulate the crucial association structure in a joint frailty model, using a random intercept/frailty effect, so all of these extensions can be built on to our model.
In the above we are assuming a clock-reset timescale for the recurrent event process – clock-forward, i.e. allowing for delayed entry is not currently supported. We have more to come on joint frailty models, so do subscribe or follow us on Twitter.
Videos
State-of-the-art statistical models for modern HTA
Videos
Multilevel (hierarchical) survival models: Estimation, prediction, interpretation
Statistical Primers