# Flexible parametric survival analysis with frailty

This example takes a look at incorporating a frailty, or random intercept, into a flexible parametric survival model, and how to fit them in Stata. First we’ll use `merlin`

to estimate our model, and then the more user-friendly wrapper function `stmixed`

. More details on these models can be found in the following papers:

- Crowther MJ, Look MP, Riley RD. Multilevel mixed effects parametric survival models using adaptive Gauss-Hermite quadrature with application to recurrent events and individual participant data meta-analysis.
*Statistics in Medicine*2014;33(22):3844-3858. - Crowther MJ. Multilevel mixed-effects parametric survival analysis: Estimation, simulation, and application.
*The Stata Journal*2019;19(4):931-949.

We can define a multilevel proportional hazards survival model as follows,

$$h_{ij}(t) = h_{0}(t) \exp (X_{ij}\beta + b_{i})$$

where

$$b_{i} \sim N(0,\sigma^{2})$$

Now the flexible parametric model specifies the linear predictor on the log cumulative hazard scale, so in actual fact we have,

$$H_{ij}(t) = H_{0}(t)\exp (X_{ij}\beta + b_{i})$$

but with no time-dependent effects, log cumulative hazard ratios can be interpreted as log hazard ratios. So, we have our vector of baseline covariates \(X_{ij}\) and associated conditional log hazard ratios \(\beta\) (conditional on the frailty, \(b_{i}\)). We can load an example simulated dataset from the `stmixed`

package

`. use http://fmwww.bc.edu/repec/bocode/s/stmixed_example1, clear`

which represents a multi-centre trial scenario, with 100 centres and each centre recuiting 60 patients, resulting in 6000 observations. Two covariates were collected, a binary covariate \(X_{1}\) (coded 0/1), and a continuous covariate, \(X_{2}\), within the range [0,1].

Let’s inspect the first few rows of our dataset,

```
. list centre stime event x1 x2 in 1/5, noobs
+-------------------------------------------+
| centre stime event x1 x2 |
|-------------------------------------------|
| 1 .2714585 1 1 .4306063 |
| 1 .7353575 1 0 .5285968 |
| 1 .8213144 1 0 .1442798 |
| 1 .758931 1 1 .8612115 |
| 1 .9547321 0 0 .501694 |
+-------------------------------------------+
```

where `centre`

denotes the centre number for each patient, `stime`

contains our survival time, `event`

our indicator variable for those that died (1) or were censored (0), `x1`

our binary covariate, and `x2`

our continuous covariate. We can fit our model as follows

```
. merlin (stime /// survival time
> x1 x2 /// fixed covariates
> M1[centre]@1 /// random intercept
> , ///
> family(rp, df(3) failure(event))) // baseline distribution
variables created: _rcs1_1 to _rcs1_3
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -2195.8633
Iteration 1: log likelihood = -1964.8829
Iteration 2: log likelihood = -1962.755
Iteration 3: log likelihood = -1962.7449
Iteration 4: log likelihood = -1962.7449
Mixed effects regression model Number of obs = 6,000
Log likelihood = -1962.7449
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
stime: |
x1 | 1.008152 .0367602 27.43 0.000 .936103 1.0802
x2 | -.96152 .0605554 -15.88 0.000 -1.080206 -.8428335
M1[centre] | 1 . . . . .
_cons | -1.499116 .0994654 -15.07 0.000 -1.694065 -1.304167
-------------+----------------------------------------------------------------
centre: |
sd(M1) | .8965598 .0673174 .7738693 1.038702
------------------------------------------------------------------------------
Warning: Baseline spline coefficients not shown - use ml display
```

The key part of the syntax is specifying a normally distributed random effect, which we’ve called `M1`

(it must start with a capital letter, then a number), with the variable defining the cluster in square brackets `[centre]`

, and finally, we tell `merlin`

not to estimate a coefficient, but rather constrain it to be 1. Our model converges nicely, and we observe an estimate of \(\sigma =\) 0.897 (95% CI: 0.774, 1.039), showing substantial heterogeneity between centres.

`merlin`

has a challenging syntax because it emcompasses a lot of different modelling frameworks. To make it more user-friendly, we’ve developed a few wrapper functions to make our lives easier. One of these is `stmixed`

which is specifically designed to have more consistent syntax for specifying a random effects model in Stata. It essentially follows the same design as `mestreg`

. It also has the added benefit of requiring the data to be `stset`

before use. So we can fit the same model with,

```
. stset stime, failure(event)
Survival-time data settings
Failure event: event!=0 & event<.
Observed time interval: (0, stime]
Exit on or before: failure
--------------------------------------------------------------------------
6,000 total observations
0 exclusions
--------------------------------------------------------------------------
6,000 observations remaining, representing
3,427 failures in single-record/single-failure data
3,550.738 total analysis time at risk and under observation
At risk from t = 0
Earliest observed entry t = 0
Last observed exit t = 1.985816
. stmixed x1 x2 || centre:, distribution(rp) df(3)
Random effect M1: Intercept at level centre
variables created: _rcs1_1 to _rcs1_3
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -2195.8633
Iteration 1: log likelihood = -1964.8829
Iteration 2: log likelihood = -1962.755
Iteration 3: log likelihood = -1962.7449
Iteration 4: log likelihood = -1962.7449
Mixed effects survival model Number of obs = 6,000
Log likelihood = -1962.7449
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_t: |
x1 | 1.008152 .0367602 27.43 0.000 .936103 1.0802
x2 | -.96152 .0605554 -15.88 0.000 -1.080206 -.8428335
M1[centre] | 1 . . . . .
_cons | -1.499116 .0994654 -15.07 0.000 -1.694065 -1.304167
-------------+----------------------------------------------------------------
centre: |
sd(M1) | .8965598 .0673174 .7738693 1.038702
------------------------------------------------------------------------------
Warning: Baseline spline coefficients not shown - use ml display
```

obtaining identical estimates, unsurprisingly. `stmixed`

gives us some helpful output telling us what it’s named the random effect, and what level it has been specified.

There’s lots of predictions available post-estimation, just take a look at `help stmixed postestimation`

for more details.

Videos

### State-of-the-art statistical models for modern HTA

Videos

### Multilevel (hierarchical) survival models: Estimation, prediction, interpretation

Statistical Primers