Flexible parametric survival analysis with frailty, take two

Hi everyone,

Today we continue learning about flexible parametric survival models with random effects, as a follow-up to a previous post on our blog. Specifically, we are going to replicate the analysis from that post, but with a twist: this time we will use the {merlin} package in R.

{merlin} in R is the little brother of its Stata counterpart: like the Stata version, it can be used to fit a variety of fixed- and mixed-effects models, with a flexible syntax and support for several distributions. It is currently available from the Comprehensive R Archive Network (CRAN) and, if you want to check it out, can easily be installed with the usual install.packages("merlin") command. A pre-print is also available on arXiv, with several examples of {merlin} in action.
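For completeness, installing the released version from CRAN looks like this:

# Install {merlin} from CRAN (only needed once)
install.packages("merlin")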

Let’s start by loading the same simulated dataset. For this, we use the {haven} package:

library(haven)

stmixed_example <- read_dta(file = "http://fmwww.bc.edu/repec/bocode/s/stmixed_example1.dta")

head(stmixed_example)
#> # A tibble: 6 × 9
#>   centre    x1    x2 stime event `_st`  `_d`  `_t` `_t0`
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1      1     1 0.431 0.271     1     1     1 0.271     0
#> 2      1     0 0.529 0.735     1     1     1 0.735     0
#> 3      1     0 0.144 0.821     1     1     1 0.821     0
#> 4      1     1 0.861 0.759     1     1     1 0.759     0
#> 5      1     0 0.502 0.955     0     1     0 0.955     0
#> 6      1     1 0.692 0.232     0     1     0 0.232     0

Note that we need to tidy up the dataset a little bit before we can use it with {merlin}. The first issue is that read_dta() reads data into a tibble by default, which can sometimes cause odd issues downstream:

class(stmixed_example)
#> [1] "tbl_df"     "tbl"        "data.frame"

Some column names are also not syntactically valid in R (while being completely okay in Stata, which accepts column names starting with underscores):

names(stmixed_example)
#> [1] "centre" "x1"     "x2"     "stime"  "event"  "_st"    "_d"     "_t"    
#> [9] "_t0"

We can fix both of these issues very easily, by turning the data into a standard data.frame and by using the make.names() function in base R to create syntactically valid names:

stmixed_example <- as.data.frame(stmixed_example)
names(stmixed_example) <- make.names(names = names(stmixed_example))

The new dataset now looks like this:

head(stmixed_example)
#>   centre x1        x2     stime event X_st X_d       X_t X_t0
#> 1      1  1 0.4306063 0.2714585     1    1   1 0.2714585    0
#> 2      1  0 0.5285968 0.7353575     1    1   1 0.7353575    0
#> 3      1  0 0.1442798 0.8213144     1    1   1 0.8213144    0
#> 4      1  1 0.8612115 0.7589310     1    1   1 0.7589310    0
#> 5      1  0 0.5016940 0.9547321     0    1   0 0.9547321    0
#> 6      1  1 0.6924560 0.2323809     0    1   0 0.2323809    0

Great, we are now ready to continue.

Remember the columns of interest:

head(stmixed_example[, c("centre", "stime", "event", "x1", "x2")])
#>   centre     stime event x1        x2
#> 1      1 0.2714585     1  1 0.4306063
#> 2      1 0.7353575     1  0 0.5285968
#> 3      1 0.8213144     1  0 0.1442798
#> 4      1 0.7589310     1  1 0.8612115
#> 5      1 0.9547321     0  0 0.5016940
#> 6      1 0.2323809     0  1 0.6924560

where centre denotes the clustering variable (the centre each patient belongs to), stime the survival time, event the event indicator (1 for those who died, 0 otherwise), and x1 (binary) and x2 (continuous) the two covariates.

Remember also the model that we are trying to fit: 

H(t_{ij}) = H_0(t_{ij}) \exp(X_{ij} \beta + u_i)

This is a model on the log-cumulative hazard scale, where we model the baseline cumulative hazard H_0 using restricted cubic splines. Model coefficients (β) can be interpreted as log hazard ratios, as usual, but conditionally on the frailty term u_i.
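Written out on the log scale, this is the usual Royston-Parmar formulation (a sketch, with the restricted cubic spline s(·; γ) of log time playing the role of log H_0, and a normally distributed frailty, which is what merlin() assumes here):

\log H(t_{ij}) = s\{\log(t_{ij}); \gamma\} + X_{ij} \beta + u_i, \qquad u_i \sim N(0, \sigma_u^2)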

Disclaimer: throughout this post, I will mostly focus on the translation of the example to R; please refer to the previous post for more details on interpretation, model specification, and so on.

We can fit this model using {merlin} in R:

library(merlin)

fit <- merlin(
  model = list(
    Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE) + M1[centre] * 1
  ),
  levels = "centre",
  family = "rp",
  timevar = "stime",
  data = stmixed_example
)
#> Error in stats::optim(par = b, fn = merlin_gf, gml = gml, hessian = TRUE, : function cannot be evaluated at initial parameters

…oh no, an error! It looks like the optimisation routine didn’t like the default starting values. Luckily we can improve the starting values to help the optimiser, as we will see later on, but first, let’s describe each argument of the call to merlin():

  • model defines a list of one or more models, in our case just one. Given that we are fitting a flexible parametric model, we need to remember to include the rcs() term in the linear predictor, denoting the spline for the baseline cumulative hazard (with 3 degrees of freedom). We also need to include M1[centre], denoting the frailty at the centre level, and * 1 to constrain the coefficient on the random effect to the value of one.
  • levels is used to define which variables identify clusters in our data.
  • family is used to define the model family, in this case, "rp" for Royston-Parmar (flexible parametric models on the log-cumulative hazard scale).
  • timevar is used to define which variable identifies time.
  • Finally, data is used to pass a dataset to merlin(). All variables defined above must be present in data for merlin() to be able to proceed.

Okay, let’s go back to the issue of bad starting values. What we could do here is fit a flexible parametric model (without frailty) to get better starting values for the regression coefficients, and a simpler model with a frailty term (e.g., assuming an exponential baseline hazard) to get a better starting value for the frailty variance.

The flexible parametric model without frailty can be fit with:

fit_rp <- merlin(
  model = list(
    Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE)
  ),
  family = "rp",
  timevar = "stime",
  data = stmixed_example
)
fit_rp
#> Merlin: mixed-effects model
#> Data: stmixed_example 
#> 
#> Coefficients:
#>       x1        x2   rcs():1   rcs():2   rcs():3     _cons  
#>  0.72945  -0.71046   1.67913   0.11093   0.04262  -1.32214

and the exponential model with frailty can be fit with:

fit_expf <- merlin(
  model = list(
    Surv(stime, event) ~ x1 + x2 + M1[centre] * 1
  ),
  levels = "centre",
  family = "exponential",
  timevar = "stime",
  data = stmixed_example
)
fit_expf
#> Merlin: mixed-effects model
#> Data: stmixed_example 
#> 
#> Coefficients:
#>         x1          x2       _cons  log_sd(M1)  
#>    0.64988    -0.61154    -0.08712    -0.69603

Now that we have all the ingredients, we can try again to fit the initial model while passing improved starting values via the from argument:

fit <- merlin(
  model = list(
    Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE) + M1[centre] * 1
  ),
  levels = "centre",
  family = "rp",
  timevar = "stime",
  data = stmixed_example,
  from = c(coef(fit_rp), coef(fit_expf)["log_sd(M1)"])
)

And there you go: now the algorithm managed to converge to a solution. Let’s print the results via the summary() function:

summary(fit)
#> Mixed effects regression model
#> Log likelihood = -1987.056
#> 
#>            Estimate Std. Error       z Pr(>|z|) [95% Conf. Interval]
#> x1          0.98833    0.03654  27.044   0.0000    0.91670   1.05995
#> x2         -0.95154    0.06063 -15.695   0.0000   -1.07037  -0.83272
#> rcs():1     1.85160    0.04301  43.052   0.0000    1.76731   1.93590
#> rcs():2    -0.02074    0.03894  -0.533   0.5943   -0.09707   0.05559
#> rcs():3     0.01426    0.01356   1.052   0.2929   -0.01232   0.04085
#> _cons      -1.46334    0.06294 -23.250   0.0000   -1.58670  -1.33998
#> log_sd(M1) -0.37317    0.03042 -12.267   0.0000   -0.43279  -0.31355
#> 
#> Integration method: Non-adaptive Gauss-Hermite quadrature 
#> Integration points: 7

Remember the results from the Stata fit:

Mixed effects regression model                           Number of obs = 6,000
Log likelihood = -1962.7449
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
stime:       |            
          x1 |   1.008152   .0367602    27.43   0.000      .936103      1.0802
          x2 |    -.96152   .0605554   -15.88   0.000    -1.080206   -.8428335
  M1[centre] |          1          .        .       .            .           .
       _cons |  -1.499115   .0994654   -15.07   0.000    -1.694064   -1.304166
-------------+----------------------------------------------------------------
centre:      |            
      sd(M1) |     .89656   .0673174                      .7738694    1.038702
------------------------------------------------------------------------------
    Warning: Baseline spline coefficients not shown - use ml display

It looks like merlin() in R converged to a slightly different solution: the frailty standard deviation from the R fit, back on the natural scale, is 0.6885486 (versus 0.89656 in Stata), and the log-likelihood is -1987.056 (versus -1962.7449 in Stata). This is likely because the Stata implementation uses, by default, a more accurate algorithm to perform the numerical integration required to estimate this model.

Again, this is something that we can improve on when fitting the model in R:

refit <- merlin(
  model = list(
    Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE) + M1[centre] * 1
  ),
  levels = "centre",
  family = "rp",
  timevar = "stime",
  data = stmixed_example,
  from = c(coef(fit_rp), coef(fit_expf)["log_sd(M1)"]),
  control = list(ip = 35)
)
summary(refit)
#> Mixed effects regression model
#> Log likelihood = -1963.186
#> 
#>            Estimate Std. Error       z Pr(>|z|) [95% Conf. Interval]
#> x1          1.00677    0.03673  27.412   0.0000    0.93478   1.07875
#> x2         -0.96285    0.06051 -15.911   0.0000   -1.08146  -0.84424
#> rcs():1     1.86352    0.04307  43.263   0.0000    1.77910   1.94795
#> rcs():2    -0.03020    0.03897  -0.775   0.4385   -0.10659   0.04619
#> rcs():3     0.01309    0.01369   0.956   0.3390   -0.01374   0.03993
#> _cons      -1.58984    0.08078 -19.681   0.0000   -1.74817  -1.43151
#> log_sd(M1) -0.19230    0.05454  -3.526   0.0004   -0.29920  -0.08540
#> 
#> Integration method: Non-adaptive Gauss-Hermite quadrature 
#> Integration points: 35

By setting the ip = 35 control argument, we increase the number of integration points used by the algorithm (more integration points yield higher accuracy, at the price of longer computation times). The updated results (a frailty standard deviation on the natural scale of 0.8250605) are much closer to the Stata fit, both in terms of model coefficients and log-likelihood, which is great.
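If you want to make this kind of robustness check more systematic, here is a minimal sketch that refits the model over an increasing number of integration points and collects the log-likelihoods (I am assuming a logLik() method is available for merlin fits; if not, the values printed by summary() can be compared by eye):

# Refit the frailty model with an increasing number of integration points
ips <- c(7, 15, 25, 35)
fits <- lapply(ips, function(k) {
  merlin(
    model = list(
      Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE) + M1[centre] * 1
    ),
    levels = "centre",
    family = "rp",
    timevar = "stime",
    data = stmixed_example,
    from = c(coef(fit_rp), coef(fit_expf)["log_sd(M1)"]),
    control = list(ip = k)
  )
})

# Log-likelihoods should stabilise as the number of integration points grows
# (logLik() on merlin objects is an assumption here)
sapply(fits, logLik)

Once the log-likelihood and the coefficients stop changing as the number of integration points grows, we can be reasonably confident that the numerical integration is accurate enough.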

And there you have it, today we have learned:

  1. How to fit a flexible parametric survival model with a frailty term using the {merlin} package in R,
  2. How to get better starting values if a merlin() model fails to converge or errors out,
  3. How to improve the accuracy of the estimation algorithm by tuning the number of integration points,
  4. The importance of re-fitting a model with increasing levels of numerical accuracy, to check that the results are robust to the settings of the estimation algorithm.

Wrapping up this post, here is some breaking news for those of you who are still reading: updates to the {merlin} R package are currently being worked on, and they will hit your nearest CRAN server soon. Stay tuned for that!


2 thoughts on “Flexible parametric survival analysis with frailty, take two”

  1. Hi Alessandro, thank you very much for the post. It is exciting to have Merlin in R as well. I must have missed this, but I wonder whether other distributions for the random effects are available in Merlin (besides Normal)? Can correlation between random effects be included? (I’m actually interested in whether it is possible to fit spatial models in Merlin 🙂) Thank you!

    1. Hi Yuliya,

      {merlin} in R can currently do Gaussian random effects only, if I am not mistaken; the Stata version, however, can do t-distributed random effects (check the “redistribution” option).

      Correlation between random effects can be included, both in R and Stata, depending on the covariance structure: the R version supports diagonal, identity, and unstructured covariance matrices, while the Stata version supports exchangeable covariance matrices in addition to all of the above.

      For spatial models, I am not sure – but if you can fit your problem within the above correlation structures it should be possible!
