Hi everyone,
Today we are going to continue learning about flexible parametric survival models with random effects, as a follow-up to a previous post on our blog. Specifically, we are going to replicate the analysis from that post, but with a twist: today, we are going to use the {merlin} package in R.
{merlin} in R is the little brother of its Stata counterpart. Analogously, it can be used to fit a variety of fixed- and mixed-effects models, with a flexible syntax and support for several distributions. It is currently available from the Comprehensive R Archive Network (CRAN), and if you want to check it out, it can easily be installed with the usual install.packages("merlin") command. A pre-print is also available on arXiv, with several examples of {merlin} in action.
Let’s start by loading the same simulated dataset. For this, we use the {haven} package:
library(haven)
stmixed_example <- read_dta(file = "http://fmwww.bc.edu/repec/bocode/s/stmixed_example1.dta")
head(stmixed_example)
#> # A tibble: 6 × 9
#>   centre    x1    x2 stime event `_st`  `_d`   `_t` `_t0`
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
#> 1      1     1 0.431 0.271     1     1     1  0.271     0
#> 2      1     0 0.529 0.735     1     1     1  0.735     0
#> 3      1     0 0.144 0.821     1     1     1  0.821     0
#> 4      1     1 0.861 0.759     1     1     1  0.759     0
#> 5      1     0 0.502 0.955     0     1     0  0.955     0
#> 6      1     1 0.692 0.232     0     1     0  0.232     0
Note that we need to tidy up the dataset a little before we can use it with {merlin}. The first issue is that read_dta() reads data into a tibble by default, which can sometimes cause odd issues downstream with functions that expect a plain data.frame:
class(stmixed_example)
#> [1] "tbl_df" "tbl" "data.frame"
Some column names are also not syntactically valid in R (while being completely okay in Stata, which accepts column names starting with underscores):
names(stmixed_example)
#> [1] "centre" "x1" "x2" "stime" "event" "_st" "_d" "_t"
#> [9] "_t0"
We can fix both of these issues very easily, by turning the data into a standard data.frame and by using the make.names() function from base R to create syntactically valid names:
stmixed_example <- as.data.frame(stmixed_example)
names(stmixed_example) <- make.names(names = names(stmixed_example))
The new dataset now looks like this:
head(stmixed_example)
#>   centre x1        x2     stime event X_st X_d       X_t X_t0
#> 1      1  1 0.4306063 0.2714585     1    1   1 0.2714585    0
#> 2      1  0 0.5285968 0.7353575     1    1   1 0.7353575    0
#> 3      1  0 0.1442798 0.8213144     1    1   1 0.8213144    0
#> 4      1  1 0.8612115 0.7589310     1    1   1 0.7589310    0
#> 5      1  0 0.5016940 0.9547321     0    1   0 0.9547321    0
#> 6      1  1 0.6924560 0.2323809     0    1   0 0.2323809    0
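For the record, make.names() simply prepends an X to names that are not syntactically valid (such as those starting with an underscore), while leaving valid names untouched:

```r
# Names starting with an underscore are invalid in R, so "X" is prepended;
# already-valid names pass through unchanged.
make.names(c("_st", "_d", "x1", "centre"))
#> [1] "X_st"   "X_d"    "x1"     "centre"
```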
Great, we are now ready to continue.
Remember the columns of interest:
head(stmixed_example[, c("centre", "stime", "event", "x1", "x2")])
#>   centre     stime event x1        x2
#> 1      1 0.2714585     1  1 0.4306063
#> 2      1 0.7353575     1  0 0.5285968
#> 3      1 0.8213144     1  0 0.1442798
#> 4      1 0.7589310     1  1 0.8612115
#> 5      1 0.9547321     0  0 0.5016940
#> 6      1 0.2323809     0  1 0.6924560
where centre denotes the clustering variable (for each patient), stime denotes the survival time, event the event indicator variable (1 for those who died, 0 otherwise), and x1 (binary) and x2 (continuous) the two covariates.
Remember also the model that we are trying to fit:
H(t_ij) = H_0(t_ij) exp(X_ij β + u_i)

This is a model on the log-cumulative hazard scale, where we model the baseline cumulative hazard H_0 using restricted cubic splines. Model coefficients (β) can be interpreted as log hazard ratios, as usual, but conditionally on the frailty term u_i.
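For instance, a hypothetical coefficient of β = 0.7 for a binary covariate would correspond to a conditional hazard ratio of about 2:

```r
beta <- 0.7  # hypothetical log hazard ratio for a binary covariate
exp(beta)    # hazard ratio, conditional on the cluster-specific frailty u_i
#> [1] 2.013753
```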
Disclaimer: throughout this post, I will mostly focus on the translation of the example to R; please refer to the previous post for more details on interpretation, model specification, and so on.
We can fit this model using {merlin} in R:
library(merlin)
fit <- merlin(
model = list(
Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE) + M1[centre] * 1
),
levels = "centre",
family = "rp",
timevar = "stime",
data = stmixed_example
)
#> Error in stats::optim(par = b, fn = merlin_gf, gml = gml, hessian = TRUE, : function cannot be evaluated at initial parameters
…oh no, an error! It looks like the optimisation routine didn’t like the default starting values. Luckily, we can improve the starting values to help the optimiser, as we will see later on; but first, let’s describe each argument of the call to merlin():
- model defines a list of (possibly more than one) models, in our case just one. Given that we are fitting a flexible parametric model, we need to remember to include the rcs() term in the linear predictor, denoting the spline for the baseline cumulative hazard (with 3 degrees of freedom). We also need to include M1[centre], denoting the frailty at the centre level, and * 1 to constrain that parameter to the value of one.
- levels is used to define which variables identify clusters in our data.
- family is used to define the model family, in this case "rp" for Royston-Parmar (flexible parametric models on the log-cumulative hazard scale).
- timevar is used to define which variable identifies time.
- Finally, data is used to pass a dataset to merlin(). All variables defined above must be present in data for merlin() to be able to proceed.
Okay, let’s go back to the issue of bad starting values. What we could do here is fit a flexible parametric model (without frailty) to get better starting values for the regression coefficients, and a simpler model (e.g., one assuming an exponential baseline hazard) with a frailty term to get a better starting value for the frailty variance.
The flexible parametric model without frailty can be fit with:
fit_rp <- merlin(
model = list(
Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE)
),
family = "rp",
timevar = "stime",
data = stmixed_example
)
fit_rp
#> Merlin: mixed-effects model
#> Data: stmixed_example
#>
#> Coefficients:
#>       x1       x2  rcs():1  rcs():2  rcs():3    _cons
#>  0.72945 -0.71046  1.67913  0.11093  0.04262 -1.32214
and the exponential model with frailty can be fit with:
fit_expf <- merlin(
model = list(
Surv(stime, event) ~ x1 + x2 + M1[centre] * 1
),
levels = "centre",
family = "exponential",
timevar = "stime",
data = stmixed_example
)
fit_expf
#> Merlin: mixed-effects model
#> Data: stmixed_example
#>
#> Coefficients:
#>       x1       x2    _cons log_sd(M1)
#>  0.64988 -0.61154 -0.08712   -0.69603
Now that we have all the ingredients, we can try again to fit the initial model while passing improved starting values via the from argument:
fit <- merlin(
model = list(
Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE) + M1[centre] * 1
),
levels = "centre",
family = "rp",
timevar = "stime",
data = stmixed_example,
from = c(coef(fit_rp), coef(fit_expf)["log_sd(M1)"])
)
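As an aside, the from argument simply expects a named numeric vector. Using the coefficient values printed above for the two preliminary fits, the combined vector of starting values can be assembled like this:

```r
# Coefficients copied from the two preliminary fits printed above,
# mimicking coef(fit_rp) and coef(fit_expf)
b_rp <- c(x1 = 0.72945, x2 = -0.71046, `rcs():1` = 1.67913,
          `rcs():2` = 0.11093, `rcs():3` = 0.04262, `_cons` = -1.32214)
b_expf <- c(x1 = 0.64988, x2 = -0.61154, `_cons` = -0.08712,
            `log_sd(M1)` = -0.69603)

# Regression and spline coefficients from the flexible parametric fit,
# plus the frailty log-standard-deviation from the exponential fit
start <- c(b_rp, b_expf["log_sd(M1)"])
length(start)
#> [1] 7
```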
And there you go, now the algorithm managed to converge to a solution. Let’s print the results via the summary() function:
summary(fit)
#> Mixed effects regression model
#> Log likelihood = -1987.056
#>
#>             Estimate Std. Error       z Pr(>|z|) [95% Conf. Interval]
#> x1           0.98833    0.03654  27.044   0.0000  0.91670   1.05995
#> x2          -0.95154    0.06063 -15.695   0.0000 -1.07037  -0.83272
#> rcs():1      1.85160    0.04301  43.052   0.0000  1.76731   1.93590
#> rcs():2     -0.02074    0.03894  -0.533   0.5943 -0.09707   0.05559
#> rcs():3      0.01426    0.01356   1.052   0.2929 -0.01232   0.04085
#> _cons       -1.46334    0.06294 -23.250   0.0000 -1.58670  -1.33998
#> log_sd(M1)  -0.37317    0.03042 -12.267   0.0000 -0.43279  -0.31355
#>
#> Integration method: Non-adaptive Gauss-Hermite quadrature
#> Integration points: 7
Remember the results from the Stata fit:
Mixed effects regression model                  Number of obs     =      6,000
Log likelihood = -1962.7449
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
stime:       |
          x1 |   1.008152   .0367602    27.43   0.000      .936103      1.0802
          x2 |    -.96152   .0605554   -15.88   0.000    -1.080206   -.8428335
  M1[centre] |          1          .        .       .            .           .
       _cons |  -1.499115   .0994654   -15.07   0.000    -1.694064   -1.304166
-------------+----------------------------------------------------------------
centre:      |
      sd(M1) |     .89656   .0673174                      .7738694    1.038702
------------------------------------------------------------------------------
Warning: Baseline spline coefficients not shown - use ml display
It looks like merlin() in R converged to a slightly different solution (the frailty standard deviation from the R fit, on the original scale, is 0.6885486), with a log-likelihood of -1987.056 (compared to the Stata value of -1962.7449). This is likely because the Stata implementation uses, by default, a more accurate algorithm to perform the numerical integration required to estimate this model.
Again, this is something that we can improve on when fitting the model in R:
refit <- merlin(
model = list(
Surv(stime, event) ~ x1 + x2 + rcs(stime, df = 3, log = TRUE, event = TRUE) + M1[centre] * 1
),
levels = "centre",
family = "rp",
timevar = "stime",
data = stmixed_example,
from = c(coef(fit_rp), coef(fit_expf)["log_sd(M1)"]),
control = list(ip = 35)
)
summary(refit)
#> Mixed effects regression model
#> Log likelihood = -1963.186
#>
#>             Estimate Std. Error       z Pr(>|z|) [95% Conf. Interval]
#> x1           1.00677    0.03673  27.412   0.0000  0.93478   1.07875
#> x2          -0.96285    0.06051 -15.911   0.0000 -1.08146  -0.84424
#> rcs():1      1.86352    0.04307  43.263   0.0000  1.77910   1.94795
#> rcs():2     -0.03020    0.03897  -0.775   0.4385 -0.10659   0.04619
#> rcs():3      0.01309    0.01369   0.956   0.3390 -0.01374   0.03993
#> _cons       -1.58984    0.08078 -19.681   0.0000 -1.74817  -1.43151
#> log_sd(M1)  -0.19230    0.05454  -3.526   0.0004 -0.29920  -0.08540
#>
#> Integration method: Non-adaptive Gauss-Hermite quadrature
#> Integration points: 35
By setting the ip = 35 control argument, we increase the number of integration points used by the algorithm (more integration points yield higher accuracy, at additional computational cost). The updated results (standard deviation on the original scale of 0.8250605) are much closer to the Stata fit, both in terms of model coefficients and log-likelihood, which is great.
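To see why more quadrature nodes help, here is a small, self-contained sketch of (non-adaptive) Gauss-Hermite quadrature in base R, using the Golub-Welsch eigenvalue construction for the nodes and weights. The helper names gh() and E_exp() are hypothetical and not part of {merlin}; the target integral, E[exp(u)] for u ~ N(0, 1), stands in for integrating out a Gaussian frailty term and has the known exact value exp(0.5):

```r
# Nodes and weights for n-point Gauss-Hermite quadrature (weight exp(-x^2)),
# via the eigendecomposition of the Jacobi matrix (Golub-Welsch algorithm)
gh <- function(n) {
  J <- matrix(0, n, n)
  k <- seq_len(n - 1)
  J[cbind(k, k + 1)] <- sqrt(k / 2)  # off-diagonals of the Jacobi matrix
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# Approximate E[exp(u)] for u ~ N(0, 1) with n quadrature points,
# after the change of variable u = sqrt(2) * x
E_exp <- function(n) {
  q <- gh(n)
  sum(q$weights * exp(sqrt(2) * q$nodes)) / sqrt(pi)
}

E_exp(3)   # crude with few nodes
E_exp(35)  # essentially exact with many nodes
exp(0.5)   # the true value, 1.648721...
```

With only 3 nodes the approximation is visibly off in the second decimal place, while 35 nodes recover the true value to machine precision; the same trade-off drives the difference between the ip = 7 and ip = 35 fits above.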
And there you have it, today we have learned:
- How to fit a flexible parametric survival model with a frailty term using the {merlin} package in R,
- How to get better starting values if a merlin() model fails to converge or errors out,
- How to improve the accuracy of the estimation algorithm by tuning the number of integration points,
- The importance of re-fitting a model with increasing degrees of accuracy to test the robustness of a model fit in terms of the estimation algorithm.
Wrapping up this post, here is some breaking news for those of you who are still reading: updates to the {merlin} R package are currently in the works, and they will hit your nearest CRAN server soon. Stay tuned for that!
Hi Alessandro, thank you very much for the post. It is exciting to have merlin in R as well. I must have missed this, but I wonder whether other distributions for the random effects are available in merlin (besides the Normal)? Can correlation between random effects be included? (I’m actually interested in whether it is possible to fit spatial models in merlin 🙂) Thank you!
Hi Yuliya,
{merlin} in R can currently do Gaussian random effects only, if I am not mistaken; the Stata version, however, can do t-distributed random effects (check the “redistribution” option).
Correlation between random effects can be included, both in R and Stata, depending on the covariance structure: the R version supports diagonal, identity, and unstructured covariance matrices, while the Stata version supports exchangeable covariance matrices in addition to all of the above.
For spatial models, I am not sure – but if you can fit your problem within the above correlation structures it should be possible!