In this example, we look at a paper by the late great statistician Harvey Goldstein and colleagues (Goldstein et al., 2017) that proposed a two-level model with complex level 1 variation. This will be a nice illustration of the use of `family(user)`

combined with the `family(null)`

options of `merlin`

to provide an accessible implementation of a novel model.

Let’s start by defining our notation and model structure. We have,

where is a repeatedly measured continuous outcome, for the ith patient measured at the jth time point. We have our fixed effect design matrix, , and associated coefficients, , and our random effects design matrix, , and our normally distributed random effects, . We have our standard residual error, level 1 variance function, , where under a standard mixed effects model, would be

But what Goldstein and colleagues did was relax the homoscedasticity assumption, and allow the residual variance to vary as a function of both fixed and random effects, where the random effects are defined at the highest level, level 2. They did this by specifying a model for the log of the residual error variance,

where is the fixed effects design matrix, with associated coefficients, , and is the random effects design matrix, , again normally distributed. By modelling on the log scale, we avoid any issues with variances going negative.

We now have two vectors of random effects defined at level 2, which of course may be correlated, and so they specified an overall multivariate normal distribution by stacking the random effects as follows,

They estimated their proposal with a Bayesian approach utilising Markov Chain Monte Carlo methods. Here we’re going to bring this model into `merlin`

‘s modelling framework, and hence estimate it using maximum likelihood. As with many of our tutorials, we’ll start by showing you how to simulate such data.

Let’s simulate data for 300 patients, under the following model,

where we will observe a repeatedly measured biomarker, , with a random intercept, , and the residual error variance dependent on a random effect which is defined at the patient level, . We start by generating a unique `id`

variable for our 300 patients,

```
. clear all
. set seed 24575240
. set obs 300
Number of observations (_N) was 0, now 300.
. gen id = _n
```

We’ve declared 300 observations, and used the subscript notation `_n`

to generate our `id`

variable which now contains a vector of 1..300. We now need to define our variables at the patient `id`

level. Let’s define our two random effects, which we assume are normally distributed, and independent from each other.

```
. gen b21 = rnormal(0,0.5)
. gen b11 = rnormal(0,1)
```

where we’ll assume `b11`

is the random intercept for our biomarker trajectory, and `b21`

is the random intercept for our residual variance linear predictor. If we’d wanted to generate correlated random effects we could have used `drawnorm`

. We now need the standard deviation of my residual error, $\sigma_{e}$, which we can generate with

`. gen sdre = sqrt(exp(b21))`

Now let’s assume we observe 5 measurements of the biomarker per patient, and that they are recorded at 0, 1, 2, 3, 4 time points.

```
. expand 5
(1,200 observations created)
. bys id: gen time = _n-1
```

Finally let’s generate our observed biomarker, . We’ll assume , , and we can pass our residual error standard deviation to the `rnormal()`

function to generate our level 1 variability,

`. gen y = 0.5 + b11 + 0.1*time + rnormal(0,sdre)`

Now we have our data, we can fit our model…almost. In terms of understanding, it’s easiest if we define our `merlin`

command that we will use, first.

```
merlin (y time M1[id]@1, family(user, llfunction(lev1_logl))) ///
(M2[id]@1, family(null))
```

Hopefully you’ve looked at some of the simpler example uses of `merlin`

, as this one uses two of the more complex syntax aspects, `family(user)`

and `family(null)`

. Even though our distributional model is Gaussian, we can’t use the built-in distribution `family(gaussian)`

, because we want to have a residual variance function which isn’t just a scalar. What we really want is for the log of the residual variance to have its own complex linear predictor, just as if it were its own `merlin`

model. This is where `family(null)`

comes in. This lets us define and evaluate a complex linear predictor, but it tells `merlin`

that that model definition doesn’t contribute to the overall log-likelihood. We can then use our complex linear predictor in a user-defined log-likelihood function, as follows,

```
. mata:
------------------------------------------------- mata (type end to exit) --------
: real matrix lev1_logl(gml,| S, H)
> {
> y = merlin_util_depvar(gml) //response
> linpred1 = merlin_util_xzb(gml) //main lin. pred.
> varresid = exp(merlin_util_xzb_mod(gml,2)) //lev1 lin. pred
> return(lnnormalden(y,linpred1,sqrt(varresid))) //logl
> }
note: argument S unused.
note: argument H unused.
: end
----------------------------------------------------------------------------------
```

The important thing to note is the use of the utility function `merlin_util_xzb_mod()`

, which let’s us extract the complex linear predictor of a named *model* (see the help files for `merlin`

for more details), as opposed to `merlin_util_xzb()`

which extracts the complex linear predictor of the current *model*, i.e. the model which corresponds to where the `family(user,...)`

was entered. In the above case `merlin_util_xzb(gml) = merlin_util_xzb_mod(gml,1)`

.

So let’s now fit the model:

```
. merlin (y time M1[id]@1, family(user, llfunction(lev1_logl))) ///
> (M2[id]@1, family(null))
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -2517.3658 (not concave)
Iteration 1: log likelihood = -2433.9253
Iteration 2: log likelihood = -2422.6111
Iteration 3: log likelihood = -2419.4193
Iteration 4: log likelihood = -2419.4134
Iteration 5: log likelihood = -2419.4131
Iteration 6: log likelihood = -2419.4131
Mixed effects regression model Number of obs = 1,500
Log likelihood = -2419.4131
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
y: |
time | .0944808 .0174975 5.40 0.000 .0601863 .1287754
M1[id] | 1 . . . . .
_cons | .6403802 .074414 8.61 0.000 .4945315 .7862289
-------------+----------------------------------------------------------------
null: |
M2[id] | 1 . . . . .
_cons | -.1008029 .0557784 -1.81 0.071 -.2101266 .0085209
-------------+----------------------------------------------------------------
id: |
sd(M1) | 1.063407 .0541458 .9624069 1.175007
sd(M2) | .5488374 .06484 .4353937 .6918394
------------------------------------------------------------------------------
```

And we get good estimates of our true parameters described above. If you want to check it for bias, then increase the sample size to something large, say n = 10000, to see that everything is working as expected.

Now we can extend the level 1 variance function model in a number of ways, simply through the complex linear predictor, for example through extending to higher levels, time-dependent effects and non-linear covariate effects, as discussed in Goldstein et al. (2017). All of these extensions can be done without changing a single thing in our `lev1_logl()`

function…we think that’s pretty powerful. We can of course also incorporate a non-linear mixed effects model, and still model the level 1 variance function as above, for example. Job done.

Subscribe to get new post alerts directly in your inbox: