# Mixed effects for the level 1 variance function in a multilevel model

In this example, we look at a paper by the late great statistician Harvey Goldstein and colleagues (Goldstein et al., 2017) that proposed a two-level model with complex level 1 variation. This will be a nice illustration of the use of family(user) combined with the family(null) options of merlin to provide an accessible implementation of a novel model.

Let’s start by defining our notation and model structure. We have,

$y_{ij} = X_{1ij}\beta_{1} + Z_{1ij}b_{1j} + \epsilon_{ij}$

where $y_{ij}$ is a repeatedly measured continuous outcome, for the ith patient measured at the jth time point. We have our fixed effect design matrix, $X_{1ij}$, and associated coefficients, $\beta_{1}$, and our random effects design matrix, $Z_{1ij}$, and our normally distributed random effects, $b_{1j}$. We have our standard residual error, level 1 variance function, $\epsilon_{ij}$, where under a standard mixed effects model, would be

$\epsilon_{ij} \sim N(0,\sigma_{e}^{2})$

But what Goldstein and colleagues did was relax the homoscedasticity assumption, and allow the residual variance to vary as a function of both fixed and random effects, where the random effects are defined at the highest level, level 2. They did this by specifying a model for the log of the residual error variance,

$\log (\sigma_{e}^{2}) = X_{2ij}\beta_{2} + Z_{2ij}b_{2i}$

where $X_{2ij}$ is the fixed effects design matrix, with associated coefficients, $\beta_{2}$, and $Z_{2ij}$ is the random effects design matrix, $b_{2j}$, again normally distributed. By modelling on the log scale, we avoid any issues with variances going negative.

We now have two vectors of random effects defined at level 2, which of course may be correlated, and so they specified an overall multivariate normal distribution by stacking the random effects as follows,

$\left(\genfrac{}{}{0pt}{}{\mathbf{b}_{1i}}{\mathbf{b}_{2i}}\right) \sim N \left[\left(\genfrac{}{}{0pt}{}{0}{0}\right) , \left( \genfrac{}{}{0pt}{}{\mathbf{\Sigma}_{b_{1}}}{\mathbf{\Sigma}_{b_{1}b_{2}}} \genfrac{}{}{0pt}{}{}{\mathbf{\Sigma}_{b_{2}}} \right) \right]$

They estimated their proposal with a Bayesian approach utilising Markov Chain Monte Carlo methods. Here we’re going to bring this model into merlin‘s modelling framework, and hence estimate it using maximum likelihood. As with many of our tutorials, we’ll start by showing you how to simulate such data.

Let’s simulate data for 300 patients, under the following model,

$y_{ij} = \beta_{01} + b_{01i} + \beta_{11} X_{1ij}+ \epsilon_{ij}$

$\epsilon_{ij} \sim N(0,\sigma_{e}^{2})$

$\log (\sigma_{e}^{2}) = \beta_{02} + b_{02i}$

$\left(\genfrac{}{}{0pt}{}{b_{01i}}{b_{02i}}\right) \sim N \left[\left(\genfrac{}{}{0pt}{}{0}{0}\right) , \left( \genfrac{}{}{0pt}{}{\sigma^{2}_{1}}{0} \genfrac{}{}{0pt}{}{0}{\sigma^{2}_{2}} \right) \right]$

where we will observe a repeatedly measured biomarker, $y_{ij}$, with a random intercept, $b_{01i}$, and the residual error variance dependent on a random effect which is defined at the patient level, $b_{02i}$. We start by generating a unique id variable for our 300 patients,

. clear all

. set seed 24575240

. set obs 300
Number of observations (_N) was 0, now 300.

. gen id = _n

We’ve declared 300 observations, and used the subscript notation _n to generate our id variable which now contains a vector of 1..300. We now need to define our variables at the patient id level. Let’s define our two random effects, which we assume are normally distributed, and independent from each other.

. gen b21 = rnormal(0,0.5)

. gen b11 = rnormal(0,1)

where we’ll assume b11 is the random intercept for our biomarker trajectory, and b21 is the random intercept for our residual variance linear predictor. If we’d wanted to generate correlated random effects we could have used drawnorm. We now need the standard deviation of my residual error, $\sigma_{e}$, which we can generate with

. gen sdre = sqrt(exp(b21))

Now let’s assume we observe 5 measurements of the biomarker per patient, and that they are recorded at 0, 1, 2, 3, 4 time points.

. expand 5
(1,200 observations created)

. bys id: gen time = _n-1

Finally let’s generate our observed biomarker, $y_{ij}$. We’ll assume $\beta_{01}=0.5$, $\beta_{11}=0.1$, and we can pass our residual error standard deviation to the rnormal() function to generate our level 1 variability,

. gen y = 0.5 + b11 + 0.1*time + rnormal(0,sdre)

Now we have our data, we can fit our model…almost. In terms of understanding, it’s easiest if we define our merlin command that we will use, first.

merlin (y time M1[id]@1, family(user, llfunction(lev1_logl)))  ///
(M2[id]@1, family(null)) 

Hopefully you’ve looked at some of the simpler example uses of merlin, as this one uses two of the more complex syntax aspects, family(user) and family(null). Even though our distributional model is Gaussian, we can’t use the built-in distribution family(gaussian), because we want to have a residual variance function which isn’t just a scalar. What we really want is for the log of the residual variance to have its own complex linear predictor, just as if it were its own merlin model. This is where family(null) comes in. This lets us define and evaluate a complex linear predictor, but it tells merlin that that model definition doesn’t contribute to the overall log-likelihood. We can then use our complex linear predictor in a user-defined log-likelihood function, as follows,

. mata:
------------------------------------------------- mata (type end to exit) --------
: real matrix lev1_logl(gml,| S, H)
> {
>      y        = merlin_util_depvar(gml)               //response
>      linpred1 = merlin_util_xzb(gml)                  //main lin. pred.
>      varresid = exp(merlin_util_xzb_mod(gml,2))       //lev1 lin. pred
>      return(lnnormalden(y,linpred1,sqrt(varresid)))   //logl
> }
note: argument S unused.
note: argument H unused.

: end
----------------------------------------------------------------------------------

The important thing to note is the use of the utility function merlin_util_xzb_mod(), which let’s us extract the complex linear predictor of a named model (see the help files for merlin for more details), as opposed to merlin_util_xzb() which extracts the complex linear predictor of the current model, i.e. the model which corresponds to where the family(user,...) was entered. In the above case merlin_util_xzb(gml) = merlin_util_xzb_mod(gml,1).

So let’s now fit the model:

. merlin (y time M1[id]@1, family(user, llfunction(lev1_logl)))   ///
>        (M2[id]@1,        family(null))

Fitting fixed effects model:

Fitting full model:

Iteration 0:   log likelihood = -2517.3658  (not concave)
Iteration 1:   log likelihood = -2433.9253
Iteration 2:   log likelihood = -2422.6111
Iteration 3:   log likelihood = -2419.4193
Iteration 4:   log likelihood = -2419.4134
Iteration 5:   log likelihood = -2419.4131
Iteration 6:   log likelihood = -2419.4131

Mixed effects regression model                           Number of obs = 1,500
Log likelihood = -2419.4131
------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
y:           |
time |   .0944808   .0174975     5.40   0.000     .0601863    .1287754
M1[id] |          1          .        .       .            .           .
_cons |   .6403802    .074414     8.61   0.000     .4945315    .7862289
-------------+----------------------------------------------------------------
null:        |
M2[id] |          1          .        .       .            .           .
_cons |  -.1008029   .0557784    -1.81   0.071    -.2101266    .0085209
-------------+----------------------------------------------------------------
id:          |
sd(M1) |   1.063407   .0541458                      .9624069    1.175007
sd(M2) |   .5488374     .06484                      .4353937    .6918394
------------------------------------------------------------------------------

And we get good estimates of our true parameters described above. If you want to check it for bias, then increase the sample size to something large, say n = 10000, to see that everything is working as expected.

Now we can extend the level 1 variance function model in a number of ways, simply through the complex linear predictor, for example through extending to higher levels, time-dependent effects and non-linear covariate effects, as discussed in Goldstein et al. (2017). All of these extensions can be done without changing a single thing in our lev1_logl() function…we think that’s pretty powerful. We can of course also incorporate a non-linear mixed effects model, and still model the level 1 variance function as above, for example. Job done.