# Probabilistic sensitivity analysis and survival models

Today we’re going to take a little look into probabilistic sensitivity analysis (PSA), and how it can be implemented within the context of survival analysis. Now PSA is used extensively in health economic modelling, where a particular parameter (or parameters) of interest, are altered or varied, to represent different scenarios and levels of variation. We can then assess the impact of such changes on various predictions from our model, such as survival proportions or (restricted) mean survival time. Or, from a micro-simulation perspective, we are also interested in generating individual survival times from our specified/adapted models – this becomes more relevant within a multi-state context, which is a natural extension.

### The mechanics of PSA

To conduct PSA, we need to have an estimated survival model, and have the ability to alter its parameters.

We’ll start with the `brcancer` dataset, which contains recurrence free survival times on 686 patients with breast cancer:

``````. webuse brcancer, clear
(German breast cancer data)

. stset rectime, f(censrec) scale(365.25)

Survival-time data settings

Failure event: censrec!=0 & censrec<.
Observed time interval: (0, rectime]
Exit on or before: failure
Time for analysis: time/365.25

--------------------------------------------------------------------------
686  total observations
0  exclusions
--------------------------------------------------------------------------
686  observations remaining, representing
299  failures in single-record/single-failure data
2,111.978  total analysis time at risk and under observation
At risk from t =         0
Earliest observed entry t =         0
Last observed exit t =  7.279945``````

We use `stmerlin` to fit a proportional hazards Weibull model, adjusting for hormonal therapy.

``````. stmerlin hormon, distribution(weibull) nolog

Fitting full model:

Survival model                                             Number of obs = 686
Log likelihood = -867.82212
------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_t:          |
hormon |  -.3932404   .1248267    -3.15   0.002    -.6378961   -.1485846
_cons |  -2.195132   .1093755   -20.07   0.000    -2.409504    -1.98076
log(gamma) |   .2509973   .0496958     5.05   0.000     .1535953    .3483993
------------------------------------------------------------------------------``````

and we get a log hazard ratio of -0.393. Now I’m going to stick to using the `predictms` function from the `multistate` package in this post, so that you can hopefully see how to extend it to a more general multi-state setting. `predictms` doesn’t necessarily need a complex multistate setting to work – we can use the `singleevent` option to use its prediction engine with a single event, standard survival analysis (this actually gives you a lot of extended predictions that standard survival commands don’t). We first store our model object:

``. estimates store m1``

mainly because I prefer the `models()` syntax of `predictms`, but also because it’s easier to extend to more complex settings. I create a time variable at which to calculate predictions at,

``````. gen tvar = 5 in 1
(685 missing values generated)``````

I’m going to predict the survival probability for a patient in the treated group, at t = 5.

We call `predictms`, predicting the transition probabilities for a patient in the hormonal therapy group.

``````. predictms ,     singleevent     /// standard survival setting
>                 models(m1)      /// fitted model
>                 at1(hormon 1)   /// covariate pattern
>                 probability     /// request probabilities
>                 timevar(tvar)   //  time variable``````

Nice and simple. We get two variables:

``````. list _prob_at1* in 1, abbrev(15)

+-------------------------------+
| _prob_at1_1_1   _prob_at1_1_2 |
|-------------------------------|
1. |     .55174536       .44825464 |
+-------------------------------+``````

which is the probability of still being in the starting state, i.e. alive and recurrence free, and the probability of experiencing a recurrence, at 5 years.

Now let’s say we want to alter my log hazard ratio to assess its impact. Well, I want to set it to be -1, and recalculate my predicted survival at 5 years.

First I’ll make a copy of my estimated coefficient vector, which is stored in `e(b)`.

``. matrix b1 = e(b)``

My log hazard ratio is in the first element, so I’ll replace it with a -1, and check it’s worked:

``````. matrix b1[1,1] = -1

. matrix list b1

b1[1,3]
_cmp_1_1_1:       cons1:      dap1_1:
_cons        _cons        _cons
y1           -1   -2.1951318    .25099732``````

All good. Now the tricky part. There are lots of options for Stata’s maximum likelihood (`ml`) engine, that most of the time feels like only I know about…but not many spend their time delving into Stata’s source code. But everything here is fully documented! Essentially I want a Weibull model object, with coefficients set to the values in `b1`. So, we can pass those as initial values using the `from()` option, but tell it not to proceed with estimation beyond the initial setup step, using `iter(0)`:

``````. stmerlin hormon, dist(weib) from(b1) iter(0)

Fitting full model:
Iteration 0:   log likelihood = -882.09826
convergence not achieved

Survival model                                             Number of obs = 686
Log likelihood = -882.09826
------------------------------------------------------------------------------
| Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_t:          |
hormon |         -1   .1564562    -6.39   0.000    -1.306648   -.6933515
_cons |  -2.195132   .1186107   -18.51   0.000    -2.427604   -1.962659
log(gamma) |   .2509973   .0566004     4.43   0.000     .1400626    .3619321
------------------------------------------------------------------------------``````

which gives us exactly what we want. We of course get the `convergence not achieved` message, but our ‘estimated’ parameter vector is set to the values we specified! Check in `e(b)` as well:

``````. matrix list e(b)

e(b)[1,3]
_cmp_1_1_1:       cons1:      dap1_1:
_cons        _cons        _cons
y1           -1   -2.1951318    .25099732``````

Note, we could have used Ben Jann’s `erepost` command to make this more elegant, but I prefer to show the steps. So now we have our `stmerlin` model object, with our new hormone coefficient, we can simply store our new model object, and run `predictms`:

``````. estimates store m1

. predictms ,     singleevent     /// standard survival setting
>                 models(m1)      /// fitted model
>                 at1(hormon 1)   /// covariate pattern
>                 probability     /// request probabilities
>                 timevar(tvar)   //  time variable``````

`predictms` doesn’t care if your model converged or not, it simply uses the values it finds in `e(b)`. Even if I hadn’t fitted a first model, I could’ve just passed a matrix of my own values to `stmerlin`, the only thing to make sure you have in memory is some form of survival data, as it will check your data is `stset`, and a variable called `hormon`. Those things can be easily created with some dummy data.

We can now generalise this to assess the assumption that our treatment effect comes from $N(-1,0.1^2)$, by making a large number of draws from the normal distribution, and calculating survival at 5 years for each run, as follows:

``````. local Nsim = 200

. capture set obs `Nsim'

. //to store our predictions
. gen s5 = .
(686 missing values generated)

. //loop over Nsim draws
. set seed 13490        // <- don't forget!

. forvalues i=1/`Nsim' {
2.         local draw = rnormal(-1,0.1)
3.         mat b1[1,1] = `draw'
4.         qui stmerlin hormon, dist(weib) from(b1) iter(0)
5.         //double check our hormon coefficient matches the random draw
.         assert [_cmp_1_1_1][_cons]==`draw'
6.         //store new model object
.         estimates store m1
7.         //predict survival at 5 years
.         predictms , singleevent models(m1) at1(hormon 1)        ///
>                         probability timevar(tvar)
8.         //store my new estimate
.         qui replace s5 = _prob_at1_1_1[1] if _n==`i'
9. }``````

which gives us our distribution of estimates of S(5):

``````. summarize s5

Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
s5 |        200    .7194541    .0239843   .6604151   .7868063``````

Hopefully that shows the basis for how to manipulate a fitted model in Stata. Any parameter can be altered, and any prediction can be obtained, I simply tricked `stmerlin` into turning it back into an `estimates` object, which means we can use a post-estimation function, such as `predict` or `predictms`.

### Micro-simulation

Now `predictms` calculates analytic predictions for survival, competing risks and illness-death structures – and uses large-sample simulation to obtain predictions from all others. But, we can use simulation if we so wish! If we’re interested in conducting a micro-simulation and utilising how long each observation spends recurrence free (clearly relevant in a health economic context), well we can add the `simulate` option to invoke the use of simulation, and specify the `save(filename)` to save our simulated survival times. The default number of simulations is `n(100000)`, which of course can be changed.

Let’s compare analytic prediction, as before,

``````. predictms ,     singleevent     /// standard survival setting
>                 models(m1)      /// fitted model
>                 at1(hormon 1)   /// covariate pattern
>                 probability     /// request probabilities
>                 timevar(tvar)   //  time variable

. list _prob* in 1

+-----------------------+
| _prob_a~1   _prob_a~2 |
|-----------------------|
1. | .74514895   .25485105 |
+-----------------------+``````

and now using simulation,

``````. predictms ,     singleevent     /// standard survival setting
>                 models(m1)      /// fitted model
>                 at1(hormon 1)   /// covariate pattern
>                 probability     /// request probabilities
>                 timevar(tvar)   ///  time variable
>                 simulate        /// use simulation
>                 seed(3418736)   //  reproducibility

. list _prob* in 1

+---------------------+
| _prob_~1   _prob_~2 |
|---------------------|
1. |   .74367     .25633 |
+---------------------+``````

Our estimates are pretty close as they should be. We should always increase `n()` until predictions stabilise. Now, if we want access to the individual simulated times, we add the `save()` option,

``````. predictms ,     singleevent             /// standard survival setting
>                 models(m1)              /// fitted model
>                 at1(hormon 1)           /// covariate pattern
>                 probability             /// request probabilities
>                 timevar(tvar)           ///  time variable
>                 simulate                /// use simulation
>                 seed(3418736)           /// reproducibility
>                 save(sims,replace)      //  save simulated times

. list _prob* in 1

+---------------------+
| _prob_~1   _prob_~2 |
|---------------------|
1. |   .74367     .25633 |
+---------------------+``````

Let’s take a look:

``````. use sims, clear

. list _state* _time* in 1/5

+----------------------------------------+
| _state1   _state2   _time1      _time2 |
|----------------------------------------|
1. |       1         1        0           5 |
2. |       1         2        0   .92867371 |
3. |       1         1        0           5 |
4. |       1         1        0           5 |
5. |       1         1        0           5 |
+----------------------------------------+``````

We get the starting and finishing state, and the enter and event/censoring times. To calculate our survival probability of interest in this case, we can simply count how many observations are still in the initial state, and divide by the total.

``````. count if _state2==1
74,367

. di "S(5) = " `r(N)'/_N
S(5) = .74367``````

If we added confidence intervals with the `ci` option and use the `bootstrap` option, we would save each set of simulated survival times generated to calculate our confidence intervals using parametric bootstrap…which uses the same steps as above!

All of the above extends to a more general multi-state setting – simply pass more models to `predictms`, specify your transition matrix, and you are good to go.