Probabilistic sensitivity analysis and survival models
Today we’re going to take a look at probabilistic sensitivity analysis (PSA), and how it can be implemented within the context of survival analysis. PSA is used extensively in health economic modelling, where a particular parameter (or set of parameters) of interest is altered or varied to represent different scenarios and levels of variation. We can then assess the impact of such changes on various predictions from our model, such as survival proportions or (restricted) mean survival time. From a micro-simulation perspective, we are also interested in generating individual survival times from our specified/adapted models – this becomes more relevant within a multi-state context, which is a natural extension.
The mechanics of PSA
To conduct PSA, we need to have an estimated survival model, and have the ability to alter its parameters.
We’ll start with the brcancer dataset, which contains recurrence-free survival times on 686 patients with breast cancer:
. webuse brcancer, clear
(German breast cancer data)
. stset rectime, f(censrec) scale(365.25)
Survival-time data settings
Failure event: censrec!=0 & censrec<.
Observed time interval: (0, rectime]
Exit on or before: failure
Time for analysis: time/365.25
--------------------------------------------------------------------------
686 total observations
0 exclusions
--------------------------------------------------------------------------
686 observations remaining, representing
299 failures in single-record/single-failure data
2,111.978 total analysis time at risk and under observation
At risk from t = 0
Earliest observed entry t = 0
Last observed exit t = 7.279945
We use stmerlin to fit a proportional hazards Weibull model, adjusting for hormonal therapy.
. stmerlin hormon, distribution(weibull) nolog
Fitting full model:
Survival model Number of obs = 686
Log likelihood = -867.82212
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_t: |
hormon | -.3932404 .1248267 -3.15 0.002 -.6378961 -.1485846
_cons | -2.195132 .1093755 -20.07 0.000 -2.409504 -1.98076
log(gamma) | .2509973 .0496958 5.05 0.000 .1535953 .3483993
------------------------------------------------------------------------------
and we get a log hazard ratio of -0.393. Now I’m going to stick to using the predictms function from the multistate package in this post, so that you can hopefully see how to extend it to a more general multi-state setting. predictms doesn’t need a complex multi-state setting to work – we can use the singleevent option to use its prediction engine for a standard, single-event survival analysis (this actually gives you a lot of extended predictions that standard survival commands don’t). We first store our model object:
. estimates store m1
mainly because I prefer the models() syntax of predictms, but also because it’s easier to extend to more complex settings. I create a time variable at which to calculate predictions,
. gen tvar = 5 in 1
(685 missing values generated)
I’m going to predict the survival probability for a patient in the treated group, at t = 5. We call predictms, predicting the transition probabilities for a patient in the hormonal therapy group.
. predictms , singleevent /// standard survival setting
> models(m1) /// fitted model
> at1(hormon 1) /// covariate pattern
> probability /// request probabilities
> timevar(tvar) // time variable
Nice and simple. We get two variables:
. list _prob_at1* in 1, abbrev(15)
+-------------------------------+
| _prob_at1_1_1 _prob_at1_1_2 |
|-------------------------------|
1. | .55174536 .44825464 |
+-------------------------------+
which is the probability of still being in the starting state, i.e. alive and recurrence free, and the probability of experiencing a recurrence, at 5 years.
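As a quick sanity check (not part of the Stata workflow), we can reproduce these two numbers by hand. For the Weibull model above, the cumulative hazard is exp(xb)·t^γ, with γ = exp(log(gamma)), so in Python:

```python
import math

# Coefficients read off the stmerlin output above
b_cons, b_hormon, log_gamma = -2.195132, -0.3932404, 0.2509973

gamma = math.exp(log_gamma)        # Weibull shape parameter
lam = math.exp(b_cons + b_hormon)  # exp(xb) for a patient with hormon = 1
t = 5

# S(t) = exp(-lambda * t^gamma); the recurrence probability is its complement
surv_5 = math.exp(-lam * t ** gamma)
print(surv_5, 1 - surv_5)  # ~0.5517 and ~0.4483, matching _prob_at1_1_1/_2
```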
Now let’s say we want to alter the log hazard ratio to assess its impact. I want to set it to -1, and recalculate the predicted survival at 5 years.
First I’ll make a copy of my estimated coefficient vector, which is stored in e(b).
. matrix b1 = e(b)
My log hazard ratio is in the first element, so I’ll replace it with a -1, and check it’s worked:
. matrix b1[1,1] = -1
. matrix list b1
b1[1,3]
_cmp_1_1_1: cons1: dap1_1:
_cons _cons _cons
y1 -1 -2.1951318 .25099732
All good. Now the tricky part. Stata’s maximum likelihood (ml) engine has lots of options that, most of the time, it feels like only I know about – not many people spend their time delving into Stata’s source code, but everything here is fully documented! Essentially I want a Weibull model object with its coefficients set to the values in b1. So, we can pass those as initial values using the from() option, but tell it not to proceed with estimation beyond the initial setup step, using iter(0):
. stmerlin hormon, dist(weib) from(b1) iter(0)
Fitting full model:
Iteration 0: log likelihood = -882.09826
convergence not achieved
Survival model Number of obs = 686
Log likelihood = -882.09826
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_t: |
hormon | -1 .1564562 -6.39 0.000 -1.306648 -.6933515
_cons | -2.195132 .1186107 -18.51 0.000 -2.427604 -1.962659
log(gamma) | .2509973 .0566004 4.43 0.000 .1400626 .3619321
------------------------------------------------------------------------------
which gives us exactly what we want. We of course get the convergence not achieved message, but our ‘estimated’ parameter vector is set to the values we specified! Check e(b) as well:
. matrix list e(b)
e(b)[1,3]
_cmp_1_1_1: cons1: dap1_1:
_cons _cons _cons
y1 -1 -2.1951318 .25099732
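Plugging the modified coefficient into the Weibull survival function shows what the altered model now implies at 5 years (a hand calculation in Python, assuming the exp(xb)·t^γ cumulative hazard parameterisation):

```python
import math

# Same fitted intercept and shape, with the hormon coefficient forced to -1
b_cons, b_hormon_new, log_gamma = -2.195132, -1.0, 0.2509973

s5_new = math.exp(-math.exp(b_cons + b_hormon_new) * 5 ** math.exp(log_gamma))
print(s5_new)  # ~0.723: the stronger treatment effect lifts S(5) from ~0.552
```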
Note, we could have used Ben Jann’s erepost command to make this more elegant, but I prefer to show the steps. Now that we have our stmerlin model object with our new hormone coefficient, we can simply store the new model object and run predictms:
. estimates store m1
. predictms , singleevent /// standard survival setting
> models(m1) /// fitted model
> at1(hormon 1) /// covariate pattern
> probability /// request probabilities
> timevar(tvar) // time variable
predictms doesn’t care whether your model converged or not – it simply uses the values it finds in e(b). Even if I hadn’t fitted a first model, I could have just passed a matrix of my own values to stmerlin; the only things to make sure you have in memory are some form of survival data (it will check your data is stset) and a variable called hormon. Both can easily be created with some dummy data.
We can now generalise this to assess the assumption that our treatment effect comes from \(N(-1,0.1^{2})\), by making a large number of draws from the normal distribution and calculating survival at 5 years for each draw, as follows:
. local Nsim = 200
. capture set obs `Nsim'
. //to store our predictions
. gen s5 = .
(686 missing values generated)
. //loop over Nsim draws
. set seed 13490 // <- don't forget!
. forvalues i=1/`Nsim' {
2. local draw = rnormal(-1,0.1)
3. mat b1[1,1] = `draw'
4. qui stmerlin hormon, dist(weib) from(b1) iter(0)
5. //double check our hormon coefficient matches the random draw
. assert [_cmp_1_1_1][_cons]==`draw'
6. //store new model object
. estimates store m1
7. //predict survival at 5 years
. predictms , singleevent models(m1) at1(hormon 1) ///
> probability timevar(tvar)
8. //store my new estimate
. qui replace s5 = _prob_at1_1_1[1] if _n==`i'
9. }
which gives us our distribution of estimates of S(5):
. summarize s5
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
s5 | 200 .7194541 .0239843 .6604151 .7868063
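The loop above just repeatedly composes a normal draw with the survival prediction, so the same PSA can be sketched in a few lines of Python (the draw count and seed here are arbitrary, and the closed-form Weibull S(t) stands in for the stmerlin/predictms round trip):

```python
import math
import random

random.seed(13490)

# Fitted intercept and shape from the original model
b_cons, log_gamma, t = -2.195132, 0.2509973, 5
gamma = math.exp(log_gamma)

def surv5(b_hormon):
    """Weibull survival at t = 5 for hormon = 1: S(t) = exp(-exp(xb) * t^gamma)."""
    return math.exp(-math.exp(b_cons + b_hormon) * t ** gamma)

# PSA: draw the log hazard ratio from N(-1, 0.1^2) and recompute S(5) each time
draws = [surv5(random.gauss(-1, 0.1)) for _ in range(10_000)]

mean = sum(draws) / len(draws)
sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / (len(draws) - 1))
print(mean, sd)  # mean ~0.72, sd ~0.02, in line with the summarize output
```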
Hopefully that shows the basis for how to manipulate a fitted model in Stata. Any parameter can be altered, and any prediction can be obtained – I simply tricked stmerlin into turning the modified coefficient vector back into an estimates object, which means we can use a post-estimation command such as predict or predictms.
Micro-simulation
Now predictms calculates analytic predictions for survival, competing risks and illness-death structures – and uses large-sample simulation to obtain predictions for all other structures. But we can use simulation whenever we wish! If we’re interested in conducting a micro-simulation and utilising how long each observation spends recurrence free (clearly relevant in a health economic context), we can add the simulate option to invoke the use of simulation, and specify the save(filename) option to save our simulated survival times. The default number of simulated observations is n(100000), which of course can be changed.
Let’s compare analytic prediction, as before,
. predictms , singleevent /// standard survival setting
> models(m1) /// fitted model
> at1(hormon 1) /// covariate pattern
> probability /// request probabilities
> timevar(tvar) // time variable
. list _prob* in 1
+-----------------------+
| _prob_a~1 _prob_a~2 |
|-----------------------|
1. | .74514895 .25485105 |
+-----------------------+
and now using simulation,
. predictms , singleevent /// standard survival setting
> models(m1) /// fitted model
> at1(hormon 1) /// covariate pattern
> probability /// request probabilities
> timevar(tvar) /// time variable
> simulate /// use simulation
> seed(3418736) // reproducibility
. list _prob* in 1
+---------------------+
| _prob_~1 _prob_~2 |
|---------------------|
1. | .74367 .25633 |
+---------------------+
Our estimates are pretty close, as they should be. We should always increase n() until predictions stabilise. Now, if we want access to the individual simulated times, we add the save() option,
. predictms , singleevent /// standard survival setting
> models(m1) /// fitted model
> at1(hormon 1) /// covariate pattern
> probability /// request probabilities
> timevar(tvar) /// time variable
> simulate /// use simulation
> seed(3418736) /// reproducibility
> save(sims,replace) // save simulated times
. list _prob* in 1
+---------------------+
| _prob_~1 _prob_~2 |
|---------------------|
1. | .74367 .25633 |
+---------------------+
Let’s take a look:
. use sims, clear
. list _state* _time* in 1/5
+----------------------------------------+
| _state1 _state2 _time1 _time2 |
|----------------------------------------|
1. | 1 1 0 5 |
2. | 1 2 0 .92867371 |
3. | 1 1 0 5 |
4. | 1 1 0 5 |
5. | 1 1 0 5 |
+----------------------------------------+
We get the starting and finishing state, and the entry and event/censoring times. To calculate our survival probability of interest in this case, we can simply count how many observations are still in the initial state at 5 years, and divide by the total.
. count if _state2==1
74,367
. di "S(5) = " `r(N)'/_N
S(5) = .74367
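Under the hood, simulated Weibull event times can be generated by inverse-transform sampling: solving S(T) = U for U ~ Uniform(0,1) gives T = (−log U / λ)^{1/γ}. Here is a sketch in Python (using the hypothesised hormon coefficient of −1 rather than the final loop draw, so the proportion differs slightly from the listing above):

```python
import math
import random

random.seed(3418736)

# Weibull parameters with the hormon coefficient set to -1 (hormon = 1 patient)
b_cons, b_hormon, log_gamma = -2.195132, -1.0, 0.2509973
gamma = math.exp(log_gamma)
lam = math.exp(b_cons + b_hormon)

n = 100_000
horizon = 5

# Inverse-transform sampling: S(t) = exp(-lam * t^gamma) inverts to
# T = (-log(U) / lam)^(1/gamma); 1 - random() keeps U strictly positive
times = [(-math.log(1 - random.random()) / lam) ** (1 / gamma) for _ in range(n)]

# Proportion still event free at the 5-year horizon estimates S(5)
p_surv = sum(tt > horizon for tt in times) / n
print(p_surv)  # ~0.72, close to the analytic value for this coefficient
```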
If we added confidence intervals with the ci option and used the bootstrap option, each set of simulated survival times would be generated from a fresh draw of the coefficient vector, giving our confidence intervals via parametric bootstrap…which uses the same steps as above!
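That parametric bootstrap is itself easy to sketch: redraw the whole coefficient vector from its estimated sampling distribution and re-predict each time. For brevity this illustration draws each coefficient independently, using the standard errors from the original fit – in practice you would draw from the multivariate normal using the full e(V) – so treat it as a sketch of the idea, not of predictms’s exact implementation:

```python
import math
import random

random.seed(98123)

# Point estimates and standard errors from the original stmerlin fit
b = {"hormon": -0.3932404, "cons": -2.195132, "log_gamma": 0.2509973}
se = {"hormon": 0.1248267, "cons": 0.1093755, "log_gamma": 0.0496958}
t = 5

def surv5(bh, b0, lg):
    # Weibull S(5) for hormon = 1 under one drawn coefficient vector
    return math.exp(-math.exp(b0 + bh) * t ** math.exp(lg))

# Parametric bootstrap (independence assumed here -- use MVN(e(b), e(V)) in practice)
reps = sorted(
    surv5(random.gauss(b["hormon"], se["hormon"]),
          random.gauss(b["cons"], se["cons"]),
          random.gauss(b["log_gamma"], se["log_gamma"]))
    for _ in range(5000)
)
lo, hi = reps[int(0.025 * len(reps))], reps[int(0.975 * len(reps))]
print(lo, hi)  # percentile 95% interval around the point estimate of ~0.552
```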
All of the above extends to a more general multi-state setting – simply pass more models to predictms, specify your transition matrix, and you are good to go.