Probabilistic sensitivity analysis and survival models

Today we’re going to take a little look into probabilistic sensitivity analysis (PSA), and how it can be implemented within the context of survival analysis. PSA is used extensively in health economic modelling, where one or more parameters of interest are altered or varied to represent different scenarios and levels of variation. We can then assess the impact of such changes on various predictions from our model, such as survival proportions or (restricted) mean survival time. From a micro-simulation perspective, we may also be interested in generating individual survival times from our specified/adapted models – this becomes more relevant in a multi-state context, which is a natural extension.

The mechanics of PSA

To conduct PSA, we need to have an estimated survival model, and have the ability to alter its parameters.

We’ll start with the brcancer dataset, which contains recurrence free survival times on 686 patients with breast cancer:

. webuse brcancer, clear
(German breast cancer data)

. stset rectime, f(censrec) scale(365.25)

Survival-time data settings

         Failure event: censrec!=0 & censrec<.
Observed time interval: (0, rectime]
     Exit on or before: failure
     Time for analysis: time/365.25

--------------------------------------------------------------------------
        686  total observations
          0  exclusions
--------------------------------------------------------------------------
        686  observations remaining, representing
        299  failures in single-record/single-failure data
  2,111.978  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =  7.279945

We use stmerlin to fit a proportional hazards Weibull model, adjusting for hormonal therapy.

. stmerlin hormon, distribution(weibull) nolog

Fitting full model:

Survival model                                             Number of obs = 686
Log likelihood = -867.82212
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_t:          |            
      hormon |  -.3932404   .1248267    -3.15   0.002    -.6378961   -.1485846
       _cons |  -2.195132   .1093755   -20.07   0.000    -2.409504    -1.98076
  log(gamma) |   .2509973   .0496958     5.05   0.000     .1535953    .3483993
------------------------------------------------------------------------------

and we get a log hazard ratio of -0.393. Now I’m going to stick to using the predictms command from the multistate package in this post, so that you can hopefully see how to extend things to a more general multi-state setting. predictms doesn’t need a complex multi-state model to work – we can use the singleevent option to use its prediction engine in a standard, single-event survival analysis (this actually gives you a lot of extended predictions that standard survival commands don’t). We first store our model object:

. estimates store m1

mainly because I prefer the models() syntax of predictms, but also because it’s easier to extend to more complex settings. I then create a time variable at which to calculate predictions,

. gen tvar = 5 in 1
(685 missing values generated)

I’m going to predict the survival probability for a patient in the treated group, at t = 5.

We call predictms, predicting the transition probabilities for a patient in the hormonal therapy group.

. predictms ,     singleevent     /// standard survival setting
>                 models(m1)      /// fitted model
>                 at1(hormon 1)   /// covariate pattern
>                 probability     /// request probabilities
>                 timevar(tvar)   //  time variable

Nice and simple. We get two variables:

. list _prob_at1* in 1, abbrev(15)

     +-------------------------------+
     | _prob_at1_1_1   _prob_at1_1_2 |
     |-------------------------------|
  1. |     .55174536       .44825464 |
     +-------------------------------+

which is the probability of still being in the starting state, i.e. alive and recurrence free, and the probability of experiencing a recurrence, at 5 years.
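As a sanity check on where these numbers come from, we can reproduce the analytic prediction outside Stata. Under the proportional hazards Weibull parameterisation used here, S(t | x) = exp{-exp(xb) t^γ}, where xb = _cons + β·hormon and γ = exp(log(gamma)). A minimal Python sketch, plugging in the coefficients from the output above:

```python
import math

# coefficients copied from the stmerlin output above
b_hormon  = -0.3932404   # log hazard ratio for hormonal therapy
b_cons    = -2.1951318   # _cons (log baseline scale)
log_gamma =  0.2509973   # log Weibull shape

def weibull_surv(t, hormon):
    """S(t|x) = exp(-exp(xb) * t^gamma), the PH Weibull survival function."""
    xb = b_cons + b_hormon * hormon
    return math.exp(-math.exp(xb) * t ** math.exp(log_gamma))

s5 = weibull_surv(5, hormon=1)
print(s5, 1 - s5)  # close to .55174536 and .44825464
```

which matches the two probabilities above.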

Now let’s say we want to alter the log hazard ratio to assess its impact. I’ll set it to -1 and recalculate the predicted survival at 5 years.

First I’ll make a copy of my estimated coefficient vector, which is stored in e(b).

. matrix b1 = e(b)

My log hazard ratio is in the first element, so I’ll replace it with -1, and check it’s worked:

. matrix b1[1,1] = -1

. matrix list b1

b1[1,3]
     _cmp_1_1_1:       cons1:      dap1_1:
          _cons        _cons        _cons
y1           -1   -2.1951318    .25099732

All good. Now the tricky part. There are lots of options for Stata’s maximum likelihood (ml) engine that, most of the time, it feels like only I know about – not many people spend their time delving into Stata’s source code. But everything here is fully documented! Essentially, I want a Weibull model object with its coefficients set to the values in b1. So, we can pass those as initial values using the from() option, but tell stmerlin not to proceed with estimation beyond the initial setup step, using iter(0):

. stmerlin hormon, dist(weib) from(b1) iter(0)

Fitting full model:
Iteration 0:   log likelihood = -882.09826  
convergence not achieved

Survival model                                             Number of obs = 686
Log likelihood = -882.09826
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_t:          |            
      hormon |         -1   .1564562    -6.39   0.000    -1.306648   -.6933515
       _cons |  -2.195132   .1186107   -18.51   0.000    -2.427604   -1.962659
  log(gamma) |   .2509973   .0566004     4.43   0.000     .1400626    .3619321
------------------------------------------------------------------------------

which gives us exactly what we want. We of course get the convergence not achieved message, but our ‘estimated’ parameter vector is set to the values we specified! Check in e(b) as well:

. matrix list e(b)

e(b)[1,3]
     _cmp_1_1_1:       cons1:      dap1_1:
          _cons        _cons        _cons
y1           -1   -2.1951318    .25099732

Note, we could have used Ben Jann’s erepost command to make this more elegant, but I prefer to show the steps. Now that we have our stmerlin model object with its new hormon coefficient, we can simply store the new model object and run predictms:

. estimates store m1

. predictms ,     singleevent     /// standard survival setting
>                 models(m1)      /// fitted model
>                 at1(hormon 1)   /// covariate pattern
>                 probability     /// request probabilities
>                 timevar(tvar)   //  time variable

predictms doesn’t care if your model converged or not – it simply uses the values it finds in e(b). Even if I hadn’t fitted a first model, I could have just passed a matrix of my own values to stmerlin. The only things you need in memory are some form of survival data (as stmerlin will check your data are stset) and a variable called hormon – both easily created with some dummy data.

We can now generalise this to assess the assumption that our treatment effect comes from \(N(-1,0.1^{2})\), by making a large number of draws from this normal distribution and calculating survival at 5 years for each one, as follows:

. local Nsim = 200

. capture set obs `Nsim'

. //to store our predictions
. gen s5 = .
(686 missing values generated)

. //loop over Nsim draws
. set seed 13490        // <- don't forget!

. forvalues i=1/`Nsim' {
  2.         local draw = rnormal(-1,0.1)
  3.         mat b1[1,1] = `draw'
  4.         qui stmerlin hormon, dist(weib) from(b1) iter(0)
  5.         //double check our hormon coefficient matches the random draw
.         assert [_cmp_1_1_1][_cons]==`draw'     
  6.         //store new model object
.         estimates store m1
  7.         //predict survival at 5 years
.         predictms , singleevent models(m1) at1(hormon 1)        ///
>                         probability timevar(tvar)
  8.         //store my new estimate
.         qui replace s5 = _prob_at1_1_1[1] if _n==`i'
  9. }

which gives us our distribution of estimates of S(5):

. summarize s5

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          s5 |        200    .7194541    .0239843   .6604151   .7868063
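For intuition, here is the same PSA loop sketched in Python, reusing the closed-form Weibull survival function implied by the fitted model. The draws won’t match Stata’s random number generator (the seed below is arbitrary), so only the distribution of estimates should agree:

```python
import math
import random

# _cons and log(gamma) held fixed at their estimated values
b_cons, log_gamma = -2.1951318, 0.2509973
gamma = math.exp(log_gamma)

def s5(b_hormon):
    # S(5 | hormon=1) under the PH Weibull model
    return math.exp(-math.exp(b_cons + b_hormon) * 5 ** gamma)

random.seed(13490)  # arbitrary; won't reproduce Stata's draws
draws = [s5(random.gauss(-1, 0.1)) for _ in range(200)]

mean = sum(draws) / len(draws)
sd = (sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)) ** 0.5
print(mean, sd)  # should be in the same ballpark as Stata's .719 and .024
```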

Hopefully that shows the basics of how to manipulate a fitted model in Stata. Any parameter can be altered, and any prediction can be obtained – I simply tricked stmerlin into turning my modified coefficient vector back into an estimates object, which means we can use a post-estimation command such as predict or predictms.

Micro-simulation

Now predictms calculates analytic predictions for survival, competing risks and illness-death structures – and uses large-sample simulation to obtain predictions for all other structures. But we can use simulation if we so wish! If we’re interested in conducting a micro-simulation, utilising how long each observation spends recurrence free (clearly relevant in a health economic context), we can add the simulate option to invoke the use of simulation, and specify save(filename) to save our simulated survival times. The default number of simulated observations is n(100000), which of course can be changed.

Let’s compare analytic prediction, as before,

. predictms ,     singleevent     /// standard survival setting
>                 models(m1)      /// fitted model
>                 at1(hormon 1)   /// covariate pattern
>                 probability     /// request probabilities
>                 timevar(tvar)   //  time variable

. list _prob* in 1

     +-----------------------+
     | _prob_a~1   _prob_a~2 |
     |-----------------------|
  1. | .74514895   .25485105 |
     +-----------------------+

and now using simulation,

. predictms ,     singleevent     /// standard survival setting
>                 models(m1)      /// fitted model
>                 at1(hormon 1)   /// covariate pattern
>                 probability     /// request probabilities
>                 timevar(tvar)   ///  time variable
>                 simulate        /// use simulation
>                 seed(3418736)   //  reproducibility

. list _prob* in 1

     +---------------------+
     | _prob_~1   _prob_~2 |
     |---------------------|
  1. |   .74367     .25633 |
     +---------------------+

Our estimates are pretty close, as they should be. We should always increase n() until predictions stabilise. Now, if we want access to the individual simulated times, we add the save() option,

. predictms ,     singleevent             /// standard survival setting
>                 models(m1)              /// fitted model
>                 at1(hormon 1)           /// covariate pattern
>                 probability             /// request probabilities
>                 timevar(tvar)           ///  time variable
>                 simulate                /// use simulation
>                 seed(3418736)           /// reproducibility
>                 save(sims,replace)      //  save simulated times

. list _prob* in 1

     +---------------------+
     | _prob_~1   _prob_~2 |
     |---------------------|
  1. |   .74367     .25633 |
     +---------------------+

Let’s take a look:

. use sims, clear

. list _state* _time* in 1/5

     +----------------------------------------+
     | _state1   _state2   _time1      _time2 |
     |----------------------------------------|
  1. |       1         1        0           5 |
  2. |       1         2        0   .92867371 |
  3. |       1         1        0           5 |
  4. |       1         1        0           5 |
  5. |       1         1        0           5 |
     +----------------------------------------+

We get the starting and finishing state, and the entry and event/censoring times, for each simulated observation. To calculate our survival probability of interest in this case, we can simply count how many observations are still in the initial state at 5 years, and divide by the total.

. count if _state2==1
  74,367

. di "S(5) = " `r(N)'/_N
S(5) = .74367
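Under the hood, simulating event times from a fitted Weibull is just inverse-transform sampling: setting S(T) = U with U ~ Uniform(0,1) gives T = (-ln U / λ)^{1/γ}. A hypothetical Python sketch, using the originally estimated coefficients (for which the analytic S(5) was .5517):

```python
import math
import random

# originally estimated coefficients, with hormon = 1
b_hormon, b_cons, log_gamma = -0.3932404, -2.1951318, 0.2509973
gamma = math.exp(log_gamma)
lam = math.exp(b_cons + b_hormon)

random.seed(3418736)  # arbitrary; won't reproduce Stata's draws
n = 100_000

# inverse-transform draws: T = (-ln U / lambda)^(1/gamma)
times = [(-math.log(1 - random.random()) / lam) ** (1 / gamma)
         for _ in range(n)]

# the proportion still event-free at 5 years estimates S(5)
s5_hat = sum(t > 5 for t in times) / n
print(s5_hat)  # close to the analytic .5517, up to Monte Carlo error
```

which is essentially what the simulate option automates, up to Monte Carlo error.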

If we add confidence intervals with the ci option, together with the bootstrap option, then each set of simulated survival times is saved and used to calculate our confidence intervals by parametric bootstrap…which uses the same steps as above!

All of the above extends to a more general multi-state setting – simply pass more models to predictms, specify your transition matrix, and you are good to go.
