Bayesian methods: a potential path forward for sepsis trials

Background Given the success of recent platform trials for COVID-19, Bayesian statistical methods have become an option for complex, heterogeneous syndromes like sepsis. However, study design will require careful consideration of how statistical power varies across different choices for how historical data are incorporated through a prior distribution and how the analysis is ultimately conducted. Our objective with the current analysis is to assess how different uses of historical data through a prior distribution, and the type of analysis, influence the results of a proposed trial that will be analyzed using Bayesian statistical methods.
Methods We conducted a simulation study incorporating historical data from a published multicenter, randomized clinical trial in the US and Canada of polymyxin B hemadsorption for treatment of endotoxemic septic shock. Historical data come from a 179-patient subgroup of the previous trial of adult critically ill patients with septic shock, multiple organ failure and an endotoxin activity of 0.60–0.89. The trial intervention consisted of two polymyxin B hemadsorption treatments (2 h each) completed within 24 h of enrollment.
Results In our simulations for a new trial of 150 patients, a range of hypothetical results were observed. Across a range of baseline risks and treatment effects and four ways of including historical data, we demonstrate an increase in power with clinically defensible incorporation of historical data. In one possible trial result, for example, with an observed reduction in risk of mortality from 44 to 37%, the probability of benefit is 96% with a fixed weight of 75% on prior data and 90% with a commensurate (adaptive-weighting) prior; the same data give an 80% probability of benefit if historical data are ignored.
Conclusions Using Bayesian methods and a biologically justifiable use of historical data in a prior distribution yields a study design with higher power than a conventional design that ignores relevant historical data. Bayesian methods may be a viable option for trials in critical care medicine where beneficial treatments have been elusive.
Supplementary Information The online version contains supplementary material available at 10.1186/s13054-023-04717-x.


Quick instructions
• Synonyms for effect modification include subgroup effect, interaction, and moderation

• The instrument applies to a single proposed effect modification at a time; complete one form for each outcome, timepoint, effect measure, and effect modifier

Comment: A laboratory study by Romaschin et al. 1 is cited for justification of the analysis in the subgroup with EAA < 0.9. Romaschin writes "the results presented in this study suggest that the adsorption capacity of PMX-20R is sufficient to remove a clinically significant amount of endotoxin in a majority of endotoxemic septic shock patients; however, this may not be the case in patients with a high EAA burden >0.9." Furthermore, results from a recent registry study found that patients with EAA between 0.6-0.9 and treated with PMX had similar outcomes to those in the EUPHRATES treatable cohort.

Comment: The initial EUPHRATES trial paper did not carry out a test of interaction with the specific subgroup of interest. The Klein paper presents results only on that subgroup, so no test of interaction is available. However, we have reanalyzed the EUPHRATES data (see 1b below) and found strong evidence for interaction, especially for the specific prior being used: MODS > 9, EAA between 0.6-0.9, US sites. The Bayesian analysis does not produce a p-value, but when defining the EUPHRATES subgroup to match the Tigris study entry criteria, there was a greater than 99% probability (99.3% in unadjusted analysis and 99.6% in APACHE-II-adjusted analysis) that the treatment effect was larger in this treatable US-based cohort than in the remaining US patients. These posterior probabilities correspond approximately to one-sided p-values of 0.007 and 0.004, so we have picked the ICEMAN response that includes both, even if they are doubled to get two-sided p-values.

Comment: The Romaschin study 1 defines the upper limit of treatability with standard PMX treatment. In the EUPHRATES RCT, a protocol amendment after the second interim analysis (after advice from the DSMC and the FDA) restricted further enrollment to those with MODS > 9; in total 80% of the 162 deaths occurred in the 43% of participants with MODS > 9 (where mortality was ~45%). In the smaller group with MODS < 9, there were 32 deaths in 154 participants (21% mortality). That MODS cut-point was based on the overall risk-benefit assessment, not a differential effect of PMX.
6 Optional: Are there any additional considerations that may increase or decrease credibility? (manual section 2.6) [ ] Yes, probably decrease [ X ] Yes, probably increase
Comment: A prior study by Marshall et al. 3 found that endotoxin activity below 0.6 units by EAA identified patients at lower risk of death and therefore identifies a population unlikely to benefit from endotoxin removal. Conversely, extracorporeal blood purification devices have an upper limit with respect to removal capacity before reaching saturation. Results from Romaschin et al. 1 indicate that EAA > 0.9 corresponds to the likely upper limit of treatability with a standard course of PMX hemadsorption. Finally, observational studies have found reduced benefit of PMX for patients with lower organ failure scores. 4

7: How would you rate the overall credibility of the proposed effect modification?
The overall rating should be driven by the items that decrease credibility. The following provides a sensible strategy:

b. Interaction of subgroup and PMX in EUPHRATES
The subgroup is defined by MODS > 9 AND 0.6 ≤ EAA ≤ 0.89. The reference group is everyone not in the subgroup.
The counts of observations cross-classified by treatment, subgroup and country are shown below. We check for an interaction of treatment and subgroup in all EUPHRATES patients and, because Tigris is being run only in the USA, in the EUPHRATES patients at sites in the USA.
In this model, b3 is the difference in log odds ratios between the subgroup and the reference group, and exp(b3) is the interaction effect, the ratio of the odds ratio in the subgroup (moderate EAA and MODS > 9) to the odds ratio in the reference group (EAA < 0.
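As a minimal sketch of what exp(b3) measures, the snippet below computes the ratio of odds ratios directly from stratified 2x2 counts. It assumes a saturated logistic model with terms for treatment, subgroup and their product (the model itself is not shown in the text above), in which case b3 equals the difference between the two stratum-specific log odds ratios. The counts and the helper `odds_ratio` are hypothetical and are not the EUPHRATES data.

```python
import math

def odds_ratio(deaths_trt, n_trt, deaths_ctl, n_ctl):
    """Odds ratio for death, treated versus control, from 2x2 counts."""
    odds_trt = deaths_trt / (n_trt - deaths_trt)
    odds_ctl = deaths_ctl / (n_ctl - deaths_ctl)
    return odds_trt / odds_ctl

# Hypothetical counts for illustration only (NOT the EUPHRATES data).
or_subgroup = odds_ratio(20, 60, 28, 60)      # patients inside the subgroup
or_reference = odds_ratio(30, 100, 30, 100)   # everyone else

# b3: difference in log odds ratios; exp(b3): ratio of odds ratios.
b3 = math.log(or_subgroup) - math.log(or_reference)
ratio_of_ors = math.exp(b3)

print(f"OR subgroup = {or_subgroup:.3f}, OR reference = {or_reference:.3f}")
print(f"b3 = {b3:.3f}, exp(b3) = {ratio_of_ors:.3f}")
```

A value of exp(b3) below 1 means the odds ratio for death with treatment is smaller (more favourable) in the subgroup than in the reference group.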

b. Statistical model for Tigris
The binary outcome of 28-day mortality will be compared between groups using logistic regression. [7][8] However, a more conservative unadjusted analysis may still be preferable assuming that randomization is effective.
The result of the analysis that updates the prior distribution of the OR with new data from Tigris is the posterior distribution of the OR; this pools the evidence from both the prior and the trial. When Tigris is complete, we will use the posterior distribution for the OR for PMX to compute the probability that PMX is effective, P(OR < 1). With those results in hand, there will be no reason to use a fixed threshold for what is considered strong evidence versus less strong evidence (for example, interpreting a 94.9% probability of benefit as being not strong evidence and a 95.1% probability of benefit as strong evidence). But when designing a Bayesian study, it can be useful to calculate the chance that a trial will find probabilities of benefit larger than key thresholds (e.g., > 97.5%, > 95%) under various scenarios defined by the risk of mortality and the true effect of PMX in Tigris. These evaluations allow us to decide if the planned sample size is adequate and to identify the prior with the most desirable properties.
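To make P(OR < 1) concrete, here is a rough sketch that replaces the MCMC fit with a normal approximation on the log-odds-ratio scale. The function names are hypothetical, the weak N(0, SD = 1.175) prior is the pre-EUPHRATES prior described elsewhere in this supplement, and the counts (37/100 deaths on PMX, 22/50 on control) are an assumed way of realising the 37% versus 44% example with 2:1 randomization.

```python
import math
from statistics import NormalDist

def log_or_and_se(deaths_trt, n_trt, deaths_ctl, n_ctl):
    """Sample log odds ratio and its large-sample standard error."""
    a, b = deaths_trt, n_trt - deaths_trt
    c, d = deaths_ctl, n_ctl - deaths_ctl
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def normal_posterior(prior_mean, prior_sd, est, se):
    """Conjugate normal update: precision-weighted mean of prior and data."""
    w_prior, w_data = 1 / prior_sd**2, 1 / se**2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    return post_mean, math.sqrt(post_var)

def prob_benefit(post_mean, post_sd):
    """P(log OR < 0), i.e. P(OR < 1), under the normal posterior."""
    return NormalDist(post_mean, post_sd).cdf(0.0)

# Assumed counts: 37/100 deaths with PMX, 22/50 deaths with control.
log_or, se = log_or_and_se(37, 100, 22, 50)
post_mean, post_sd = normal_posterior(0.0, 1.175, log_or, se)
p_benefit = prob_benefit(post_mean, post_sd)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}, P(OR < 1) = {p_benefit:.3f}")
```

Under these assumptions the probability of benefit comes out at roughly 0.79, in the same range as the figure quoted for an analysis that ignores historical data; the approximation is not expected to match an MCMC fit exactly.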
To help decide between a few combinations of analyses and priors in the design for Tigris, we ran a set of simulated trials. Given the benefits of PMX seen in EUPHRATES, for ethical reasons and to encourage enrollment, the randomization ratio will be 2:1 in favour of PMX. We chose a sample size of 150 subjects based on simulations that used an effect size of 10%-15%, a down-weighting of the prior to 75%, and a 95% threshold probability for declaring PMX superior to control; with this sample size, we then investigated other prior weights and analytic approaches. A larger sample size would be required if the prior is down-weighted more or if the effect size is smaller than estimated. Trials were simulated using logistic regression with the baseline APACHE II score as a covariate.
$$\operatorname{logit}(p) = \beta_0 + \beta_1 x + \beta_2\,\mathrm{TMT}$$
where $p$ is the risk of mortality for a patient with a known baseline APACHE II score $x$ and treatment group (TMT=1 for PMX and 0 for control). Simulations varied the control mortality risk by changing $\beta_0$ and the true treatment effect by varying $\beta_2$, the log OR for treatment with PMX. The value of $\beta_1$ (the log OR for a 1-point increase in APACHE II) was fixed at its value in the EUPHRATES treatable cohort.
Combinations of true mortality risk and treatment effect: we simulated 2,000 trials at each of the 25 combinations of
a. the true control group 28-day risk of mortality in Tigris (40% to 60% by 5%)
b. the true marginal absolute risk reduction (ARR) (0% to 20% by 5%)
Options for priors and analysis: every trial was analyzed with each combination of
a. five approaches to incorporating prior information from historical data from the treatable cohort: use of fixed weights of 100%, 75%, and 0% on the prior, use of a normalized power prior, and use of a commensurate prior
b. two analytic models: an unadjusted analysis and an analysis adjusted for APACHE II
c. two threshold probabilities for declaring PMX superior to control: 97.5% and 95%
In each analysis, we tested whether PMX was superior to control at each threshold probability and, for each combination, analysis, and prior weight, we calculated the percentage of the 2,000 trials in which we concluded PMX was superior to control. After viewing these results, the choice of the prior for the primary analysis of Tigris was based partly on judgment about the similarity of the Tigris trial patients and the treatable cohort from EUPHRATES and partly on the probability of concluding benefit. In classical analysis, this probability is called power when ARR > 0% and type I error when ARR = 0%.
Finally, to illustrate how the Bayesian analyses of Tigris will proceed with the chosen prior and threshold probability once we have a single trial result, we present results for hypothetical data exemplifying three groups of scenarios: (1) an observed treatment effect in Tigris of a similar magnitude to that observed in the treatable cohort from EUPHRATES (7%-11% absolute risk reduction); (2) observed treatment effects in Tigris suggesting either no benefit or only a small amount of benefit (absolute risk reductions of 0-5%); and (3) observed treatment effects that are greater in magnitude than that seen in the treatable cohort from EUPHRATES (absolute risk reductions of >15%).

1. Estimate 3 key parameters from the treatable cohort:
a. With 28-day mortality as the outcome, fit a frequentist logistic regression model to estimate these two parameters:
• The log-odds ratio for a one-unit change of APACHE score. This value is used in the simulation of trial data.
• The APACHE-adjusted log-odds ratio comparing the PMX-treated and control groups. The maximum likelihood estimate and its estimated standard error are used to construct prior distributions for APACHE-adjusted analysis of the simulated trials (in item number 2 below).
b. With 28-day mortality as the outcome, fit an unadjusted frequentist logistic regression model in the treatable cohort to estimate the unadjusted log-odds ratio comparing the PMX-treated and control groups. The maximum likelihood estimate and its estimated standard error are used to construct prior distributions for unadjusted analysis of the simulated trials (in item number 2 below).
Models for both unadjusted and adjusted analyses of the treatable cohort were fitted in R using the glm function.

2. Create prior distributions for the log-odds ratios for treatment:
Use the results of the modelling in steps 1a and 1b to create a "base" prior for the APACHE-adjusted and unadjusted log-odds ratios for treatment. The priors were approximated by normal distributions, as the likelihoods from step 1 were close to normal.
i. log(ORadjusted) ~ N(mean = -0.605, SD = 0.337)
ii. log(ORunadjusted) ~ N(mean = -0.435, SD = 0.314)
Results from two classes of priors formed from these base priors are presented in the simulation study. The normalized power prior can be seen as putting a prior on the weight W, but as we found the results from this prior were very similar to results from the commensurate prior, we provide no further details here.
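One common way to read a fixed weight W on a normal base prior is as a power prior: raising a normal likelihood to the power W leaves its mean unchanged and divides its precision by 1/W (SD becomes SD / sqrt(W)). The sketch below assumes that reading, which the text does not spell out, and combines the down-weighted base prior with the weak pre-EUPHRATES N(0, SD = 1.175) prior. The function name `fixed_weight_prior` is hypothetical.

```python
import math

def fixed_weight_prior(base_mean, base_sd, weight, init_mean=0.0, init_sd=1.175):
    """Normal prior from a base prior down-weighted by `weight` (0 to 1).

    Assumption: weight W multiplies the precision of the base prior
    (power-prior reading), and the result is combined with a weak
    initial prior N(init_mean, init_sd).
    """
    init_prec = 1 / init_sd**2
    hist_prec = weight / base_sd**2      # W = 0 drops the historical data
    prec = init_prec + hist_prec
    mean = (init_prec * init_mean + hist_prec * base_mean) / prec
    return mean, math.sqrt(1 / prec)

# Unadjusted base prior from the text: N(mean = -0.435, SD = 0.314).
priors = {w: fixed_weight_prior(-0.435, 0.314, w) for w in (0.0, 0.75, 1.0)}
for w, (mean, sd) in priors.items():
    print(f"W = {w:.2f}: prior mean = {mean:.3f}, prior SD = {sd:.3f}")
```

With W = 0 the result reduces to the weak initial prior; as W increases, the prior mean moves toward the historical estimate and the prior SD shrinks.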
Values of $\beta_0$ and $\beta_2$. The steps below were used to solve for the values of $\beta_0$ and $\beta_2$ that give the required true control group risk p and absolute risk reduction d.
a. In the control group, the mortality risk p = p(Y=1|x,TMT=0) for a control group patient with a given value of x is related to x by the equation below.
I. Unadjusted analysis with uninformative prior for the unadjusted log-odds ratio of treatment
II. Unadjusted analysis with a fixed 75% weight (W=0.75) on the base prior for the unadjusted log-odds ratio of treatment
III. Unadjusted analysis with a commensurate prior for the unadjusted log-odds ratio of treatment
IV. Unadjusted analysis with a fixed 100% weight (W=1.00) on the base prior for the unadjusted log-odds ratio of treatment
V. APACHE-adjusted analysis with uninformative prior for the adjusted log-odds ratio of treatment
VI. APACHE-adjusted analysis with a fixed 75% weight (W=0.75) on the base prior for the adjusted log-odds ratio of treatment
VII. APACHE-adjusted analysis with a commensurate prior for the adjusted log-odds ratio of treatment
VIII. APACHE-adjusted analysis with a fixed 100% weight (W=1.00) on the base prior for the adjusted log-odds ratio of treatment
In each of these models, the parameters $\beta_0$ and $\beta_1$ were given uninformative priors and an additional pre-EUPHRATES N(0, SD=1.175) prior was included for the treatment effect $\beta_2$ that put 95% probability on values in the range 0.1 to 10 for the odds ratio for treatment.
Simulate 2,000 trials (using (4))
Analyze each trial with 8 different models (using (5)) and save the results
For each model, calculate the proportion of the 2,000 trials satisfying the criteria I95 and I97.5

} }
The proportions satisfying criteria I95 and I97.5 are used to estimate the probability of trial success for the corresponding combination of p, d and the analytic method.
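The loop described above can be sketched in a few lines for the unadjusted analysis. This is only an approximation under stated assumptions: a normal approximation on the log-odds-ratio scale stands in for the MCMC fits, APACHE II is ignored when generating outcomes, 0.5 is added to every cell so the log odds ratio is always defined, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def prob_benefit(deaths_pmx, n_pmx, deaths_ctl, n_ctl, prior_mean, prior_sd):
    """Approximate P(log OR < 0 | data) with a normal prior and likelihood."""
    # 0.5 added to each cell (assumption) to avoid division by zero.
    a, b = deaths_pmx + 0.5, n_pmx - deaths_pmx + 0.5
    c, d = deaths_ctl + 0.5, n_ctl - deaths_ctl + 0.5
    est = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    w_prior, w_data = 1 / prior_sd**2, 1 / se**2
    post_mean = (w_prior * prior_mean + w_data * est) / (w_prior + w_data)
    post_sd = math.sqrt(1 / (w_prior + w_data))
    return NormalDist(post_mean, post_sd).cdf(0.0)

def estimated_power(p, d, prior_mean, prior_sd, threshold=0.95,
                    n_trials=2000, n_pmx=100, n_ctl=50, seed=1):
    """Proportion of simulated trials whose probability of benefit exceeds threshold."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        deaths_ctl = sum(rng.random() < p for _ in range(n_ctl))
        deaths_pmx = sum(rng.random() < p - d for _ in range(n_pmx))
        if prob_benefit(deaths_pmx, n_pmx, deaths_ctl, n_ctl,
                        prior_mean, prior_sd) > threshold:
            wins += 1
    return wins / n_trials

# Weak prior only (historical data ignored); control risk 50%.
type_one_error = estimated_power(0.50, 0.00, 0.0, 1.175)   # ARR = 0%
power_arr_15 = estimated_power(0.50, 0.15, 0.0, 1.175)     # ARR = 15%
print(f"type I error = {type_one_error:.3f}, power at ARR 15% = {power_arr_15:.3f}")
```

Passing a more informative `prior_mean` and `prior_sd` (for example, a down-weighted historical prior) shows how both the power and the type I error move as more weight is placed on the historical data.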
All simulations were run in R, using the rstan and brms packages. Four parallel chains were run for each model fit, with 1,000 warm-up iterations and a further 5,000 iterations on which MCMC samples were saved. Convergence to a stationary distribution with this number of iterations was checked on a subset of the simulated datasets, and as expected, given the simplicity of these logistic regression models with sufficient events, models converged quickly.
The subject must have a screening multi-organ dysfunction score (MODS) > 9 OR a sequential organ failure assessment (SOFA) > 11, in the event a complete MODS cannot be obtained due to missing measurements**
Endotoxin Activity Assay ≥ 0.60 and < 0.90 EA units
Evidence of at least 1 of the following criteria for new onset organ dysfunction that is considered to be due to the acute illness:

Supplemental figures
g. Requirement for positive pressure ventilation via an endotracheal tube or tracheostomy tube
h. Thrombocytopenia defined as acute onset of platelet count < 150,000/µL or a reduction of 50% from prior known levels
i. Acute oliguria defined as urine output < 0.5 mL/kg/hr for at least 6 hours despite adequate fluid resuscitation
* When determining the eligible dose of vasopressors for a subject whose measured body weight is > 100 kg, the maximum weight of 100 kg (220 lbs) will be used. This maximum weight applies to both males and females.
** Subjects with MODS ≤ 9 who have a complete MODS will be excluded from the trial even if they have a SOFA > 11.

Study Procedures:
This is a prospective, multicenter, randomized, open-label trial of standard medical care plus the PMX cartridge versus standard medical care alone, in subjects with endotoxemia and septic shock. Subjects in critical care areas will be assessed for septic shock using known or suspected infection, multiple organ failure, fluid resuscitation and hypotension requiring vasopressor support as primary criteria. Subjects will meet all entry criteria for the study if endotoxin activity is within the range of ≥ 0.60 to < 0.90.
Eligible and consented subjects will be randomized to receive either the PMX cartridge (administered twice for 1½ to 2 hours per treatment session approximately 24 hours apart) plus standard medical care or standard medical care alone. For all subjects in whom treatment has been initiated, a follow-up visit (if they are still in the hospital) or a telephone call will be completed at Day 28 (or later) to determine their mortality status. In surviving subjects, a follow-up visit or telephone call to determine their mortality status will also take place at approximately three months (i.e., Day 90) and 12 months after the subject was randomized.

Study Duration:
The duration of treatment and active follow-up for each subject will be from the time of treatment until 12 months post-treatment. Study enrollment is expected to be complete in 2023.

Safety Assessments:
• Incidence of adverse events (AE, SAE, UADE) from the initiation of treatment up to the end of Day 4
• Incidence of treatment-related adverse events (defined as possibly, probably or definitely related to the PMX cartridge, venous access for the purpose of the study or heparin use for the purpose of the study) from initiation of treatment until study completion
• Changes in blood chemistry, hematology and coagulation parameters

SD ./00= (), where  ~ N(mean=0,SD=1)
3. Find the values of $\beta_0$, $\beta_1$, and $\beta_2$ in the logistic regression model below that give a true control group mortality risk of p and a true absolute risk reduction between PMX and control of d:
$$\operatorname{logit}(p(Y = 1 \mid x, \mathrm{TMT})) = \beta_0 + \beta_1 x + \beta_2\,\mathrm{TMT}$$
where Y = binary mortality outcome, x = APACHE II, and TMT = binary treatment indicator.
Distribution of x: The values of x are assumed to come from a normal distribution with the observed mean APACHE (m) and standard deviation (s) from the treatable cohort. That is, x ~ N(x | m, s), where N() here is used to denote the normal density function.

$$p(Y = 1 \mid x, \mathrm{TMT} = 0) = \frac{1}{1 + \exp(-\beta_0 - \beta_1 x)}$$
b. The marginal risk p in a control group with x distributed according to the density N(x | m, s) is given by integrating p(Y=1|x,TMT=0) over the density of x:
$$p = P(Y = 1 \mid \mathrm{TMT} = 0) = \int \frac{1}{1 + \exp(-\beta_0 - \beta_1 x)}\, N(x \mid m, s)\, dx$$
A grid search was used to find the value of $\beta_0$ satisfying the equation above for known p, $\beta_1$, m, and s.
c. The marginal risk of mortality in the PMX group, p - d, is given by integrating p(Y=1|x,TMT=1) over the distribution of x:
$$p - d = P(Y = 1 \mid \mathrm{TMT} = 1) = \int \frac{1}{1 + \exp(-\beta_0 - \beta_1 x - \beta_2)}\, N(x \mid m, s)\, dx$$
With $\beta_0$ from (3b) and $\beta_1$ from (1a), a grid search was used to find the value of $\beta_2$ giving a true PMX group mortality risk of p - d for known p, d, $\beta_1$, m, and s. This gives the fully specified logistic regression model for simulating Y corresponding to a baseline risk p and ARR d:
$$\operatorname{logit}(p(Y = 1 \mid x, \mathrm{TMT})) = \beta_0 + \beta_1 x + \beta_2\,\mathrm{TMT}$$
4. Simulate a single data set: For a given p and d, which determine $\beta_0$ and $\beta_2$, simulate a data set with 150 observations.
a. Generate 150 independent random values APACHE_i from N(m, s), i = 1, ..., 150
b. Assign the first 100 to PMX (TMT_i = 1) and the next 50 to control (TMT_i = 0)
c. For each of the 150 (APACHE_i, TMT_i) pairs, use the equation below to generate a mortality risk
$$p_i = \frac{1}{1 + \exp(-\beta_0 - \beta_1\,\mathrm{APACHE}_i - \beta_2\,\mathrm{TMT}_i)}$$
d. Simulate 150 random uniform variables U_i and set Y_i = 1 if U_i < p_i and 0 otherwise.
e. The dataset comprises the 150 triplets (Y_i, APACHE_i, TMT_i)
5. Analyze the simulated dataset: Analyze the dataset created in (4) with the 8 Bayesian models labelled I to VIII (ignoring the power prior).
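Steps 3 and 4 can be sketched as follows. This is an illustration under assumptions rather than the authors' code: bisection is swapped in for the grid search (the marginal risk is monotone in each coefficient), the integral is approximated with a midpoint rule over m ± 6s, and the values used for b1, m and s are hypothetical placeholders for the treatable-cohort estimates, which are not reported in this section.

```python
import math
import random

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_risk(b0, b1, b2, tmt, m, s, n_grid=400):
    """Average risk over x ~ N(m, s), midpoint rule on [m - 6s, m + 6s]."""
    lo = m - 6 * s
    h = 12 * s / n_grid
    total = 0.0
    for k in range(n_grid):
        x = lo + (k + 0.5) * h
        dens = math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        total += inv_logit(b0 + b1 * x + b2 * tmt) * dens * h
    return total

def solve_increasing(f, target, lo=-30.0, hi=30.0, tol=1e-8):
    """Bisection for f(z) = target, assuming f is increasing in z."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def calibrate(p, d, b1, m, s):
    """Find b0 (control risk p), then b2 (PMX risk p - d)."""
    b0 = solve_increasing(lambda z: marginal_risk(z, b1, 0.0, 0, m, s), p)
    b2 = solve_increasing(lambda z: marginal_risk(b0, b1, z, 1, m, s), p - d)
    return b0, b2

def simulate_trial(b0, b1, b2, m, s, rng, n_pmx=100, n_ctl=50):
    """Return a list of (Y, APACHE, TMT) triplets; first n_pmx rows are PMX."""
    data = []
    for i in range(n_pmx + n_ctl):
        x = rng.gauss(m, s)
        tmt = 1 if i < n_pmx else 0
        risk = inv_logit(b0 + b1 * x + b2 * tmt)
        y = 1 if rng.random() < risk else 0
        data.append((y, x, tmt))
    return data

# Hypothetical inputs: b1, m and s are placeholders, not cohort estimates.
B1, M, S = 0.08, 28.0, 7.0
b0, b2 = calibrate(p=0.50, d=0.10, b1=B1, m=M, s=S)
trial = simulate_trial(b0, B1, b2, M, S, random.Random(2023))
print(f"b0 = {b0:.3f}, b2 = {b2:.3f}, deaths = {sum(y for y, _, _ in trial)}")
```

Because the risk is averaged over the APACHE II distribution, the calibrated b2 is a conditional log odds ratio and is generally somewhat further from zero than the marginal log odds ratio implied by p and p - d.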

6. For each of the 8 models, save the resulting model fits and calculate, from the MCMC samples:
a) Posterior means and 95% credible intervals for the log-odds ratios for treatment
b) The proportion of the posterior samples of the log-odds ratio for treatment that are below zero (a value that estimates the posterior probability of benefit)
c) Binary variables indicating whether the proportion in (b) is larger than 0.95 and 0.975
a. I95 = I[P(logOR < 0 | datasim) > 0.95]
b. I97.5 = I[P(logOR < 0 | datasim) > 0.975]
Simulate multiple trials at combinations of control group risk and marginal risk difference and save the results
For true control group 28-day risk of mortality p in (40% to 60% by 5%) {
For true absolute risk reduction with PMX d in (0% to 20% by 5%) {
Find the values of $\beta_0$ and $\beta_2$ corresponding to p and d (using (3))

Figure S1: Power (probability of demonstrating benefit at the 97.5% probability threshold) versus treatment benefit (expressed as the true absolute risk reduction) with APACHE-adjusted and unadjusted analyses for four different uses of the historical data and control group risk of mortality (BR) of 40% to 60%.

Figure S2: Unadjusted absolute risk reductions in 2,000 simulated trials with a baseline risk of 50%. Blue labels indicate the percentages of simulated trials where we conclude benefit (i.e., the power) for the corresponding absolute risk reduction and use of historical data.

Figure S3: Unadjusted odds ratios in 2,000 simulated trials with a baseline risk of 50%. Blue labels indicate the percentages of simulated trials where we conclude benefit (i.e., the power) for the corresponding absolute risk reduction and use of historical data.

State a single outcome and, if applicable, time-point of interest (e.g., mortality at 1 year follow-up): 28-day mortality
State a single effect measure of interest (e.g., relative or absolute risk difference): Odds Ratio
State a single potential effect modifier of interest (e.g., age or comorbidity): Endotoxin activity (EAA)
Was the potential effect modifier measured before or at randomization? [ X ] yes, continue [ ] no, stop here, refer to manual for further instructions
Credibility assessment
1: Was the direction of the effect modification correctly hypothesized a priori? [ ] Definitely no [ ] Probably no or unclear [ ] Probably yes [ X ] Definitely yes

: If the effect modifier is a continuous variable, were arbitrary cut points avoided?
"Evaluating the Use of Polymyxin B Hemoperfusion in a Randomized Controlled Trial of Adults Treated for Endotoxemia and Septic Shock" was a multicenter, randomized, clinical trial. 5 Study procedures were performed in accordance with the ethical standards of the responsible committee on human experimentation (institutional or regional) and with the Helsinki Declaration of 1975. Prior to randomization, informed consent was obtained from all subjects or their surrogates based on meeting all of the eligibility criteria. The first institutional IRB approval was obtained on 05/18/2010 from Cooper University Hospital IRB (#09-144). The study is registered with clinicaltrials.gov as [NCT01046669].