Study design
We conducted a systematic review of databases of registered clinical trials, followed by a clinician survey.
Systematic review
The EudraCT, ClinicalTrials.gov, and ANZCTR clinical trial registries were searched in September 2015 for critical care medicine trials in which landmark mortality (mortality at a fixed time point) was the primary outcome. Studies were excluded if they were not two-sided superiority trials, cluster trials, or cluster crossover trials; if they focused on a pediatric population; if they were not related to critical care medicine; or if they were purely investigations of surgical techniques. Trials that had completed recruitment were also excluded. Trials registered in more than one registry were included only once. Trial investigators were emailed to request the data used to inform their sample size calculations, and any trials that the investigators' replies showed to meet an exclusion criterion (e.g., trials no longer recruiting) were excluded.
We recorded the following trial characteristics from online registries: sample size, eligibility criteria for trial participants, intervention details, comparison group details (e.g., placebo or usual-care strategy), trial origin country, and landmark for mortality outcome measurement (e.g., 28 days). We recorded the following from investigator replies: power used in sample size calculations, expected baseline mortality, and expected effect size (absolute mortality difference between control and intervention groups).
Survey
Each trial identified in our systematic review was presented in the participants, intervention, comparison, outcome (PICO) format, and clinicians were asked two questions per trial. First, they were asked to estimate the percentage chance that the true effect of the treatment investigated in a particular trial would equal or exceed the effect postulated by the investigators. Second, they were asked to specify the largest absolute mortality reduction that they considered plausibly attributable to each treatment. For example, for the ADjunctive coRticosteroid trEatment iN criticAlly ilL Patients with Septic Shock (ADRENAL) trial [8], a 3800-participant trial investigating the effect of a continuous 7-day intravenous infusion of hydrocortisone 200 mg/day on day 90 mortality among ventilated adults with septic shock who have received vasopressors or inotropes for at least 4 h, clinicians were asked the following questions:
1. Assuming a baseline day 90 mortality rate of 33% (the baseline mortality rate used by investigators in their power calculations [8]), what do you think the chances are that a continuous 7-day intravenous infusion of 200 mg per day of hydrocortisone reduces absolute mortality by 5% or more? (Answers from 0–100% were allowed.)
2. Assuming a baseline day 90 mortality rate of 33%, what is the largest absolute reduction in day 90 mortality that you believe could occur as a result of a continuous 7-day intravenous infusion of 200 mg per day of hydrocortisone? (Answers from 0% to 33% were allowed.)
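Both questions are framed as absolute mortality differences. As a worked illustration using the hydrocortisone example above, a 5% absolute reduction from a 33% baseline corresponds to a considerably larger relative reduction:

$$ \text{Absolute reduction} = 0.33 - 0.28 = 0.05, \qquad \text{Relative reduction} = \frac{0.05}{0.33} \approx 0.152 $$

so the first question asks about a relative reduction in day 90 mortality of roughly 15%.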
Demographic data collected from survey respondents were region of residence (Australia and New Zealand [ANZ], United Kingdom [UK], Europe [outside UK], United States [USA], Canada, Central or South America, Asia, Africa) and training background (intensive care specialist, other specialist, training to be a specialist in intensive care medicine, training in an area of medicine other than intensive care). Intensive care specialists were asked how long they had been working as an intensive care specialist (<5 years, 5–10 years, >10 years).
The survey was piloted by 20 clinicians from ANZ, USA, Europe, and the UK, who provided feedback on ease of use, interface, and duration. Because this feedback indicated that the original version was too long, the survey was shortened after the pilot phase. Additional file 1 shows the final version of the survey, which was distributed with the weekly Critical Care Reviews newsletter over 4 consecutive weeks [9]. The newsletter had 6243 subscribers at the end of this 4-week period. No demographic data are collected from list subscribers; however, the list is free to join from anywhere in the world, and no restrictions are placed on registration. Between 2788 and 2889 people opened the email containing the survey in each of the 4 weeks in which it ran. We chose to “crowd-source” responses from clinicians with an interest in critical care to obtain “real-world” opinions.
Outcomes
The primary outcome was the clinicians’ perceptions of prior probability for each trial, which was defined as the percentage chance that each trial would demonstrate a mortality effect equal to or greater than that used in the sample size calculation for that trial.
The following were secondary outcomes:
1. The calculated chance that a statistically significant result at the P = 0.05 level for each trial would represent a true-positive
2. The largest effect size that surveyed clinicians considered plausible for each trial
3. The sample size that each trial would require to detect the median largest effect size considered plausible by clinicians
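The primary outcome and the second and third secondary outcomes all rest on summarizing each trial's survey answers. The Python sketch below computes a per-trial median and IQR using only the standard-library `statistics` module. The response lists are hypothetical, illustrative values rather than study data, and `summarize` is a hypothetical helper; the default quartile method of `statistics.quantiles` may not match the one used by the authors' software.

```python
from statistics import median, quantiles

# Hypothetical answers for ONE trial (illustrative only, not study data).
prior_probability_pct = [10, 20, 25, 30, 15, 40, 20, 5]  # question 1, 0-100%
largest_plausible_pct = [3, 5, 4, 8, 2, 5, 6, 3]         # question 2, absolute %


def summarize(values):
    """Return the median and the interquartile range (Q1, Q3) of survey answers."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points; the middle one is the median
    return median(values), (q1, q3)


for label, answers in [
    ("prior probability (%)", prior_probability_pct),
    ("largest plausible reduction (%)", largest_plausible_pct),
]:
    med, (q1, q3) = summarize(answers)
    print(f"{label}: median {med}, IQR {q1} to {q3}")
```

The per-trial median of the question 2 answers is the "median largest effect size" that the third secondary outcome feeds into a sample size calculation.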
Statistical analysis
Continuous variables are reported as median and IQR or mean ± SD, and categorical variables as counts and percentages. Clinician-perceived prior probability for each trial was used to estimate the probability that a statistically significant result at the P = 0.05 level would represent a true positive, using the method described in Fig. 1. Specifically, as outlined in Additional file 2, the chance of a true positive was calculated as follows [10]:
$$ \text{True-positive rate} = \frac{\text{prior probability} \times \beta}{\left(1 - \text{prior probability}\right) \times \alpha + \text{prior probability} \times \beta} $$
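A minimal Python sketch of this calculation follows. It reads the document's β as the trial's statistical power (so that prior × β is the expected share of significant results that reflect a real effect, and (1 − prior) × α the share that are false positives), takes the prior probability as a proportion rather than a percentage, and uses hypothetical input values. The function name `true_positive_rate` is illustrative, not the authors' code.

```python
def true_positive_rate(prior: float, power: float, alpha: float = 0.05) -> float:
    """Probability that a result significant at level `alpha` is a true positive.

    prior: clinician-perceived prior probability (0-1) that the postulated effect is real
    power: the trial's statistical power (read here as the document's beta), e.g. 0.9
    alpha: two-sided significance threshold (0.05 in this study)
    """
    if not (0.0 <= prior <= 1.0 and 0.0 < power <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("prior, power, and alpha must be valid probabilities")
    true_pos = prior * power           # significant AND the effect is real
    false_pos = (1.0 - prior) * alpha  # significant but NO real effect
    return true_pos / (true_pos + false_pos)


# Hypothetical trial: 20% prior probability, 90% power.
print(round(true_positive_rate(0.2, 0.9), 3))  # 0.818
```

Under these hypothetical inputs, about 82% of significant results would be true positives; with a 10% prior and 80% power the figure falls to 64%, which illustrates how strongly a low prior probability erodes the meaning of P < 0.05.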
The sample size that each trial would require to detect the median largest effect size considered plausible by clinicians was calculated using standard methods for comparing two binomial proportions. For these calculations we used the same β that investigators had used in their initial sample size calculations and assumed an α of 0.05. Analysis of variance was used to analyze differences in survey results by location and specialty. A Mann-Whitney U test was used to compare clinicians’ estimates of effect size with the treatment effect sizes used to inform the trials’ sample size calculations. A P value of <0.05 was considered to indicate statistical significance. Statistical analysis was performed using Real Statistics Resource Pack release 3.8 software (London, UK).
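The sample size step can be sketched in Python with a widely used normal-approximation formula for comparing two proportions with a two-sided test (pooled variance under the null hypothesis, unpooled under the alternative, no continuity correction). The text does not state which variant of the "standard methods" was used, so this is one plausible choice rather than the authors' exact calculation. `n_per_group` is a hypothetical helper, `power` stands in for the β the investigators used, and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, p_treatment: float, power: float,
                alpha: float = 0.05) -> int:
    """Participants per arm to detect p_control vs p_treatment (two-sided test).

    Normal approximation: pooled variance under H0, unpooled under H1,
    no continuity correction. `power` is 1 minus the type II error rate.
    """
    if p_control == p_treatment:
        raise ValueError("the two mortality proportions must differ")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided critical value (about 1.96 for 0.05)
    z_power = z(power)          # about 1.28 for 90% power
    p_bar = (p_control + p_treatment) / 2
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_power * math.sqrt(p_control * (1 - p_control)
                                    + p_treatment * (1 - p_treatment)))
    return math.ceil(spread ** 2 / (p_control - p_treatment) ** 2)


# Hypothetical: 33% control mortality, median plausible reduction of 3 points, 90% power.
print(n_per_group(0.33, 0.30, power=0.9))  # a little over 5000 per arm
```

Because the mortality difference enters the denominator squared, halving the detectable difference roughly quadruples the required sample size, which is why small clinician-plausible effects translate into much larger trials than those originally planned.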