Small studies may overestimate the effect sizes in critical care meta-analyses: a meta-epidemiological study

Introduction: Small-study effects refer to the tendency of trials with limited sample sizes to report larger beneficial effects than large trials. However, this has never been investigated in critical care medicine. The present study therefore aimed to examine the presence and extent of small-study effects in critical care medicine.
Methods: Critical care meta-analyses that involved randomized controlled trials and reported mortality as an outcome measure were considered eligible. Component trials were classified as large (≥100 patients per arm) or small (<100 patients per arm) according to their sample sizes. A ratio of odds ratios (ROR) was calculated for each meta-analysis, and the RORs were then combined using a meta-analytic approach. An ROR <1 indicated a larger beneficial effect in small trials. Small and large trials were compared in terms of methodological quality, including sequence generation, blinding, allocation concealment, intention to treat and sample size calculation.
Results: A total of 27 critical care meta-analyses involving 317 trials were included. Of these, five meta-analyses showed statistically significant RORs <1; the RORs of the remaining meta-analyses did not reach statistical significance. Overall, the pooled ROR was 0.60 (95% CI: 0.53 to 0.68); heterogeneity was moderate, with an I² of 50.3% (chi-squared = 52.30; P = 0.002). Large trials showed significantly better reporting quality than small trials in terms of sequence generation, allocation concealment, blinding, intention to treat, sample size calculation and incomplete follow-up data.
Conclusions: Small trials are more likely to report larger beneficial effects than large trials in critical care medicine, which could be partly explained by the lower methodological quality of small trials. Meta-analyses involving small trials should be interpreted with caution.


Introduction
Small-study effects, first described by Sterne et al. [1], refer to the pattern in which small studies are more likely to report a beneficial effect in the intervention arm. This effect can be explained, at least partly, by the combination of lower methodological quality of small studies and publication bias [2,3]. Typically, small-study effects are evaluated with a funnel plot, which depicts each study's effect size against the precision of that effect size. Small studies, whose effect estimates are less precise, should be distributed widely and symmetrically at the bottom of the plot, and large studies should cluster at the top, giving the plot the shape of an inverted funnel. If a funnel plot appears asymmetrical, publication bias is assumed to be present.
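Visual inspection of a funnel plot can be complemented by a simple regression check. The sketch below is not part of this study's methods: it uses Egger's regression (standardized effect regressed on precision) as an illustrative, swapped-in technique. The function name and the toy inputs are hypothetical.

```python
def egger_intercept(effects, standard_errors):
    """Ordinary least-squares fit of (effect / SE) on (1 / SE).

    Returns (intercept, slope). An intercept far from zero suggests funnel-plot
    asymmetry; the slope estimates the underlying effect.
    This sketch does not compute a standard error or P value for the intercept.
    """
    z = [e / s for e, s in zip(effects, standard_errors)]   # standardized effects
    x = [1 / s for s in standard_errors]                    # precisions
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


# Hypothetical log odds ratios: the less precise (smaller) studies show larger benefit.
ses = [0.1, 0.2, 0.4, 0.5]
log_ors = [-0.3 - 1.0 * s for s in ses]
intercept, slope = egger_intercept(log_ors, ses)
print(intercept, slope)
```

In this toy input the effect grows with the standard error, so the intercept is clearly negative, which is the numerical counterpart of an asymmetrical funnel.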
In critical care medicine, studies are conducted in intensive care units (ICU) where the number of beds is limited. Because of the nature of the population and the care setting, studies in critical care frequently have small sample sizes. Meta-analysis is considered an important tool for combining the effect sizes of small trials, providing more statistical power to detect the beneficial effects of a new intervention. However, according to meta-epidemiological studies conducted in other biomedical fields, meta-analyses of small trials should be interpreted with caution because they may overestimate the true effect of an intervention [3,4]. Small-study effects have been observed in meta-analyses with binary [3] and continuous outcomes [4]. In critical care medicine, small-study effects have never been quantitatively assessed. Thus, we conducted this systematic review of critical care meta-analyses to examine the presence and extent of small-study effects in critical care medicine.

Search strategy and study selection
Medline and Embase databases were searched from inception to August 2012. There was no language restriction. The core search terms consisted of critical care, mortality and meta-analysis (the detailed search strategy is shown in Additional file 1). Inclusion criteria were as follows: critical care meta-analyses involving randomized controlled trials; end points that included mortality; and at least one component trial with at least 100 subjects per arm on average. Exclusion criteria were as follows: systematic reviews without meta-analysis; meta-analyses in which all component trials were exclusively large (sample sizes ≥100 per arm) or exclusively small (sample sizes <100 per arm); and meta-analyses that included duplicated component trials. If several meta-analyses addressed the same clinical issue, we included the most recent one. Two reviewers (XX and ZZ) independently assessed the literature, and disagreement was settled by a third opinion (HN).

Data extraction
The following data were extracted from eligible meta-analyses: the lead author of the study, year of publication, number of trials, treatment strategy in the experimental arm, proportion of large trials in each meta-analysis, effect size and corresponding 95% confidence interval (CI), and heterogeneity as represented by I². For each component trial, we extracted the following data: sequence generation, allocation concealment, blinding, incomplete follow-up data, intention-to-treat analysis, sample size calculation, and year of publication. Sequence generation was considered adequate when the trial reported the method used to generate the randomization sequence (for example, a computer or a randomization table). Allocation concealment was considered adequate when the investigator responsible for patient selection was unable to predict the allocation of the next patient. Commonly used techniques included central randomization and sequentially numbered, opaque, sealed envelopes. Blinding was considered adequate if the experimental and control interventions were described as indistinguishable by patients or investigators [5].
Small and large trials were distinguished by a cutoff of an average of 100 subjects per arm. For example, a two-arm trial with 113 patients in one arm and 87 patients in the other was considered a large trial. This definition was somewhat arbitrary. However, a sample size of 200 patients provides approximately 80% statistical power to detect an absolute difference of 20% for binary outcomes (assuming that the proportion in the control group is 50%) at two-sided α = 0.05. Another reason for this definition was that critical care trials are usually small, and a higher cutoff would have significantly reduced the number of meta-analyses eligible for the analysis.
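The classification rule and the power statement above can be checked with a short sketch. It assumes a normal-approximation z-test for two proportions, which may differ from the method the authors used; the function names are hypothetical.

```python
from statistics import NormalDist


def is_large_trial(arm_sizes, cutoff=100):
    """Classify a trial as large when its mean arm size is at least `cutoff`."""
    return sum(arm_sizes) / len(arm_sizes) >= cutoff


def power_two_proportions(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the pooled variance under the null hypothesis and the unpooled
    variance under the alternative; the negligible opposite tail is ignored.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treatment) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_arm) ** 0.5
    se_alt = (p_control * (1 - p_control) / n_per_arm
              + p_treatment * (1 - p_treatment) / n_per_arm) ** 0.5
    diff = abs(p_control - p_treatment)
    return norm.cdf((diff - z_crit * se_null) / se_alt)


# The example from the text: 113 and 87 patients average exactly 100 per arm.
print(is_large_trial([113, 87]))
# 100 patients per arm, 50% control mortality, 20% absolute difference.
print(round(power_two_proportions(0.50, 0.30, 100), 2))
```

Under these assumptions the computed power is about 0.83, in line with the roughly 80% figure quoted for a 200-patient trial.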

Statistical analysis
Treatment effects were expressed as odds ratios (OR) for mortality. The number of events and the total number of patients in each arm were extracted for each component trial. An OR <1 indicated a beneficial effect in the experimental arm. A standard logistic regression model was used to examine whether estimated treatment effects differed according to whether a trial was large or small [6,7], and the ratio of ORs (ROR) was estimated from this model. An ROR <1 indicates a larger beneficial effect in small trials, and an ROR >1 indicates a larger beneficial effect in large trials. The ROR was calculated separately for each meta-analysis. These RORs were then combined using a meta-analytic approach, with inverse variance weighting and either fixed-effect or random-effects models. Meta-analyses involving exclusively large or exclusively small trials were not included and thus did not contribute to the analysis.
Heterogeneity between trials was estimated using I². A rough guide to the interpretation of I² is as follows: 0 to 40% represents unimportant heterogeneity; 30% to 60% represents moderate heterogeneity; 50% to 90% represents substantial heterogeneity; and 75% to 100% represents considerable heterogeneity [8].
To account for the difference in estimated effects between large and small trials, the qualities of study reporting, including sequence generation, blinding, allocation concealment, incomplete follow-up data, sample size calculation and intention to treat, were compared between large and small trials. The proportions of large and small trials were also compared according to whether the year of publication was before or after 2002. This cutoff was arbitrary; we chose it because we believe that the number of large multicenter trials has increased rapidly in the last 10 years. We analyzed the association between sample size and treatment effects, stratified according to the significance of the effect size and the heterogeneity within each meta-analysis.
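The ROR workflow described above can be sketched as follows. This is not the authors' code: they fitted a logistic regression model, whereas this sketch swaps in a simpler stratified approach that pools log ORs within small and large trials by inverse variance and takes their difference. The DerSimonian-Laird estimator for the random-effects step is also an assumption, and all function names are hypothetical.

```python
import math


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Return (log OR, variance) for one trial; adds 0.5 to every cell if any cell is zero."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_fixed(estimates):
    """Inverse-variance fixed-effect pooling of (estimate, variance) pairs."""
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1 / sum(weights)


def log_ror(trials, cutoff=100):
    """Log ROR (small versus large) and its variance for one meta-analysis.

    `trials` holds (events_t, n_t, events_c, n_c) tuples. The meta-analysis must
    contain at least one small and one large trial, as the eligibility criteria require.
    """
    small, large = [], []
    for events_t, n_t, events_c, n_c in trials:
        stratum = large if (n_t + n_c) / 2 >= cutoff else small
        stratum.append(log_odds_ratio(events_t, n_t, events_c, n_c))
    log_or_small, var_small = pool_fixed(small)
    log_or_large, var_large = pool_fixed(large)
    return log_or_small - log_or_large, var_small + var_large


def i_squared(q, df):
    """I² (percent) from Cochran's Q and its degrees of freedom."""
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


def pool_random(estimates):
    """DerSimonian-Laird random-effects pooling; returns (pooled, variance, I² percent)."""
    weights = [1 / var for _, var in estimates]
    fixed, _ = pool_fixed(estimates)
    q = sum(w * (est - fixed) ** 2 for w, (est, _) in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (var + tau2) for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, 1 / sum(re_weights), i_squared(q, df)


# Hypothetical meta-analysis: one small trial with a benefit, one large neutral trial.
estimate, variance = log_ror([(10, 50, 20, 50), (50, 200, 50, 200)])
print(math.exp(estimate))  # ROR < 1: the small trial shows the larger benefit
```

As a consistency check on the reported figures, `i_squared(52.30, 26)` evaluates to about 50.3, which matches the I² reported in the abstract for 27 meta-analyses (chi-squared = 52.30 on 26 degrees of freedom).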
All statistical analyses were performed using Stata software version 11.0 (StataCorp LP, College Station, TX, USA). Statistical significance was considered at two-sided P <0.05.

Study selection and characteristics
Our initial search identified 371 citations. Of these, 329 were excluded after review of the title and abstract because they were duplicate meta-analyses, included non-randomized trials, did not report data on mortality, or were otherwise irrelevant. The full text of the remaining 42 citations was reviewed, and 15 of them were excluded: eight did not include large trials, four did not use mortality as a study end point, and three were duplicated meta-analyses. A total of 27 meta-analyses involving randomized controlled trials were finally included in our analysis (Figure 1).
To explore possible explanations for the difference in effect sizes between large and small trials, we compared the reporting qualities of large and small trials (Table 2). As expected, the large trials showed significantly better reporting quality than small trials. More large trials than small trials were well conducted in terms of sequence generation, allocation concealment, blinding, intention to treat, sample size calculation and incomplete follow-up data. For instance, 82% of large trials explicitly reported allocation concealment, compared with only 51% of small trials (P <0.001). Intention-to-treat analysis was used in 83% of the large trials but in only 52% of the small trials (P <0.001). Sample size calculation was performed a priori in 88% of large trials, whereas only 44% of small trials reported a sample size calculation (P <0.001). Some 75% of large trials were published after the year 2002, compared with 52% of small trials (P <0.001).

Discussion
This is the first meta-epidemiological study conducted in the field of critical care medicine to demonstrate that small trials may overestimate treatment effect sizes. In this study, we included 27 meta-analyses of 317 randomized controlled trials covering all subspecialties of critical care medicine. The results showed that small trials were more likely to report larger estimated treatment effects than large trials, and this was more prominent in meta-analyses involving highly heterogeneous component trials. Furthermore, the small trials were of lower methodological quality than the large trials, which may partly account for the small-study effects.
Table 2. Comparison of qualities between small and large trials in meta-analyses in critical care medicine.
In a meta-epidemiological study in osteoarthritis [4], the authors used the difference in effect sizes between large and small trials to explore small-study effects. In line with the present study, they concluded that small trials were more likely to report larger beneficial treatment effects than large trials. However, in that study, the small trials did not differ significantly from the large trials in terms of blinding and intention-to-treat analysis; thus, the small-study effects could not be fully explained by methodological quality. In osteoarthritis trials, blinding is probably much more easily achieved because drugs can be made indistinguishable in appearance. Thus, small studies, usually with limited financial support, can also achieve good methodological quality. In contrast, in critical care medicine, blinding is sometimes impossible or complex because of the nature of the intervention. Such interventions include pulmonary artery catheters, intensity of continuous renal replacement therapy, prone ventilation, and subglottic secretion drainage. In these situations, blinding may be difficult to achieve, or only large trials with more planning and methodological support can make it possible. Thus, small trials in critical care medicine are of limited quality in design compared with large trials.
Possible explanations for the small-study effects have been explored. Kjaergard and colleagues [3] demonstrated that small studies with lower quality significantly exaggerated the intervention effect compared with large trials, while small trials with adequate sequence generation, allocation concealment and blinding did not differ from large trials. This is consistent with our finding that small studies had lower methodological quality than large trials. However, the impact of methodological quality measures such as allocation concealment and blinding varies according to the outcome. A meta-epidemiological study involving 146 meta-analyses demonstrated that in trials with subjective outcomes the estimated effect sizes were exaggerated when there was inadequate concealment or blinding, while in trials with objective outcomes such as mortality, there was little evidence that inadequate concealment and lack of blinding would distort the estimated effect sizes [37]. This is in contrast with our findings: although we used mortality as the end point, our results indicated that lack of blinding and inadequate allocation concealment might contribute to the exaggerated effect sizes in small trials. However, this is an unsettled question and there are other studies supporting our finding [38,39]. Most probably, individual quality measures such as blinding and allocation concealment are not consistently associated with effect sizes across study areas, and each medical area should be specifically investigated [40]. Our analysis focused on the field of critical care medicine and showed, for the first time, a possible association of methodological quality with effect sizes.
There are several limitations in the present study. First, our study aimed to investigate the small-study effect in critical care medicine. However, critical care is an extremely heterogeneous subspecialty that involves all varieties of diseases. This heterogeneity may potentially impair the quality of the meta-epidemiological study. Second, the heterogeneity cannot be fully accounted for in the present analysis. We tried to explore the sources of heterogeneity by incorporating factors related to the quality of study design in a meta-regression model, but failed to identify a significant covariate. We propose that the explanatory factors may be those that are not readily accessible. Studies with negative results are less likely to be published than studies with positive results, particularly if they have small sample sizes. If this is the case, it is not surprising that small studies are more likely to report a beneficial effect. However, this kind of publication bias cannot be systematically investigated. Another explanation for the small-study effect may be that interventions are implemented more rigorously in smaller studies.

Conclusions
In conclusion, our study included 27 critical care meta-analyses involving all subspecialties in the field of critical care medicine. The results showed that small studies tended to report larger beneficial effects than large trials. Meta-analyses of small trials should be interpreted with caution, and sometimes definitive conclusions cannot be made until a large multicenter trial is conducted.

Key messages
• Critical care meta-analyses involving small studies should be interpreted with caution because of small-study effects.
• Small-study effects may be attributable to the poor methodological quality of the small studies.

Additional material
Additional file 1: Search strategy. The additional file shows the detailed search strategy performed in our study.

Abbreviations
CI: confidence interval; ICU: intensive care unit; ROR: ratio of odds ratio.

Authors' contributions
ZZ conceived the idea, collected data and drafted the manuscript; XX helped collect data and analyses; HN helped collect data, and analyze and