Quality of the evaluations of clinical studies in adult critical care and emergency medicine on a specialized Internet site
© BioMed Central Ltd. 2010
Published: 1 March 2010
Given the sheer volume of publications, tools that help intensivists select articles worth reading and appraise the relevant information are welcome. Evaluations of selected articles on dedicated Internet sites could serve that purpose.
As a first attempt to estimate the quality of such sites, we conducted a prospective observational study of http://www.f1000medicine.org (F1000). Consecutive adult clinical studies with a first F1000 evaluation in the critical care and emergency medicine specialty were included. We recorded predefined items describing the internal validity, clinical relevance and external coherence of the studies and corresponding information in their evaluations. The primary endpoint was conformity of the conclusions to raw study results, compared between each study and its first F1000 evaluation.
Fifty studies with 56 evaluations (six articles evaluated twice) were included. Conclusions conformed to the factual study results in 52% of the studies, were excessive in 30% and were out of scope in 18%; among all evaluations, conclusions conformed in 36%, were excessive in 25% and were out of scope in 39%. The conformity level differed significantly between the studies and their first evaluations (P = 0.026, Mantel-Haenszel chi-squared test). Among the included studies, no association was observed between conformity level and journal impact factor (P = 0.84, Kruskal-Wallis test). Among their evaluations, no association was observed between conformity level and publication-to-evaluation time (P = 0.46), evaluation length (P = 0.35) or strictly article-focused relative length (P = 0.027, not significant after Bonferroni correction). Regarding internal validity, of the 28 prospective studies (13 interventional), five had been registered, three of them before study termination with publicly traceable protocol changes; registration was discussed in four of the 33 related evaluations. Alpha risk inflation was uncontrolled in 95% of the 41 studies that were not purely descriptive, but it was never discussed in their 46 evaluations. Of the 20 negative studies, 16 were underpowered (a fact not recognized in 15 of these studies), but power was discussed in only five of the 25 related evaluations. Clinical relevance was discussed in only 13% of F1000 evaluations. Regarding external coherence, 52% of F1000 evaluations did not relate study results to the existing literature.
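The Bonferroni adjustment and the notion of alpha risk inflation mentioned above can be illustrated with a short sketch. The abstract does not state how many comparisons the correction covered; the example below assumes the three evaluation-level tests reported here and a nominal alpha of 0.05. The function and variable names are hypothetical and are not taken from the study.

```python
# Illustrative sketch only: the number of comparisons (3) and the nominal
# alpha (0.05) are assumptions; the abstract does not report them.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each run at level alpha with no correction."""
    return 1.0 - (1.0 - alpha) ** n_tests


# P values reported in the abstract for the evaluation-level associations.
p_values = {
    "publication_to_evaluation_time": 0.46,
    "evaluation_length": 0.35,
    "article_focused_relative_length": 0.027,
}

alpha = 0.05  # assumed nominal level
threshold = bonferroni_threshold(alpha, len(p_values))

# A test counts as significant only if its P value is below the
# corrected threshold.
significant = {name: p < threshold for name, p in p_values.items()}

# Uncorrected familywise error rate for the same number of tests.
inflated_alpha = familywise_error_rate(alpha, len(p_values))

if __name__ == "__main__":
    print(f"Bonferroni threshold: {threshold:.4f}")
    for name, is_significant in significant.items():
        print(f"{name}: P = {p_values[name]}, significant = {is_significant}")
    print(f"Uncorrected familywise error rate: {inflated_alpha:.4f}")
```

Under these assumptions, P = 0.027 falls below 0.05 but above the corrected threshold of about 0.0167, which is consistent with the "not significant" label in the results. The same three tests run without correction would carry a familywise error rate of roughly 14%, which is the kind of alpha risk inflation the study looked for.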
Our results suggest that F1000 evaluations often provide information poorly suited to critical appraisal of clinical studies by intensivists. Basing evaluations on assessment grids could improve their overall quality.