Interpretation of gene associations with risk of acute respiratory distress syndrome: P values, Bayes factors, positive predictive values, and need for replication
Critical Care volume 20, Article number: 402 (2016)
Single nucleotide polymorphisms (SNPs) in certain genes play a role in the observed variability in development and severity of acute respiratory distress syndrome (ARDS). Identified SNPs can direct future studies aiming to target diagnostic, preventive, and therapeutic interventions for the complex pathophysiology of ARDS. For example, the cystic fibrosis transmembrane conductance regulator (CFTR) is involved in fluid absorption from alveoli and in negatively modulating the inflammatory response [2, 3], and Perez-Marques et al. report that SNPs in genes encoding proteins that regulate splicing of the exon 9 region of CFTR mRNA were independently associated with risk for ARDS. The same group has identified other statistically significant candidate gene associations with the risk for pediatric ARDS (Table 1) [2–7]. In adults, other candidate gene associations with risk for development of and outcome from ARDS have been suggested [1, 8]. How should these gene-association hypotheses be interpreted?
Theoretical considerations: when is a gene-association hypothesis supported?
The P value is the probability of obtaining a result equal to or more extreme than what was actually observed, assuming that the null hypothesis (i.e., no difference between groups) is in fact true and that all model assumptions are satisfied (i.e., no selection, attrition, analysis, or reporting bias; only chance is operating) [9, 10]. The “P value fallacy” is “the illusion that conclusions can be produced with certain ‘error rates’ without consideration of information from outside the experiment”. The fallacy is to think that the P value is the probability of a hypothesis, which would require inductive reasoning back from evidence (observations) to underlying truth [9–11]. This leads to misinterpretations of the P value (Table 2). Making the inductive inference about hypothesis probability requires Bayesian methods.
Bayesian methods are conceptually simple: (Prior odds of null hypothesis) × (Bayes factor) = (Posterior odds of null hypothesis). The prior odds are based on evidence external to the study concerning the plausibility of the null hypothesis; in a field of study, this is the ratio of the number of “true relationships” to “no relationships” among those tested in the field. The Bayes factor (BF) measures the relative support, from the observed evidence, for two hypotheses: (Probability of the data given the null hypothesis)/(Probability of the data given the alternative hypothesis). The BF modifies the prior probability to give the posterior probability of the null hypothesis (or, if one reverses the numerator and denominator, the post-study probability that there is a true association: positive predictive value (PPV)). One can calculate, from the same numbers used to calculate the P value, the minimum BF: the strongest evidence against the null hypothesis, using the best supported hypothesis (the observed association) as the alternative hypothesis. One can also calculate the PPV of a statistically significant finding using the prior probability of an association, the BF based on power and alpha [BF = αβ/(1 − α)(1 − β)], and an estimate of bias (which affects the accuracy of the alpha and also reflects our estimate of the prior odds) [12–14]. The PPV is lowered by low study power (smaller studies with small expected effect sizes), low pre-study odds (hypothesis-generating experiments), bias (flexibility in designs, definitions, outcomes, and analytic modes), and a larger number of teams working in the field (hotter scientific fields). There are some surprising results of Bayesian methods (Table 2).
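As a rough illustration of the calculation described above, the following Python sketch converts a two-sided P value into a minimum Bayes factor using the Gaussian form exp(−z²/2) described by Goodman, and then combines it with a prior probability to give the posterior probability of an association. The function names and the example prior of 0.01 are assumptions chosen for illustration; they are not taken from any of the cited studies.

```python
import math
from statistics import NormalDist


def min_bayes_factor(p_value: float) -> float:
    """Minimum Bayes factor (null vs. best-supported alternative).

    Uses the Gaussian approximation exp(-z^2 / 2), where z is the
    standard normal deviate matching a two-sided P value.
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return math.exp(-z * z / 2.0)


def posterior_prob_association(prior_prob: float, bf_null: float) -> float:
    """Posterior probability that the association is real.

    prior_prob: prior probability of a true association (assumed input).
    bf_null: Bayes factor for the null relative to the alternative;
             values below 1 count as evidence against the null.
    """
    prior_odds_null = (1.0 - prior_prob) / prior_prob
    posterior_odds_null = prior_odds_null * bf_null
    return 1.0 / (1.0 + posterior_odds_null)


if __name__ == "__main__":
    prior = 0.01  # hypothetical prior probability of an association
    for p in (0.05, 0.01, 0.001):
        bf = min_bayes_factor(p)
        post = posterior_prob_association(prior, bf)
        print(f"P={p}: minimum BF={bf:.4f}, posterior probability={post:.3f}")
```

Because the minimum BF uses the alternative hypothesis best supported by the data, the posterior probability it yields is an upper bound for the given prior; any other alternative would give weaker support for an association.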
Empirical considerations: when is a gene-association hypothesis supported?
There is evidence to support the predictions from Bayesian methods in interpreting study results (Table 2). This is particularly so in genetic-association studies, where the expected true odds ratios (when there is a true association) for common SNPs with common complex diseases (such as ARDS) are repeatedly found to be 1.1–1.4; this means that studies have low power unless there are >1000 subjects [12, 15]. This empirical evidence (Table 2) suggests that Bayesian methods, which keep statistical evidence (conveyed traditionally by the P value and more usefully by the BF) distinct from inductive inferences about hypotheses, are useful because they incorporate data external to the study (estimation of priors) in order to arrive at a conclusion about a hypothesis (posterior probability of the probed association being true) [9–12].
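To give a sense of why odds ratios of 1.1–1.4 imply low power in small studies, the sketch below approximates the power of a two-sided Wald test on the log odds ratio in a case-control design. It is a simplified illustration, not the method used in the cited work: it assumes a binary exposure (carrier versus non-carrier), a hypothetical carrier frequency of 0.3 among controls, equal numbers of cases and controls, and a normal approximation.

```python
import math
from statistics import NormalDist


def power_for_odds_ratio(odds_ratio, n_cases, n_controls,
                         exposure_freq_controls, alpha=0.05):
    """Approximate two-sided power of a Wald test on the log odds ratio."""
    p0 = exposure_freq_controls
    # Carrier frequency among cases implied by the assumed odds ratio.
    odds_cases = odds_ratio * p0 / (1.0 - p0)
    p1 = odds_cases / (1.0 + odds_cases)
    # Standard error of log(OR) from the expected 2x2 cell counts.
    se = math.sqrt(
        1.0 / (n_cases * p1) + 1.0 / (n_cases * (1.0 - p1))
        + 1.0 / (n_controls * p0) + 1.0 / (n_controls * (1.0 - p0))
    )
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(math.log(odds_ratio)) / se
    # Probability that the test statistic falls in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (250, 500, 1000, 5000):  # subjects per group (illustrative)
        pw = power_for_odds_ratio(1.2, n, n, exposure_freq_controls=0.3)
        print(f"{n} cases + {n} controls, OR=1.2: power ~ {pw:.2f}")
```

Under these assumptions, power for an odds ratio of 1.2 stays well below 50% with a few hundred subjects per group and becomes high only with several thousand.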
Interpreting ARDS gene-association studies
Using the growing cohort of patients, six ARDS gene-association studies have been published by this group (Table 1) [2–7]. These reports were well done according to reporting guidelines. We ask three questions to improve interpretation of these (and, in general, future) gene-association studies in critical care.
Priors: how likely is an association to be expected given information external to the study? Considerations are listed in Table 3. In gene-association studies for complex diseases, the prior is usually in the range of 0.001 (SNPs with only limited prior evidence) to 0.1 (SNPs that already show fairly compelling evidence for association), and in non-replication studies is likely closer to 0.001.
Minimum BF and PPV: what does the evidence from the study show us? Considerations are listed in Table 3. Assuming a P value threshold of 0.01 to 0.001 and a power of 0.5, the PPV is 38–82% for a prior of 0.01 and 8–40% for a prior of 0.001.
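The dependence of the PPV on the prior, the significance threshold, and power can be explored with a short calculation. The sketch below uses the textbook relation PPV = (power × prior) / (power × prior + α × (1 − prior)), which treats a "significant" result as a diagnostic test with no bias. This simplification is an assumption of the example; the ranges quoted above come from the cited work and may rest on different assumptions, so the numbers printed here need not match them exactly.

```python
def ppv(prior_prob: float, alpha: float, power: float) -> float:
    """PPV of a statistically significant finding, ignoring bias.

    prior_prob: prior probability that the association is real.
    alpha: significance threshold (false-positive rate under the null).
    power: probability of a significant result when the association is real.
    """
    true_positives = power * prior_prob
    false_positives = alpha * (1.0 - prior_prob)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    for prior in (0.01, 0.001):
        for alpha in (0.01, 0.001):
            value = ppv(prior, alpha, power=0.5)
            print(f"prior={prior}, alpha={alpha}, power=0.5: PPV={value:.1%}")
```

The qualitative pattern is the point: at a fixed power, a tenfold lower prior or a tenfold looser threshold each cuts the PPV substantially.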
Bias: do we need to modify these estimates for study bias? Considerations are listed in Table 3. If bias is low (0.05–0.2 = “the proportion of probed analyses that would not have been ‘research findings’, but nevertheless end up presented and reported as such, because of bias”), power is 50%, and prior is very high (0.1), the PPV that a statistically significant finding is true is 35–55%.
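The effect of bias can be sketched with the formula given by Ioannidis (2005), PPV = ([1 − β]R + uβR) / (R + α − βR + u − uα + uβR), where R is the pre-study odds of a true relationship, β is the type II error rate, and u is the bias proportion quoted above. In the example below, the α of 0.05 and the conversion of a prior probability of 0.1 into odds are assumptions made for illustration, so the output is not expected to reproduce the quoted range exactly.

```python
def ppv_with_bias(prior_odds: float, alpha: float, power: float,
                  bias: float) -> float:
    """PPV of a significant finding in the presence of bias.

    prior_odds: pre-study odds R of a true relationship.
    alpha: type I error rate.
    power: 1 - beta.
    bias: proportion u of analyses reported as findings only because of bias.
    """
    beta = 1.0 - power
    r, u = prior_odds, bias
    numerator = (1.0 - beta) * r + u * beta * r
    denominator = r + alpha - beta * r + u - u * alpha + u * beta * r
    return numerator / denominator


if __name__ == "__main__":
    prior_prob = 0.1                      # "very high" prior from the text
    r = prior_prob / (1.0 - prior_prob)   # converted to odds (assumption)
    for u in (0.0, 0.05, 0.2):
        value = ppv_with_bias(r, alpha=0.05, power=0.5, bias=u)
        print(f"bias u={u}: PPV={value:.1%}")
```

With u = 0 the expression reduces to the unbiased PPV, and increasing u lowers the PPV even when the prior is favourable.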
The observed evidence (the P value, or better yet, the BF) can be combined with prior considerations of plausibility to determine how well two hypotheses are supported (posterior probability, PPV). The posterior probability (PPV) that there is an association between exploratory SNPs and severity of ARDS in children is low given the low prior probability, the modest BF (reflected in modest P values and power), and potential for bias. This is not necessarily a problem if our interest is in generating hypotheses for further scientific study. An interesting hypothesis has been suggested (i.e., a gene association) and warrants further investigation; we should wait for replication in additional larger studies before accepting this hypothesis. These future studies will have a prior probability that is closer to 0.1 (the posterior probability after the current study), and thus replication would move us much further toward accepting the hypothesis [13–15]. Overall, caution is warranted: most genetic associations for ARDS in adults have not replicated.
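The value of replication can be illustrated by chaining two updates, with the posterior probability after the first study serving as the prior for the second. The sketch below assumes, purely for illustration, that both studies are unbiased, have a power of 0.5, and reach significance at α = 0.01, and that the initial prior is 0.001; none of these values is taken from the ARDS studies themselves.

```python
def update(prior_prob: float, alpha: float, power: float) -> float:
    """Posterior probability of a true association after one significant,
    unbiased study, using posterior odds = prior odds * power / alpha."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * power / alpha
    return posterior_odds / (1.0 + posterior_odds)


# Exploratory study starting from a low prior (hypothetical values).
after_first = update(0.001, alpha=0.01, power=0.5)
# Replication study: the previous posterior becomes the new prior.
after_replication = update(after_first, alpha=0.01, power=0.5)

if __name__ == "__main__":
    print(f"after exploratory study: {after_first:.3f}")
    print(f"after replication:       {after_replication:.3f}")
```

Under these assumptions a single significant exploratory result leaves the hypothesis improbable, whereas a significant replication makes it more likely true than not.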
Abbreviations
ARDS: Acute respiratory distress syndrome
Elav-like family member 2
CFTR: Cystic fibrosis transmembrane conductance regulator
PPV: Positive predictive value
SNP: Single nucleotide polymorphism
T-cell intracellular antigen-1
Sapru A, Flori H, Quasney MW, Dahmer MK, for the Pediatric Acute Lung Injury Consensus Conference Group. Pathobiology of acute respiratory distress syndrome. Pediatr Crit Care Med. 2015;16:S6–S22.
Perez-Marques F, Simpson P, Yan K, Quasney MW, Halligan N, Merchant D, Dahmer MK. Association of polymorphisms in genes of factors involved in regulation of splicing of cystic fibrosis transmembrane conductance regulator mRNA with acute respiratory distress syndrome in children with pneumonia. Crit Care. 2016;20:281.
Baughn JM, Quasney MW, Simpson P, Merchant D, Li S, Levy H, Dahmer MK. Association of cystic fibrosis transmembrane conductance regulator gene variants with acute lung injury in African American children with pneumonia. Crit Care Med. 2012;40:3042–9.
Patwari PP, O’Cain P, Goodman DM, Smith M, Krushkal J, Liu C, Somes G, Quasney MW, Dahmer MK. Interleukin-1 receptor antagonist intron 2 variable number of tandem repeats polymorphism and respiratory failure in children with community-acquired pneumonia. Pediatr Crit Care Med. 2008;9:553–9.
Russell R, Quasney MW, Halligan N, Li S, Simpson P, Waterer G, Wunderink RG, Dahmer MK. Genetic variation in MYLK and lung injury in children and adults with community-acquired pneumonia. Pediatr Crit Care Med. 2010;11:731–6.
Dahmer MK, O’Cain P, Patwari PP, Simpson P, Li S, Halligan N, Quasney MW. The influence of genetic variation in surfactant protein B on severe lung injury in African American children. Crit Care Med. 2011;39:1138–44.
Chen J, Wilson ES, Dahmer MK, Quasney MW, Waterer GW, Feldman C, Wunderink RG. Lack of association of the caspase-12 long allele with community-acquired pneumonia in people of African descent. PLoS One. 2014;9(2):e89194.
Tejera P, Meyer NJ, Chen F, Feng R, Zhao Y, O’Mahony DS, et al. Distinct and replicable genetic risk factors for acute respiratory distress syndrome of pulmonary or extrapulmonary origin. J Med Genet. 2012;49:671–80.
Goodman SN. Toward evidence-based medical statistics. 1: the p value fallacy. Ann Intern Med. 1999;130:995–1004.
Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31:337–50.
Goodman SN. Toward evidence-based medical statistics. 2: the Bayes factor. Ann Intern Med. 1999;130:1005–13.
Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124.
Ioannidis JPA, Tarone R, McLaughlin JK. The false-positive to false-negative ratio in epidemiologic studies. Epidemiology. 2011;22:450–6.
Broer L, Lill CM, Schuur M, Amin N, Roehr JT, Bertram L, et al. Distinguishing true from false positives in genomic studies: p values. Eur J Epidemiol. 2013;28:131–8.
Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, von Elm E, et al. STrengthening the REporting of Genetic Association Studies (STREGA): an extension of the STROBE statement. PLoS Med. 2009;6(2):e1000022.
There was no funding for this work.
SR and ARJ made substantial contributions to conception and design and interpretation of data; participated sufficiently in the work to take public responsibility for the content and agreed to be accountable for all aspects of the work. ARJ wrote the first draft of the manuscript. SR revised the manuscript critically for important intellectual content. SR and ARJ have given final approval of the version to be published.
The authors declare that they have no competing interests.
See related research by Perez-Marques et al., https://ccforum.biomedcentral.com/articles/10.1186/s13054-016-1454-7 This comment refers to the article available at: http://dx.doi.org/10.1186/s13054-016-1454-7.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Rimpau, S., Joffe, A.R. Interpretation of gene associations with risk of acute respiratory distress syndrome: P values, Bayes factors, positive predictive values, and need for replication. Crit Care 20, 402 (2016). https://doi.org/10.1186/s13054-016-1550-8
- Acute respiratory distress syndrome
- Bayes factor
- Gene association
- P values