Statistics review 13: Receiver operating characteristic curves
 Viv Bewick^{1},
 Liz Cheek^{1} and
 Jonathan Ball^{2}
DOI: 10.1186/cc3000
© BioMed Central Ltd 2004
Published: 4 November 2004
Abstract
This review introduces some commonly used methods for assessing the performance of a diagnostic test. The sensitivity, specificity and likelihood ratio of a test are discussed. The uses of the receiver operating characteristic curve and the area under the curve are explained.
Keywords
AUROC, negative likelihood ratio, negative predictive value, positive likelihood ratio, positive predictive value, ROC curve, sensitivity, specificity
Introduction
Table 1. Number of patients according to level of lactate and mortality
                        Outcome
Test                    Died  Survived  Total
Lactate > 1.5 mmol/l      81       591    672
Lactate ≤ 1.5 mmol/l      45       674    719
Total                    126      1265   1391
Table 2. Number of patients according to result of diagnostic test and actual outcome
            Outcome
Test        Positive         Negative
Positive    True positives   False positives
Negative    False negatives  True negatives
Sensitivity and specificity
The sensitivity of a diagnostic test is the proportion of patients for whom the outcome is positive that are correctly identified by the test. The specificity is the proportion of patients for whom the outcome is negative that are correctly identified by the test.
For the data given in Table 1 the sensitivity of the test using lactate level above 1.5 mmol/l as an indicator of mortality is 81/126 = 0.64, and the specificity is 674/1265 = 0.53. Therefore, 64% of patients in this sample who died and 53% who survived were correctly identified by this test. Because both of these measures are simple proportions, their confidence intervals can be calculated as described in Statistics review 8 [1]. The 95% confidence interval for sensitivity is 56–73% and that for specificity is 51–56%.
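As a rough check on these figures, the following Python sketch recomputes sensitivity and specificity from the Table 1 counts. The normal-approximation (Wald) interval is an assumption made here for illustration; Statistics review 8 [1] describes the method the authors refer to, although this approximation does reproduce the intervals quoted above. The function name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a normal-approximation interval."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, p - z * se, p + z * se

# Counts from Table 1 (lactate > 1.5 mmol/l is treated as a positive test)
true_pos, false_neg = 81, 45      # patients who died
false_pos, true_neg = 591, 674    # patients who survived

# Sensitivity: correctly identified deaths out of all deaths
sensitivity = proportion_ci(true_pos, true_pos + false_neg)
# Specificity: correctly identified survivors out of all survivors
specificity = proportion_ci(true_neg, true_neg + false_pos)

print("sensitivity: %.2f (95%% CI %.2f to %.2f)" % sensitivity)
print("specificity: %.2f (95%% CI %.2f to %.2f)" % specificity)
```

For proportions close to 0 or 1, or for small samples, an exact or Wilson interval would usually be preferred to this approximation.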
Generally, both the sensitivity and specificity of a test need to be known in order to assess its usefulness for a diagnosis. A discriminating test would have sensitivity and specificity close to 100%. However, a test with high sensitivity may have low specificity and vice versa. The decision to make use of a diagnostic test will also depend on whether a treatment exists should the result of the test be positive, the cost of such a treatment, and whether the treatment is detrimental in cases in which the result is a false positive.
Positive and negative predictive values
The positive predictive value (PPV) of a test is the probability that a patient has a positive outcome given that they have a positive test result. This is in contrast to sensitivity, which is the probability that a patient has a positive test result given that they have a positive outcome. Similarly, the negative predictive value (NPV) is the probability that a patient has a negative outcome given that they have a negative test result, in contrast to specificity, which is the probability that a patient has a negative test result given that they have a negative outcome.
For the data in Table 1 the PPV of the test using lactate level above 1.5 mmol/l as an indicator of mortality is 81/672 = 0.12, and the NPV is 674/719 = 0.94. Therefore, 12% of patients in the sample whose test results were positive actually died and 94% whose test results were negative survived. The 95% confidence interval for PPV is 10–15% and that for NPV is 92–96%.
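The distinction between predictive values and sensitivity or specificity comes down to which total is used as the denominator. The sketch below, with illustrative variable names, makes this explicit for the Table 1 counts: predictive values divide by the row (test result) totals, whereas sensitivity and specificity divide by the column (outcome) totals.

```python
# Counts from Table 1 (lactate > 1.5 mmol/l is treated as a positive test)
test_pos_died, test_pos_survived = 81, 591
test_neg_died, test_neg_survived = 45, 674

# Predictive values condition on the test result (row totals)
ppv = test_pos_died / (test_pos_died + test_pos_survived)      # 81/672
npv = test_neg_survived / (test_neg_died + test_neg_survived)  # 674/719

# Sensitivity and specificity condition on the outcome (column totals)
sensitivity = test_pos_died / (test_pos_died + test_neg_died)              # 81/126
specificity = test_neg_survived / (test_pos_survived + test_neg_survived)  # 674/1265

print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```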
Table 3. Number of patients according to level of lactate and mortality
                        Outcome
Test                    Died  Survived  Total
Lactate > 1.5 mmol/l     386       370    756
Lactate ≤ 1.5 mmol/l     214       421    635
Total                    600       791   1391
Likelihood ratios
Sensitivity and specificity are usefully combined in likelihood ratios. The likelihood ratio of a positive test result (LR^{+}) is the ratio of the probability of a positive test result if the outcome is positive (true positive) to the probability of a positive test result if the outcome is negative (false positive). It can be expressed as follows:
LR^{+} = sensitivity/(1 – specificity)
LR^{+} represents the increase in odds favouring the outcome given a positive test result. For the data in Table 1, LR^{+} is 0.64/(1 – 0.53) = 1.36. This indicates that a positive result is 1.36 times as likely for a patient who died as for one who survived.
The pretest probability of a positive outcome is the prevalence of the outcome. The pretest odds [1] can be used to calculate the posttest probability of outcome and are given by:
Pretest odds = prevalence/(1 – prevalence)
Applying Bayes' theorem [2], we have:
Posttest odds for the outcome given a positive test result = pretest odds × LR^{+}
For the data given in Table 1, the prevalence of death = 126/1391 = 0.09 and the pretest odds of death = 0.09/(1 – 0.09) = 0.099. Therefore:
Posttest odds of death given a positive test result = 0.099 × 1.36 = 0.135
For a simpler interpretation, these odds can be converted to a probability using the following:
Probability = odds/(1 + odds)
For the data in Table 1 this gives a probability = 0.135/(1 + 0.135) = 0.12. This is the probability of death given a positive test result (i.e. the PPV).
Similarly, we can define LR^{–} as the ratio of the probability of a negative test result if the outcome is positive to the probability of a negative test result if the outcome is negative. It can be expressed as follows:
LR^{–} = (1 – sensitivity)/specificity
LR^{–} represents the factor by which the odds favouring the outcome change given a negative test result. For the data given in Table 1, LR^{–} is (1 – 0.64)/0.53 = 0.68. This indicates that a negative result is 0.68 times as likely for a patient who died as for one who survived. Applying Bayes' theorem, we have the following:
Posttest odds for the outcome given a negative test result = pretest odds × LR^{–}
For the data in Table 1:
Posttest odds of death given a negative test result = 0.099 × 0.68 = 0.067
Converting these odds to a probability gives 0.067/(1 + 0.067) = 0.06. This is the probability of death given a negative test result (i.e. 1 – NPV). Therefore, NPV = 1 – 0.06 = 0.94, as shown above.
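The chain from likelihood ratio to posttest probability can be traced in a few lines of Python. This is a minimal sketch with illustrative function names. It uses the unrounded proportions from Table 1, so the likelihood ratios come out at about 1.38 and 0.67 rather than the 1.36 and 0.68 obtained above from rounded values; the resulting posttest probabilities still agree with the PPV and 1 – NPV.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def odds_to_probability(odds):
    return odds / (1 + odds)

# Unrounded proportions from Table 1
sensitivity = 81 / 126
specificity = 674 / 1265
prevalence = 126 / 1391

lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
pretest_odds = prevalence / (1 - prevalence)

# Bayes' theorem in odds form: posttest odds = pretest odds x likelihood ratio
prob_death_given_pos = odds_to_probability(pretest_odds * lr_pos)
prob_death_given_neg = odds_to_probability(pretest_odds * lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"P(death | positive test) = {prob_death_given_pos:.2f}")
print(f"P(death | negative test) = {prob_death_given_neg:.2f}")
```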
A high likelihood ratio for a positive result or a low likelihood ratio for a negative result (close to zero) indicates that a test is useful. As previously stated, a greater prevalence will raise the probability of a positive outcome given either a positive or a negative test result.
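The effect of prevalence can be seen by holding the likelihood ratios fixed and varying the pretest probability. The sketch below uses the rounded likelihood ratios quoted above; the prevalence of 0.25 is an arbitrary illustrative value, whereas 0.09 and 0.43 correspond approximately to Tables 1 and 3 (126/1391 and 600/1391). The function name is hypothetical.

```python
def posttest_probability(prevalence, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via the odds."""
    pretest_odds = prevalence / (1 - prevalence)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

LR_POS, LR_NEG = 1.36, 0.68  # rounded values quoted in the text

results = []
for prevalence in (0.09, 0.25, 0.43):  # 0.25 is illustrative only
    after_pos = posttest_probability(prevalence, LR_POS)
    after_neg = posttest_probability(prevalence, LR_NEG)
    results.append((prevalence, after_pos, after_neg))
    print(f"prevalence {prevalence:.2f}: "
          f"P(outcome | +) = {after_pos:.2f}, P(outcome | -) = {after_neg:.2f}")
```

Both posttest probabilities rise with prevalence even though the test itself is unchanged.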
Youden's index
Table 4. Number of patients according to level of lactate, using a range of cutoff values, and mortality plus sensitivities and specificities
Lactate (mmol/l)  Died  Survived  Sensitivity  Specificity  Youden's index (J)  1 – specificity
>0                 126      1265         1            0                   0                1
>1                 114       996         0.90         0.21                0.12             0.79
>1.5                81       591         0.64         0.53                0.18             0.47
>2                  58       329         0.46         0.74                0.20             0.26
>3                  37       131         0.29         0.90                0.19             0.10
>5                  19        27         0.15         0.98                0.13             0.02
>25                  0         0         0            1                   0                0
Number in sample   126      1265
It is desirable to choose a test that has high values for both sensitivity and specificity. In practice, the sensitivity and specificity may not be regarded as equally important. For example, a false-negative finding may be more critical than a false-positive one, in which case a cutoff with a relatively high sensitivity would be chosen. However, if no judgement is made between the two, then Youden's index (J) may be used to choose an appropriate cutoff:
J = sensitivity + specificity – 1
The maximum value J can attain is 1, when the test is perfect, and the minimum value is usually 0, when the test has no diagnostic value. From Table 4, the best cutoff value for lactate using Youden's index is 2 mmol/l, with J = 0.20.
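The choice of cutoff by Youden's index can be reproduced directly from the counts in Table 4. The following is a minimal sketch with illustrative names; it computes J from unrounded sensitivities and specificities and picks the cutoff with the largest value.

```python
# (cutoff in mmol/l, died above cutoff, survived above cutoff), from Table 4
rows = [
    (0, 126, 1265),
    (1, 114, 996),
    (1.5, 81, 591),
    (2, 58, 329),
    (3, 37, 131),
    (5, 19, 27),
    (25, 0, 0),
]
N_DIED, N_SURVIVED = 126, 1265

def youden_index(died_above, survived_above):
    """J = sensitivity + specificity - 1 for one cutoff."""
    sensitivity = died_above / N_DIED
    specificity = 1 - survived_above / N_SURVIVED
    return sensitivity + specificity - 1

# Pair each cutoff with its J and keep the pair with the largest J
best_cutoff, best_j = max(
    ((cutoff, youden_index(died, survived)) for cutoff, died, survived in rows),
    key=lambda pair: pair[1],
)
print(f"best cutoff: > {best_cutoff} mmol/l, J = {best_j:.2f}")
```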
Receiver operating characteristic curve and area under the curve
A perfect test would have sensitivity and specificity both equal to 1. If a cutoff value existed to produce such a test, then the sensitivity would be 1 for any nonzero values of 1 – specificity. The ROC curve would start at the origin (0,0), go vertically up the y-axis to (0,1) and then horizontally across to (1,1). A good test would be somewhere close to this ideal.
If a variable has no diagnostic capability, then a test based on that variable would be equally likely to produce a false positive or a true positive:
Sensitivity = 1 – specificity, or
Sensitivity + specificity = 1
This equality is represented by a diagonal line from (0,0) to (1,1) on the graph of the ROC curve, as shown in Fig. 1 (dashed line).
Figure 1 suggests that lactate does not provide a very good indication of mortality but that it is better than a random guess.
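The points of an ROC curve, and a rough value for the area beneath it, can be obtained from the cutoffs in Table 4. The sketch below is an approximation under a stated assumption: it joins only the seven tabulated cutoffs with straight lines and applies the trapezium rule, so it gives a coarse estimate (about 0.64) rather than the AUROC of 0.640 reported in Table 5, which is based on the full data. Variable names are illustrative.

```python
# (cutoff in mmol/l, died above cutoff, survived above cutoff), from Table 4
rows = [
    (0, 126, 1265), (1, 114, 996), (1.5, 81, 591), (2, 58, 329),
    (3, 37, 131), (5, 19, 27), (25, 0, 0),
]
N_DIED, N_SURVIVED = 126, 1265

# Each cutoff gives one ROC point: (1 - specificity, sensitivity)
points = sorted((survived / N_SURVIVED, died / N_DIED) for _, died, survived in rows)

# Trapezium rule over consecutive points, from (0, 0) to (1, 1)
auc = sum(
    (x1 - x0) * (y0 + y1) / 2
    for (x0, y0), (x1, y1) in zip(points, points[1:])
)

for x, y in points:
    print(f"1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
print(f"approximate area under the curve = {auc:.3f}")
```

Every point lies on or above the diagonal, which is consistent with lactate performing somewhat better than chance.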
Table 5. Area under the receiver operating characteristic curve (AUROC) for lactate
                              95% Confidence interval
AUROC  Standard error  P      Lower bound  Upper bound
0.640  0.027           0.000  0.586        0.693
Table 5 also includes the results of a hypothesis test of whether the AUROC is greater than 0.5, that is, whether using lactate to diagnose mortality is better than chance alone. The P value is less than 0.001 and the confidence interval for AUROC is 0.59–0.69, suggesting that lactate level does help to predict mortality. This procedure is equivalent to testing whether the lactate levels for those who died are generally higher than for those who survived, and therefore the Mann–Whitney test [3] can be used, resulting in the same P value.
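The link with the Mann–Whitney test arises because the AUROC equals the proportion of (died, survived) pairs in which the patient who died has the higher value, with ties counted as one half; the count of such pairs is the Mann–Whitney U statistic. The sketch below illustrates this with a small set of hypothetical lactate values invented purely for illustration; they are not taken from the study, and the function name is an assumption.

```python
def auroc_from_pairs(positives, negatives):
    """AUROC as U / (n1 * n2), where U counts correctly ordered pairs (ties = 0.5)."""
    u_statistic = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                u_statistic += 1.0
            elif p == n:
                u_statistic += 0.5
    return u_statistic / (len(positives) * len(negatives))

# Hypothetical lactate values (mmol/l), NOT data from the study
died = [1.2, 2.4, 3.1, 4.0]
survived = [0.8, 1.2, 1.9, 2.4, 1.0]

auroc = auroc_from_pairs(died, survived)
print(f"AUROC = {auroc:.2f}")
```

A value of 0.5 corresponds to no tendency for either group to have higher values, which is the null hypothesis of the Mann–Whitney test.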
Choosing between diagnostic tests
Table 6. Area under the receiver operating characteristic curve (AUROC) for lactate and urea
                                                      95% Confidence interval
Test result variables  AUROC  Standard error  P      Lower bound  Upper bound
Lactate (mmol/l)       0.640  0.027           0.000  0.586        0.693
Urea (mmol/l)          0.730  0.023           0.000  0.684        0.775
Assumptions and limitations
Sensitivity and specificity may not be invariant for a diagnostic test but may depend on characteristics of the population, for example age profile or severity of disease.
The decision to use a diagnostic test depends not only on the ROC analysis but also on the ultimate benefit to the patient. The prevalence of the outcome, which is the pretest probability, must also be known.
Generally, there is a tradeoff between sensitivity and specificity, and the practitioner must make a decision based on their relative importance.
Conclusion
ROC analysis provides a useful means to assess the diagnostic accuracy of a test and to compare the performance of more than one test for the same outcome. However, the usefulness of the test must be considered in the light of the clinical circumstances.
Abbreviations
 AUROC: area under the receiver operating characteristic curve
 PLR: positive likelihood ratio
 NLR: negative likelihood ratio
 NPV: negative predictive value
 PPV: positive predictive value
 ROC: receiver operating characteristic
References
 1. Bewick V, Cheek L, Ball J: Statistics review 8: Qualitative data – tests of association. Crit Care 2004, 8: 46-53. 10.1186/cc2428
 2. Petrie A, Sabin C: Medical Statistics at a Glance. Oxford, UK: Blackwell; 2000.
 3. Whitley E, Ball J: Statistics review 6: Nonparametric methods. Crit Care 2002, 6: 509-513. 10.1186/cc1820
 4. Campbell MJ, Machin D: Medical Statistics: A Commonsense Approach. 3rd edition. Chichester, UK: Wiley; 1999.
 5. Hanley JA, McNeil BJ: A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983, 148: 839-843.