Statistics review 8: Qualitative data – tests of association
Viv Bewick^{1}, Liz Cheek^{1} and Jonathan Ball^{2}
DOI: 10.1186/cc2428
© BioMed Central Ltd 2004
Published: 30 December 2003
Abstract
This review introduces methods for investigating relationships between two qualitative (categorical) variables. The χ^{2} test of association is described, together with the modifications needed for small samples. The test for trend, in which at least one of the variables is ordinal, is also outlined. Risk measurement is discussed. The calculation of confidence intervals for proportions and for differences between proportions is described. Situations in which samples are matched are considered.
Keywords
χ^{2} test of association, Fisher's exact test, McNemar's test, odds ratio, risk ratio, Yates' correction
Introduction
In the previous statistics reviews most of the procedures discussed are appropriate for quantitative measurements. However, qualitative, or categorical, data are frequently collected in medical investigations. For example, variables assessed might include sex, blood group, classification of disease, or whether the patient survived. Categorical variables may also comprise grouped quantitative variables, for example age could be grouped into 'under 20 years', '20–50 years' and 'over 50 years'. Some categorical variables may be ordinal, that is the data arising can be ordered. Age group is an example of an ordinal categorical variable.
Table 1. Numbers of patients classified by site of central venous cannula and infectious complication
Infectious complication
Central line site  None  Exit site  Bacteraemia/septicaemia  Total
Internal jugular  686  152  96  934 
Subclavian  451  35  38  524 
Femoral  168  58  22  248 
Total  1305  245  156  1706 
χ^{2} test of association
In order to test whether there is an association between two categorical variables, we calculate the number of individuals we would get in each cell of the contingency table if the proportions in each category of one variable remained the same regardless of the categories of the other variable. These values are the frequencies we would expect under the null hypothesis that there is no association between the variables, and they are called the expected frequencies. For the data in Table 1, the proportions of patients in the sample with cannulae sited at the internal jugular, subclavian and femoral veins are 934/1706, 524/1706, 248/1706, respectively. There are 1305 patients with no infectious complications. So the frequency we would expect in the internal jugular site category is 1305 × (934/1706) = 714.5. Similarly for the subclavian and femoral sites we would expect frequencies of 1305 × (524/1706) = 400.8 and 1305 × (248/1706) = 189.7.
We repeat these calculations for the patients with infections at the exit site and with bacteraemia/septicaemia to obtain the following:
Exit site: 245 × (934/1706) = 134.1, 245 × (524/1706) = 75.3, 245 × (248/1706) = 35.6
Bacteraemia/septicaemia: 156 × (934/1706) = 85.4, 156 × (524/1706) = 47.9, 156 × (248/1706) = 22.7
Table 2. Numbers of patients expected in each classification if there were no association between site of central venous cannula and infectious complication
Infectious complication
Central line site  None  Exit site  Bacteraemia/septicaemia  Total
Internal jugular  714.5  134.1  85.4  934 
Subclavian  400.8  75.3  47.9  524 
Femoral  189.7  35.6  22.7  248 
Total  1305  245  156  1706 
The test of association involves calculating the differences between the observed and expected frequencies. If the differences are large, then this suggests that there is an association between one variable and the other. The difference for each cell of the table is scaled according to the expected frequency in the cell. The calculated test statistic for a table with r rows and c columns is given by:
\[ \chi^{2} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^{2}}{E_{ij}} \]
where O_{ij} is the observed frequency and E_{ij} is the expected frequency in the cell in row i and column j. If the null hypothesis of no association is true, then the calculated test statistic approximately follows a χ^{2} distribution with (r − 1) × (c − 1) degrees of freedom (where r is the number of rows and c the number of columns). This approximation can be used to obtain a P value.
For the data in Table 1, the test statistic is:
1.134 + 2.380 + 1.314 + 6.279 + 21.531 + 2.052 + 2.484 + 14.069 + 0.020 = 51.26
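The expected frequencies and the test statistic for Table 1 can be reproduced with a short script. This is a minimal sketch in Python using only the standard library; the function names are illustrative, not taken from any package, and the closed-form tail probability used here applies only when the degrees of freedom are even (here (3 − 1) × (3 − 1) = 4).

```python
import math

# Observed frequencies from Table 1.
# Rows: internal jugular, subclavian, femoral.
# Columns: none, exit site, bacteraemia/septicaemia.
observed = [
    [686, 152, 96],
    [451, 35, 38],
    [168, 58, 22],
]


def expected_frequencies(table):
    """Expected count per cell: row total x column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_statistic(table):
    """Sum of (O - E)^2 / E over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires even, positive df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


expected = expected_frequencies(observed)
chi2 = chi2_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_upper_tail_even_df(chi2, df)

print("expected:", [[round(e, 1) for e in row] for row in expected])
print("chi-squared = %.2f on %d df, P = %.2g" % (chi2, df, p_value))
```

The statistic far exceeds the 0.001 percentage point for 4 degrees of freedom (18.47), so the P value is below 0.001.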
Table 3. Percentage points of the χ^{2} distribution produced on a spreadsheet
χ^{2} values for the probabilities (P)
Degrees of freedom  0.1  0.05  0.01  0.001
1  2.71  3.84  6.63  10.83 
2  4.61  5.99  9.21  13.82 
3  6.25  7.81  11.34  16.27 
4  7.78  9.49  13.28  18.47 
5  9.24  11.07  15.09  20.52 
6  10.64  12.59  16.81  22.46 
7  12.02  14.07  18.48  24.32 
8  13.36  15.51  20.09  26.12 
9  14.68  16.92  21.67  27.88 
10  15.99  18.31  23.21  29.59 
11  17.28  19.68  24.72  31.26 
12  18.55  21.03  26.22  32.91 
13  19.81  22.36  27.69  34.53 
14  21.06  23.68  29.14  36.12 
15  22.31  25.00  30.58  37.70 
16  23.54  26.30  32.00  39.25 
17  24.77  27.59  33.41  40.79 
18  25.99  28.87  34.81  42.31 
19  27.20  30.14  36.19  43.82 
20  28.41  31.41  37.57  45.31 
25  34.38  37.65  44.31  52.62 
Residuals
The χ^{2} test indicates whether there is an association between two categorical variables. However, unlike the correlation coefficient between two quantitative variables (see Statistics review 7 [1]), it does not in itself give an indication of the strength of the association. In order to describe the association more fully, it is necessary to identify the cells that have large differences between the observed and expected frequencies. These differences are referred to as residuals, and they can be standardized and adjusted to follow a Normal distribution with mean 0 and standard deviation 1 [2]. The adjusted standardized residuals, d_{ij}, are given by:
\[ d_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}\,(1 - n_{i.}/N)\,(1 - n_{.j}/N)}} \]
where n_{i.} is the total frequency for row i, n_{.j} is the total frequency for column j, and N is the overall total frequency. In the example, the adjusted standardized residual for those with cannulae sited at the internal jugular and no infectious complications is calculated as:
\[ d_{11} = \frac{686 - 714.5}{\sqrt{714.5 \times (1 - 934/1706) \times (1 - 1305/1706)}} = -3.3 \]
Table 4. The adjusted standardized residuals
Infectious complication
Central line site  None  Exit site  Bacteraemia/septicaemia
Internal jugular  −3.3  2.5  1.8
Subclavian  6.2  −6.0  −1.8
Femoral  −3.5  4.4  −0.2
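The adjusted standardized residuals can be computed cell by cell from the observed table. The following is a minimal sketch in Python (standard library only); the function name is illustrative. A negative residual means that fewer patients were observed in the cell than expected under no association.

```python
import math

# Observed frequencies from Table 1.
# Rows: internal jugular, subclavian, femoral.
# Columns: none, exit site, bacteraemia/septicaemia.
observed = [
    [686, 152, 96],
    [451, 35, 38],
    [168, 58, 22],
]


def adjusted_residuals(table):
    """Adjusted standardized residual for every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            variance = e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((o - e) / math.sqrt(variance))
        result.append(out_row)
    return result


residuals = adjusted_residuals(observed)
sites = ["Internal jugular", "Subclavian", "Femoral"]
for site, row in zip(sites, residuals):
    print(site, [round(d, 1) for d in row])
```

Residuals larger than about 2 in absolute value point to the cells that contribute most to the association, for example the subclavian site with no complication or with exit site infection.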
Two by two tables
Table 5. Data on patients with acute myocardial infarction who took part in a trial of intravenous nitrate
Outcome  Treatment  Control  Total
Died  3  8  11 
Survived  47  37  84 
Total  50  45  95 
Fisher's exact test
Table 6. Tables with the same row and column totals as Table 5
Outcome  (i) Treatment  (i) Control  (ii) Treatment  (ii) Control  (iii) Treatment  (iii) Control  (iv) Treatment  (iv) Control
Died  3  8  2  9  1  10  0  11 
Survived  47  37  48  36  49  35  50  34 
To calculate the probability of obtaining a particular table, we consider the total number of possible tables with the given marginal totals, and the number of ways we could have obtained the particular cell frequencies in the table in question. The number of ways the row totals of 11 and 84 could have been obtained given 95 patients altogether is denoted by _{95}C_{11} and is equal to 95!/11!84!, where 95! ('95 factorial') is the product of 95 and all the integers lower than itself down to 1. Similarly the number of ways the column totals of 50 and 45 could have been obtained is given by _{95}C_{50} = 95!/50!45!. Assuming independence, the total number of possible tables with the given marginal totals is:
\[ {}_{95}C_{11} \times {}_{95}C_{50} = \frac{95!}{11!\,84!} \times \frac{95!}{50!\,45!} \]
The number of ways Table 5 (Table 6[i]) could have been obtained is given by considering the number of ways each cell frequency could have arisen. There are _{95}C_{3} ways of obtaining the three patients in the first cell. The eight patients in the next cell can be obtained in _{92}C_{8} ways from the 95 − 3 = 92 remaining patients. The remaining cells can be obtained in _{84}C_{47} and _{37}C_{37} (= 1) ways. Therefore, the number of ways of obtaining Table 6(i) under the null hypothesis is:
\[ {}_{95}C_{3} \times {}_{92}C_{8} \times {}_{84}C_{47} \times {}_{37}C_{37} = \frac{95!}{3!\,8!\,47!\,37!} \]
Therefore the probability of obtaining 6(i) is:
\[ \frac{95!}{3!\,8!\,47!\,37!} \Big/ \left( \frac{95!}{11!\,84!} \times \frac{95!}{50!\,45!} \right) = \frac{11!\,84!\,50!\,45!}{95!\,3!\,8!\,47!\,37!} = 0.0541 \]
Therefore the total probability of obtaining the four tables given in Table 6 is:
\[ 0.0541 + 0.0139 + 0.0020 + 0.0001 = 0.0701 \]
This probability is usually doubled to give a two-sided P value of 0.140. There is quite a large discrepancy in this case between the χ^{2} test and Fisher's exact test.
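The factorial arithmetic is tedious by hand but easy to script. The sketch below, in Python with only the standard library (`math.comb` requires Python 3.8 or later), computes the probability of each table in Table 6 and sums them; the function names are illustrative, and doubling the one-sided probability follows the convention described above rather than the alternative two-sided definitions some packages use.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] with all margins fixed."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def one_sided_p(a, b, c, d):
    """Sum the probabilities of the observed table and of every table that is
    more extreme in the same direction (smaller first cell), margins fixed."""
    total = 0.0
    while a >= 0 and d >= 0:
        total += table_probability(a, b, c, d)
        a, b, c, d = a - 1, b + 1, c + 1, d - 1
    return total


# Table 5: died (treatment 3, control 8), survived (treatment 47, control 37).
p_observed = table_probability(3, 8, 47, 37)
p_one_sided = one_sided_p(3, 8, 47, 37)
p_two_sided = 2 * p_one_sided

print("P(observed table) = %.4f" % p_observed)
print("one-sided P = %.4f, doubled P = %.3f" % (p_one_sided, p_two_sided))
```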
Yates' continuity correction
Table 7. Adjusted frequencies for Yates' correction
Outcome  Treatment  Control  Total
Died  3.5  7.5  11 
Survived  46.5  37.5  84 
Total  50  45  95 
The χ^{2} test using these adjusted figures gives a test statistic of 2.162 with a P value of 0.141, which is close to the P value for Fisher's exact test.
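The effect of the continuity correction on a two by two table can be checked directly. This is a minimal sketch in Python (standard library only), assuming the correction is applied by moving each observed frequency 0.5 closer to its expected value, which is what the adjusted frequencies in Table 7 do; the function names are illustrative.

```python
import math


def chi2_2x2(table, yates=False):
    """Chi-squared statistic for a 2x2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            diff = abs(o - e)
            if yates:
                # Shrink |O - E| by 0.5, never below zero.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / e
    return statistic


def chi2_upper_tail_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Table 5: rows died/survived, columns treatment/control.
table5 = [[3, 8], [47, 37]]

chi2_plain = chi2_2x2(table5)
chi2_yates = chi2_2x2(table5, yates=True)
p_plain = chi2_upper_tail_1df(chi2_plain)
p_yates = chi2_upper_tail_1df(chi2_yates)

print("uncorrected: chi-squared = %.3f, P = %.3f" % (chi2_plain, p_plain))
print("Yates:       chi-squared = %.3f, P = %.3f" % (chi2_yates, p_yates))
```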
For large samples the three tests – χ^{2}, Fisher's and Yates' – give very similar results, but for smaller samples Fisher's test and Yates' correction give more conservative results than the χ^{2} test; that is the P values are larger, and we are less likely to conclude that there is an association between the variables. There is some controversy about which method is preferable for smaller samples, but Bland [5] recommends the use of Fisher's or Yates' test for a more cautious approach.
Test for trend
Table 8. Number of patients according to AVPU and survival
Outcome  Alert  Voice or pain responsive  Unresponsive  Total
Survived  1110 (91.1%)  54 (79.4%)  14 (70%)  1178 
Died  108 (8.9%)  14 (20.6%)  6 (30%)  128 
Total  1218 (100%)  68 (100%)  20 (100%)  1306 
Because the categories of AVPU have a natural ordering, it is appropriate to ask whether there is a trend in the proportion dying over the levels of AVPU. This can be tested by carrying out similar calculations to those used in regression for testing the gradient of a line (see Statistics review 7 [1]). Suppose the variable 'survival' is regarded as the y variable taking two values, 1 and 2 (survived and died), and AVPU as the x variable taking three values, 1, 2 and 3. We then have six pairs of x, y values, each occurring the number of times equal to the frequency in the table; for example, we have 1110 occurrences of the point (1,1).
Following the lines of the test of the gradient in regression, with some fairly minor modifications and using large sample approximations, we obtain a χ^{2} statistic with 1 degree of freedom given by [5]:
\[ \chi^{2}_{\mathrm{trend}} = \frac{N\left(\sum xy - \frac{\sum x \sum y}{N}\right)^{2}}{\left(\sum x^{2} - \frac{(\sum x)^{2}}{N}\right)\left(\sum y^{2} - \frac{(\sum y)^{2}}{N}\right)} \]
where the sums are taken over all N individuals.
For the data in Table 8, we obtain a test statistic of 19.33 with 1 degree of freedom and a P value of less than 0.001. Therefore, the trend is highly significant. The difference between the χ^{2} test statistic in the original test and the χ^{2} test statistic for trend is 19.38 − 19.33 = 0.05 with 2 − 1 = 1 degree of freedom, which provides a test of the departure from the trend. This departure is far from significant and suggests that the association between survival and AVPU classification can be explained almost entirely by the trend.
Some computer packages give the trend test, or a variation. The trend test described above is sometimes called the Cochran–Armitage test, and a common variation is the Mantel–Haenszel trend test.
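The trend statistic for Table 8 can be reproduced from the scores described above. The sketch below is in Python (standard library only) and assumes one common large-sample form of the statistic, namely N times the squared correlation between the x and y scores; the variable names are illustrative.

```python
import math

# Table 8 counts keyed by (x, y):
# x = AVPU score (1 alert, 2 voice or pain responsive, 3 unresponsive),
# y = outcome score (1 survived, 2 died).
counts = {
    (1, 1): 1110, (2, 1): 54, (3, 1): 14,
    (1, 2): 108, (2, 2): 14, (3, 2): 6,
}

n = sum(counts.values())
sum_x = sum(x * f for (x, y), f in counts.items())
sum_y = sum(y * f for (x, y), f in counts.items())
sum_xx = sum(x * x * f for (x, y), f in counts.items())
sum_yy = sum(y * y * f for (x, y), f in counts.items())
sum_xy = sum(x * y * f for (x, y), f in counts.items())

# Corrected sums of squares and products, as in simple regression.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n

chi2_trend = n * s_xy ** 2 / (s_xx * s_yy)
# Upper-tail probability of chi-squared with 1 degree of freedom.
p_trend = math.erfc(math.sqrt(chi2_trend / 2))

print("chi-squared for trend = %.2f, P = %.2g" % (chi2_trend, p_trend))
```

Changing the scores (for example, using 0, 1, 2 for AVPU) does not alter the statistic as long as the spacing between categories stays equal.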
Measurement of risk
Table 9. Outcomes of the study conducted by Rivers and coworkers
Outcome
Therapy  Died  Survived  Total
Early goal-directed  38  79  117
Standard  59  60  119 
Total  97  139  236 
From the table it can be seen that the proportion of patients receiving early goal-directed therapy who died is 38/117 = 32.5%, and so this is the risk for death with early goal-directed therapy. The risk for death on the standard therapy is 59/119 = 49.6%.
Another measurement of the association between a disease and possible risk factor is the odds. This is the ratio of those exposed to the risk factor who develop the disease compared with those exposed to the risk factor who do not develop the disease. This is best illustrated by a simple example. If a bag contains 8 red balls and 2 green balls, then the probability (risk) of drawing a red ball is 8/10 whereas the odds of drawing a red ball are 8/2. As can be seen, the measurement of odds, unlike risk, is not confined to the range 0–1. In the study conducted by Rivers and coworkers [6] the odds of death with early goal-directed therapy are 38/79 = 0.48, and on the standard therapy they are 59/60 = 0.98.
Confidence interval for a proportion
As the measurement of risk is simply a proportion, the confidence interval for the population measurement of risk can be calculated as for any proportion. If the number of individuals in a random sample of size n who experience a particular outcome is r, then r/n is the sample proportion, p. For large samples the distribution of p can be considered to be approximately Normal, with a standard error of [2]:
\[ \sqrt{\frac{p(1 - p)}{n}} \]
The 95% confidence interval for the true population proportion is given by p − 1.96 × standard error to p + 1.96 × standard error, which is:
\[ p \pm 1.96\sqrt{\frac{p(1 - p)}{n}} \]
where p is the sample proportion and n is the sample size. The sample proportion is the risk and the sample size is the total number exposed to the risk factor.
For the study conducted by Rivers and coworkers [6] the 95% confidence interval for the risk for death on early goal-directed therapy is 0.325 ± 1.96(0.325[1 − 0.325]/117)^{0.5} or (24.0%, 41.0%), and on the standard therapy it is (40.6%, 58.6%). The interpretation of a confidence interval is described in Statistics review 2 [3] and indicates that, for those on early goal-directed therapy, the true population risk for death is likely to be between 24.0% and 41.0%, and that for the standard therapy between 40.6% and 58.6%.
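The large-sample confidence interval for a risk is a one-line calculation. This is a minimal sketch in Python (standard library only); the function name is illustrative, and the Normal approximation it uses is only appropriate for large samples, as noted above.

```python
import math


def proportion_ci(r, n, z=1.96):
    """Large-sample (Normal approximation) confidence interval for r/n."""
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Deaths / patients in each arm of the Rivers study (Table 9).
egdt_low, egdt_high = proportion_ci(38, 117)
std_low, std_high = proportion_ci(59, 119)

print("early goal-directed: %.1f%% to %.1f%%" % (100 * egdt_low, 100 * egdt_high))
print("standard:            %.1f%% to %.1f%%" % (100 * std_low, 100 * std_high))
```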
Comparing risks
To assess the importance of the risk factor, it is necessary to compare the risk for developing a disease in the exposed group with the risk in the nonexposed group. In the study by Rivers and coworkers [6] the risk for death on the early goal-directed therapy is 32.5%, whereas on the standard therapy it is 49.6%. A comparison between the two risks can be made by examining either their ratio or the difference between them.
Risk ratio
The risk ratio measures the increased risk for developing a disease when having been exposed to a risk factor compared with not having been exposed to the risk factor. It is given by RR = risk for the exposed/risk for the unexposed, and it is often referred to as the relative risk. The interpretation of a relative risk is described in Statistics review 6 [7]. For the Rivers study the relative risk = 0.325/0.496 = 0.66, which indicates that a patient on the early goal-directed therapy is 34% less likely to die than a patient on the standard therapy.
The calculation of the 95% confidence interval for the relative risk [8] will be covered in a future review, but it can usefully be interpreted here. For the Rivers study the 95% confidence interval for the population relative risk is 0.48 to 0.90. Because the interval does not contain 1.0 and its upper end is below 1.0, it indicates that patients on the early goal-directed therapy have a significantly decreased risk for dying as compared with those on the standard therapy.
Odds ratio
When quantifying the risk for developing a disease, the ratio of the odds can also be used as a measurement of comparison between those exposed and not exposed to a risk factor. It is given by OR = odds for the exposed/odds for the unexposed, and is referred to as the odds ratio. The interpretation of odds ratio is described in Statistics review 3 [4]. For the Rivers study the odds ratio = 0.48/0.98 = 0.49, again indicating that those on the early goal-directed therapy have a reduced risk for dying as compared with those on the standard therapy. This will be covered fully in a future review.
The calculation of the 95% confidence interval for the odds ratio [2] will also be covered in a future review but, as with relative risk, it can usefully be interpreted here. For the Rivers example the 95% confidence interval for the odds ratio is 0.29 to 0.83. This can be interpreted in the same way as the 95% confidence interval for the relative risk, indicating that those receiving early goal-directed therapy have a reduced risk for dying.
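The risk, odds, risk ratio and odds ratio for the Rivers study follow directly from the four cell counts in Table 9. The sketch below is in Python; the variable names are illustrative, and it deliberately leaves out the confidence intervals for the two ratios.

```python
# Cell counts from Table 9 (Rivers and coworkers).
egdt_died, egdt_survived = 38, 79   # early goal-directed therapy
std_died, std_survived = 59, 60     # standard therapy

# Risk: deaths divided by everyone in the group.
risk_egdt = egdt_died / (egdt_died + egdt_survived)
risk_std = std_died / (std_died + std_survived)

# Odds: deaths divided by survivors in the group.
odds_egdt = egdt_died / egdt_survived
odds_std = std_died / std_survived

risk_ratio = risk_egdt / risk_std
odds_ratio = odds_egdt / odds_std

print("risk: %.3f vs %.3f, risk ratio = %.2f" % (risk_egdt, risk_std, risk_ratio))
print("odds: %.2f vs %.2f, odds ratio = %.2f" % (odds_egdt, odds_std, odds_ratio))
```

Note that the odds ratio is further from 1 than the risk ratio here; the two measures are close only when the outcome is rare.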
Difference between two proportions
Confidence interval
For the Rivers study, instead of examining the ratio of the risks (the relative risk) we can obtain a confidence interval and carry out a significance test of the difference between the risks. The proportion of those on early goal-directed therapy who died is p_{1} = 38/117 = 0.325 and the proportion of those on standard therapy who died is p_{2} = 59/119 = 0.496. A confidence interval for the difference between the true population proportions is given by:
(p_{1} − p_{2}) − 1.96 × se(p_{1} − p_{2}) to (p_{1} − p_{2}) + 1.96 × se(p_{1} − p_{2})
where se(p_{1} − p_{2}) is the standard error of p_{1} − p_{2} and is calculated as:
\[ se(p_{1} - p_{2}) = \sqrt{\frac{p_{1}(1 - p_{1})}{n_{1}} + \frac{p_{2}(1 - p_{2})}{n_{2}}} = \sqrt{\frac{0.325 \times 0.675}{117} + \frac{0.496 \times 0.504}{119}} = 0.063 \]
Thus, the required confidence interval is −0.171 − 1.96 × 0.063 to −0.171 + 1.96 × 0.063; that is, −0.295 to −0.047. Therefore, the difference between the true proportions is likely to be between −0.295 and −0.047, and the risk for those on early goal-directed therapy is less than the risk for those on standard therapy.
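The interval for the difference between the two risks can be checked as follows. This is a minimal sketch in Python (standard library only) using the unpooled standard error described above; the function name is illustrative.

```python
import math


def difference_ci(r1, n1, r2, n2, z=1.96):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = r1 / n1, r2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)


# Early goal-directed therapy (38 of 117 died) minus standard therapy (59 of 119).
diff, se, (low, high) = difference_ci(38, 117, 59, 119)

print("difference = %.3f, se = %.3f" % (diff, se))
print("95%% CI: %.3f to %.3f" % (low, high))
```

Both limits are negative, so the interval excludes 0 and the risk is lower on early goal-directed therapy.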
Hypothesis test
We can also carry out a hypothesis test of the null hypothesis that the difference between the proportions is 0. This follows similar lines to the calculation of the confidence interval, but under the null hypothesis the standard error of the difference in proportions is given by:
\[ se(p_{1} - p_{2}) = \sqrt{p(1 - p)\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)} \]
where p is a pooled estimate of the proportion obtained from both samples [5]:
\[ p = \frac{r_{1} + r_{2}}{n_{1} + n_{2}} = \frac{38 + 59}{117 + 119} = 0.411 \]
So:
\[ se(p_{1} - p_{2}) = \sqrt{0.411 \times (1 - 0.411) \times \left(\frac{1}{117} + \frac{1}{119}\right)} = 0.064 \]
The test statistic is then:
\[ z = \frac{p_{1} - p_{2}}{se(p_{1} - p_{2})} = \frac{-0.171}{0.064} = -2.67 \]
Comparing this value with a standard Normal distribution gives P = 0.008, again suggesting that there is a difference between the two population proportions. In fact, the test described is equivalent to the χ^{2} test of association on the two by two table. The χ^{2} test gives a test statistic of 7.13, which is equal to (−2.67)^{2} and has the same P value of 0.008. Again, this suggests that there is a difference between the risks for those receiving early goal-directed therapy and those receiving standard therapy.
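The equivalence between the Normal test for two proportions and the χ^{2} test on the two by two table can be verified numerically. The sketch below is in Python (standard library only), works from the unrounded proportions, and uses the χ^{2} statistic without continuity correction; the variable names are illustrative.

```python
import math

# Deaths and group sizes from Table 9.
r1, n1 = 38, 117   # early goal-directed therapy
r2, n2 = 59, 119   # standard therapy

p1, p2 = r1 / n1, r2 / n2
pooled = (r1 + r2) / (n1 + n2)

# Standard error of p1 - p2 under the null hypothesis of no difference.
se_null = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_null
# Two-sided P value from the standard Normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

# Chi-squared statistic (no continuity correction) on the same 2x2 table.
table = [[r1, n1 - r1], [r2, n2 - r2]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
total = sum(row_totals)
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)

print("z = %.2f, z squared = %.2f, chi-squared = %.2f" % (z, z * z, chi2))
print("two-sided P = %.3f" % p_value)
```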
Matched samples
Matched pair designs, as discussed in Statistics review 5 [9], can also be used when the outcome is categorical. For example, when comparing two tests to determine a particular condition, the same individuals can be used for each test.
McNemar's test
In this situation, because the χ^{2} test does not take pairing into consideration, a more appropriate test, attributed to McNemar, can be used when comparing these correlated proportions.
Table 10. The results of two tests to determine the presence of Helicobacter pylori
Breath test
Oxoid test  +  −  Total
+  40  8 (b)  48
−  4 (c)  32  36
Total  44  40  84 (n)
The test statistic is given by:
\[ \chi^{2} = \frac{(b - c)^{2}}{b + c} \]
where b and c are the frequencies in the two categories of discordant pairs (as shown in Table 10). The calculated test statistic is compared with a χ^{2} distribution with 1 degree of freedom to obtain a P value. For the example b = 8 and c = 4, therefore the test statistic is calculated as 1.33. Comparing this with a χ^{2} distribution gives a P value greater than 0.10, indicating no significant difference in the proportion of positive determinations of H. pylori using the breath and the Oxoid tests.
The test can also be carried out with a continuity correction attributed to Yates [5], in a similar way to that described above for the χ^{2} test of association. The test statistic is then given by:
\[ \chi^{2} = \frac{(|b - c| - 1)^{2}}{b + c} \]
and again is compared with a χ^{2} distribution with 1 degree of freedom. For the example, the calculated test statistic including the continuity correction is 0.75, giving a P value greater than 0.25.
As with nonpaired proportions a confidence interval for the difference can be calculated. For large samples the difference between the paired proportions can be approximated to a Normal distribution. The difference between the proportions can be calculated from the discordant pairs [8], so the difference is given by (b − c)/n, where n is the total number of pairs, and the standard error of the difference by (b + c)^{0.5}/n.
For the example where b = 8, c = 4 and n = 84, the difference is calculated as 0.048 and the standard error as 0.041. The approximate 95% confidence interval is therefore 0.048 ± 1.96 × 0.041, giving −0.033 to 0.129. As this spans 0, it again indicates that there is no significant difference in the proportion of positive determinations of H. pylori using the breath and the Oxoid tests.
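McNemar's test and the accompanying confidence interval use only the discordant pairs, so they are simple to script. This is a minimal sketch in Python (standard library only); the variable and function names are illustrative.

```python
import math

# Discordant pairs and total number of pairs from Table 10.
b, c, n = 8, 4, 84


def chi2_upper_tail_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# McNemar's statistic without and with the continuity correction.
mcnemar = (b - c) ** 2 / (b + c)
mcnemar_corrected = (abs(b - c) - 1) ** 2 / (b + c)
p_plain = chi2_upper_tail_1df(mcnemar)
p_corrected = chi2_upper_tail_1df(mcnemar_corrected)

# Large-sample confidence interval for the difference in paired proportions.
diff = (b - c) / n
se = math.sqrt(b + c) / n
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

print("McNemar = %.2f (P = %.2f)" % (mcnemar, p_plain))
print("corrected = %.2f (P = %.2f)" % (mcnemar_corrected, p_corrected))
print("difference = %.3f, 95%% CI %.3f to %.3f" % (diff, ci_low, ci_high))
```

The 40 and 32 concordant pairs enter only through n; they carry no information about which test gives more positive results.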
Limitations
For a χ^{2} test of association, a recommendation on sample size that is commonly used and attributed to Cochran [5] is that no cell in the table should have an expected frequency of less than one, and no more than 20% of the cells should have an expected frequency of less than five. If the expected frequencies are too small then it may be possible to combine categories where it makes sense to do so.
For two by two tables, Yates' correction or Fisher's exact test can be used when the samples are small. Fisher's exact test can also be used for larger tables but the computation can become impossibly lengthy.
In the trend test the individual cell sizes are not important but the overall sample size should be at least 30.
The analyses of proportions and risks described above assume large samples, with requirements similar to those for the χ^{2} test of association [8].
The sample size requirement often specified for McNemar's test and confidence interval is that the number of discordant pairs should be at least 10 [8].
Conclusion
The χ^{2} test of association and other related tests can be used in the analysis of the relationship between categorical variables. Care needs to be taken to ensure that the sample size is adequate.
Box
This article is the eighth in an ongoing, educational review series on medical statistics in critical care.
Previous articles have covered 'presenting and summarizing data', 'samples and populations', 'hypothesis testing and P values', 'sample size calculations', 'comparison of means', 'nonparametric methods' and 'correlation and regression'.
Future topics to be covered include:
Chi-squared and Fisher's exact tests
Analysis of variance
Further nonparametric tests: Kruskal–Wallis and Friedman
Measures of disease: PR/OR
Survival data: Kaplan–Meier curves and log rank tests
ROC curves
Multiple logistic regression.
If there is a medical statistics topic you would like explained, contact us at editorial@ccforum.com.
Abbreviations
AVPU: A = alert, V = voice responsive, P = pain responsive and U = unresponsive
References
1. Bewick V, Cheek L, Ball J: Statistics review 7: Correlation and regression. Crit Care 2003, 7: 451–459. 10.1186/cc2401
2. Everitt BS: The Analysis of Contingency Tables. 2nd edition. London, UK: Chapman & Hall; 1992.
3. Whitley E, Ball J: Statistics review 2: samples and populations. Crit Care 2002, 6: 143–148. 10.1186/cc1473
4. Whitley E, Ball J: Statistics review 3: hypothesis testing and P values. Crit Care 2002, 6: 222–225. 10.1186/cc1493
5. Bland M: An Introduction to Medical Statistics. 3rd edition. Oxford, UK: Oxford University Press; 2001.
6. Rivers E, Nguyen B, Havstad S, Ressler J, Muzzin A, Knoblich B, Peterson E, Tomlanovich M, Early Goal-Directed Therapy Collaborative Group: Early goal-directed therapy in the treatment of severe sepsis and septic shock. N Engl J Med 2001, 345: 1368–1377. 10.1056/NEJMoa010307
7. Whitley E, Ball J: Statistics review 6: Nonparametric methods. Crit Care 2002, 6: 509–513. 10.1186/cc1820
8. Kirkwood BR, Sterne JAC: Essential Medical Statistics. 2nd edition. Oxford, UK: Blackwell Science Ltd; 2003.
9. Whitley E, Ball J: Statistics review 5: Comparison of means. Crit Care 2002, 6: 424–428. 10.1186/cc1548