Use of explicit ICD9-CM codes to identify adult severe sepsis: impacts on epidemiological estimates

Background: Severe sepsis is a challenge for healthcare systems, and epidemiological studies are essential to assess its burden and trends. However, there is no consensus on which coding strategy should be used to identify severe sepsis reliably. This study assesses the use of explicit codes to define severe sepsis and its impact on incidence and in-hospital mortality estimates.
Methods: We examined episodes of severe sepsis in adults aged ≥18 years registered in the 2006–2011 national hospital discharge database, identified in a mutually exclusive manner by two ICD-9-CM coding strategies: (1) those assigned explicit ICD-9-CM codes (995.92, 785.52); and (2) those assigned combined ICD-9-CM infection and organ dysfunction codes according to modified Martin criteria. The coding strategies were compared in terms of the populations they defined and how widely each was used. Trends were assessed using Joinpoint regression models and expressed as annual percentage change (APC).
Results: Of 222 846 episodes of severe sepsis identified, 138 517 (62.2 %) were assigned explicit codes and 84 329 (37.8 %) combination codes; incidence rates were 60.6 and 36.9 cases per 100 000 inhabitants, respectively. Despite similar demographic characteristics, cases identified by explicit codes involved fewer comorbidities, fewer registered pathogens, a greater extent of organ dysfunction (two or more organs affected in 60 % versus 26 % of cases) and higher in-hospital mortality (54.5 % versus 29 %; risk ratio 1.86, 95 % CI 1.83, 1.88). Between 2006 and 2011, the use of explicit codes increased. Standardised incidence rates in the explicit code cohort increased over time with an APC of 12.3 % (95 % CI 4.4, 20.8); in the combination code cohort, rates increased by 3.8 % (95 % CI 1.3, 6.3). Mortality decreased in both cohorts, more steeply in the combination code cohort (APC −8.1 %, 95 % CI −10.4, −5.7) than in the explicit code cohort (APC −3.5 %, 95 % CI −3.9, −3.2).
Conclusions: Our findings suggest greater and increasing use of explicit codes for adult severe sepsis in Spain. This trend will have substantial impacts on epidemiological estimates, because these codes capture cases featuring greater organ dysfunction and in-hospital mortality.
Electronic supplementary material: The online version of this article (doi:10.1186/s13054-016-1497-9) contains supplementary material, which is available to authorized users.

Owing to the difficulty of prospectively identifying cases at the population scale [8][9][10][11], population-based studies of severe sepsis have relied on the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). However, reported estimates of severe sepsis incidence and related hospital mortality vary widely, from 13 to 300 cases per 100,000 inhabitants and from 28 % to 50 %, respectively [1,2,4,9]. Among other factors, this disparity appears to reflect biases introduced by the different strategies used to identify cases [9,12,13].
Following the 1991 definition of severe sepsis as sepsis associated with acute organ dysfunction [14], population estimates of the incidence of severe sepsis and its associated mortality have been based on algorithms combining ICD-9-CM infection and organ dysfunction codes [15,16]. In recognition of the limitations of such algorithms, which have been described as excessively inclusive and of low specificity for real cases of severe sepsis [10,16,17], a set of explicit ICD-9-CM codes was issued in 2002 [18]. However, although this coding system was developed more than 10 years ago and its specificity is close to 100 % [19], studies conducted in the USA indicate that it is seldom used and is restricted to patients with more severe sepsis [13]. Indeed, combination codes remain the most widely used strategy in both population-based and other research studies [9,13]. It is not known whether this practice is generalized, because no European population-based studies have addressed the use of explicit severe sepsis codes, and the impact of the coding system chosen on epidemiological estimates of incidence and mortality is unknown.
The present study sought to identify cases of severe sepsis captured by explicit and combination codes in a national registry and to compare these two coding strategies in terms of: (1) their use to identify severe sepsis in adults in Spain and their impact on incidence estimates; and (2) the profile of patient demographics, clinical characteristics and hospital outcomes that each defines.

Design and data sources
We performed a retrospective study using the official clinical-administrative database known as the National Minimum Basic Data Set (MBDS) of the Spanish Ministry of Health, Social Services and Equality (MSSSI). In the Spanish national health system, when a patient is discharged from hospital, the responsible physician is required by law to record all diagnoses and clinical procedures performed, using the ICD-9-CM system. This information is compiled in the MBDS database, which is considered representative of the national population because it includes data on over 90 % of all hospitalizations in the country each year [19,20].
In the MBDS, each hospitalization is a separate record that includes patient demographics, type of admission, date of admission, date of discharge and discharge destination, along with diagnosis codes (the principal diagnosis and 13 secondary diagnosis fields) and up to 20 procedures performed during hospitalization. Hospitalization data for the study period were obtained from the MSSSI [19] and data for the general population were obtained from the National Statistics Institute (Instituto Nacional de Estadística) [21].
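A discharge record with the structure described above could be modelled as follows. This is a minimal sketch only: the class and field names (`Hospitalization`, `discharge_destination`, the value `"death"` for an in-hospital death, and so on) are hypothetical and do not reflect the actual MBDS file layout; only the limits of 13 secondary diagnoses and 20 procedures come from the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Field limits as described for the MBDS; all names below are illustrative.
MAX_SECONDARY_DX = 13
MAX_PROCEDURES = 20


@dataclass
class Hospitalization:
    """One discharge record (hypothetical field names, not the MBDS layout)."""
    age: int
    sex: str
    admission_type: str
    admission_date: date
    discharge_date: date
    discharge_destination: str  # assumed coding: "death" = died in hospital
    principal_dx: str
    secondary_dx: List[str] = field(default_factory=list)
    procedures: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject records that exceed the field limits of the data set.
        if len(self.secondary_dx) > MAX_SECONDARY_DX:
            raise ValueError("more than 13 secondary diagnoses")
        if len(self.procedures) > MAX_PROCEDURES:
            raise ValueError("more than 20 procedures")
        if self.discharge_date < self.admission_date:
            raise ValueError("discharge precedes admission")

    @property
    def all_dx(self) -> List[str]:
        """Principal diagnosis followed by the secondary diagnoses."""
        return [self.principal_dx, *self.secondary_dx]

    @property
    def length_of_stay(self) -> int:
        """Length of stay in days."""
        return (self.discharge_date - self.admission_date).days

    @property
    def died_in_hospital(self) -> bool:
        return self.discharge_destination == "death"
```

Validating the field limits at construction time keeps malformed records out of the later case-identification step.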

Study population: case identification and definitions
We identified all hospitalizations of adult patients (≥18 years) with severe sepsis from 1 January 2006 to 31 December 2011. To capture all cases, we used two established ICD-9-CM diagnostic coding strategies [22][23][24], generating two cohorts of longitudinal data. The first strategy was based on explicit ICD-9-CM codes (995.92 for severe sepsis, 785.52 for septic shock) [9,13,18,22] and the second on combinations of ICD-9-CM infection and organ dysfunction codes according to modified Martin criteria. This second strategy is detailed in Additional file 1: Table S1, which also lists the codes for septicaemia, fungaemia and bacteraemia as described by Martin et al. [16,22], including the ICD-9-CM code for sepsis (995.91) introduced in Spain in 2004. The two coding strategies were applied in a mutually exclusive fashion, so that each hospitalization was assigned to only one cohort.
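The mutually exclusive assignment can be sketched as below. The sketch assumes that a hospitalization carrying 995.92 or 785.52 goes to the explicit cohort even if it also meets the combination criteria, and that dotted codes are matched by prefix. The infection and organ dysfunction code lists are a small hypothetical subset chosen for illustration, not the full lists in Tables S1 and S2.

```python
from typing import Iterable, Optional

EXPLICIT_CODES = {"995.92", "785.52"}  # severe sepsis, septic shock

# Hypothetical subset of Martin-style infection/sepsis code prefixes (not Table S1).
INFECTION_PREFIXES = ("038", "020.0", "790.7", "117.9", "112.5", "112.81", "995.91")

# Hypothetical subset of organ dysfunction code prefixes (not Table S2).
ORGAN_DYSFUNCTION_PREFIXES = (
    "785.5", "458", "518.81", "584", "286.6", "287.4",
    "287.5", "570", "348.1", "293", "348.3", "276.2",
)


def _any_prefix(codes: Iterable[str], prefixes: tuple) -> bool:
    """True if any code starts with one of the prefixes."""
    return any(code.startswith(prefixes) for code in codes)


def assign_cohort(diagnoses: Iterable[str], age: int) -> Optional[str]:
    """Return "explicit", "combination" or None for one hospitalization.

    Explicit codes are checked first, so no record can fall into both cohorts.
    """
    if age < 18:  # adults only
        return None
    codes = [c.strip() for c in diagnoses if c]
    if any(c in EXPLICIT_CODES for c in codes):
        return "explicit"
    if _any_prefix(codes, INFECTION_PREFIXES) and _any_prefix(codes, ORGAN_DYSFUNCTION_PREFIXES):
        return "combination"
    return None
```

Checking the explicit codes first is what makes the two cohorts disjoint: a record with 995.91 plus 785.52, for instance, is counted only once, in the explicit cohort.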
The codes defining organ dysfunction are provided in Additional file 2: Table S2. This combination strategy was chosen because studies indicate that it can estimate the burden of severe sepsis [23] and because our group used it in a previous study [24]. We refer to the resulting cohorts as the explicit codes cohort and the combination codes cohort.
To assess comorbidities [26], we applied the version of the Charlson index validated by Deyo for ICD-9-CM [25] to the 13 secondary diagnosis fields of each record. This index includes specific comorbid conditions of known prognostic value, identified from ICD-9 codes in prior outpatient and in-patient records. Prior epidemiological studies in sepsis have shown that the codes used to calculate this index do not overlap with the diagnostic codes used to capture acute organ dysfunction [27], and that the index is useful in assessing the risk of death in septic patients [28]. To identify specific microorganisms, we included code 041, as indicated by the ICD-9-CM coding manual for identifying bacterial agents in diseases classified under the heading "other" [20].
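To illustrate how a Deyo-type Charlson score is derived from the secondary diagnosis fields, the sketch below sums the weight of each condition present, counting each condition once. It assumes dotted ICD-9-CM codes and includes only a few conditions from Deyo's mapping as an illustrative subset; hierarchy rules (for example, counting only the more severe diabetes category when both are coded) are omitted, so it is not a substitute for a validated implementation.

```python
from typing import List

# Illustrative subset of the Deyo ICD-9-CM mapping: (condition, code prefixes, weight).
DEYO_SUBSET = [
    ("myocardial_infarction", ("410", "412"), 1),
    ("congestive_heart_failure", ("428",), 1),
    ("chronic_pulmonary_disease", ("490", "491", "492", "493", "494", "495", "496"), 1),
    ("diabetes_uncomplicated", ("250.0", "250.1", "250.2", "250.3"), 1),
    ("diabetes_with_complications", ("250.4", "250.5", "250.6"), 2),
    ("renal_disease", ("582", "585", "586", "588"), 2),
    ("metastatic_solid_tumour", ("196", "197", "198", "199.0", "199.1"), 6),
]

MAX_SECONDARY_DX = 13


def charlson_deyo(secondary_dx: List[str]) -> int:
    """Charlson-Deyo score from up to 13 secondary ICD-9-CM codes (subset only).

    Each condition contributes its weight once, however many matching codes it has.
    Hierarchy rules between related conditions are deliberately omitted.
    """
    codes = [c.strip() for c in secondary_dx[:MAX_SECONDARY_DX] if c]
    return sum(
        weight
        for _condition, prefixes, weight in DEYO_SUBSET
        if any(code.startswith(prefixes) for code in codes)
    )
```

For example, a record with heart failure (428.0) and complicated diabetes (250.40) scores 1 + 2 = 3 under this subset.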

Ethics
All data were anonymized. Under Spanish legislation, the use of these data is exempt from the need for informed consent [29].

Data analysis
A descriptive analysis was performed to compare the explicit and combination coding practices for severe sepsis in terms of patient demographics, comorbidities, organ dysfunction and in-hospital mortality. Within the explicit codes cohort, we further distinguished between hospitalizations coded 785.52 (septic shock) and those coded 995.92 (severe sepsis).
The Charlson-Deyo index was calculated using the improved STATA 14 package, both as a continuous variable and as a categorical variable with four groups (score 0, 1–2, 3–4 and >4) of increasing severity and impact on outcome [30]. In-hospital mortality was calculated as the number of deaths divided by the number of cases of severe sepsis in each cohort, expressed as a percentage.
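The four-group categorisation and the mortality percentage described above are simple to express; the function names below are hypothetical. As a consistency check, applying the mortality formula to the overall figures reported in the Results (100 253 deaths among 222 846 episodes) gives about 45 %.

```python
def charlson_category(score: int) -> str:
    """Map a Charlson-Deyo score to the four severity groups: 0, 1-2, 3-4, >4."""
    if score < 0:
        raise ValueError("Charlson-Deyo scores are non-negative")
    if score == 0:
        return "0"
    if score <= 2:
        return "1-2"
    if score <= 4:
        return "3-4"
    return ">4"


def in_hospital_mortality_pct(deaths: int, cases: int) -> float:
    """Deaths divided by severe sepsis cases in a cohort, as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


if __name__ == "__main__":
    print(round(in_hospital_mortality_pct(100253, 222846), 1))  # 45.0
```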
Qualitative variables were expressed as absolute frequencies and percentages, and quantitative variables as means and standard deviations. Associations between qualitative variables were assessed using Pearson's chi-squared test or Fisher's exact test, and between quantitative and qualitative variables using Student's t test. We used the risk ratio (RR) with its 95 % confidence interval to quantify differences in demographic and clinical data between the cohorts.
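A risk ratio and its 95 % confidence interval can be computed from two proportions as sketched here. The sketch uses the common log-scale Wald (Katz) interval; the text does not state which interval method was used, and the counts in the usage example are hypothetical.

```python
import math
from typing import Tuple


def risk_ratio(events_1: int, n_1: int, events_0: int, n_0: int,
               z: float = 1.959964) -> Tuple[float, float, float]:
    """Risk ratio of group 1 vs group 0 with a Wald CI on the log scale (Katz method)."""
    if not (0 < events_1 <= n_1 and 0 < events_0 <= n_0):
        raise ValueError("need 0 < events <= n in both groups")
    rr = (events_1 / n_1) / (events_0 / n_0)
    # Standard error of log(RR)
    se = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_0 - 1 / n_0)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)
```

With hypothetical counts of 60 deaths among 100 cases versus 30 among 100, `risk_ratio(60, 100, 30, 100)` gives RR = 2.0 with a 95 % CI of roughly 1.42 to 2.81. The interval is symmetric on the log scale, which is why the bounds are not equidistant from the point estimate.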
Incidence rates were calculated using national population data for adults aged ≥18 years and expressed per 100,000 inhabitants. Age-adjusted incidence and in-hospital mortality rates were calculated by the direct standardization method, using the year 2008 as reference. To identify trends in incidence and in-hospital mortality rates, we quantified the annual percentage change (APC) with its 95 % confidence interval using Joinpoint log-linear regression models assuming a Poisson distribution [24,31]. This procedure uses a Monte Carlo permutation method to determine whether an apparent change in trend is statistically significant [31]. All statistical tests were performed using the programmes STATA
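The two trend-analysis steps can be sketched in Python. `direct_standardized_rate` weights age-specific rates by a reference population (in this study, that of 2008), and `poisson_apc` fits a single log-linear Poisson segment, log(mu_t) = log(N_t) + b0 + b1·t, by Newton-Raphson and converts the slope to APC = 100·(exp(b1) − 1). This is a simplified, no-joinpoint sketch under stated assumptions: it does not search for joinpoints or run the Monte Carlo permutation test, and its model-based confidence interval may differ from the one reported by the Joinpoint software. Both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def direct_standardized_rate(cases: Sequence[float], population: Sequence[float],
                             standard_pop: Sequence[float], per: float = 100_000) -> float:
    """Age-adjusted rate: age-specific rates weighted by a standard population.

    The three sequences are aligned by age group.
    """
    if not (len(cases) == len(population) == len(standard_pop)):
        raise ValueError("inputs must have one entry per age group")
    weighted = sum(c / p * s for c, p, s in zip(cases, population, standard_pop))
    return per * weighted / sum(standard_pop)


def _score_and_information(counts, populations, b0, b1):
    """Poisson score and information terms for log(mu_t) = log(N_t) + b0 + b1*t."""
    mu = [n * math.exp(b0 + b1 * t) for t, n in enumerate(populations)]
    u0 = sum(y - m for y, m in zip(counts, mu))
    u1 = sum(t * (y - m) for t, (y, m) in enumerate(zip(counts, mu)))
    i00 = sum(mu)
    i01 = sum(t * m for t, m in enumerate(mu))
    i11 = sum(t * t * m for t, m in enumerate(mu))
    return u0, u1, i00, i01, i11


def poisson_apc(counts: Sequence[float], populations: Sequence[float],
                z: float = 1.959964, tol: float = 1e-10,
                max_iter: int = 100) -> Tuple[float, float, float]:
    """APC (%) with a Wald 95 % CI; year index t = 0 for the first year."""
    b0 = math.log(sum(counts) / sum(populations))  # start at the pooled rate
    b1 = 0.0
    for _ in range(max_iter):
        u0, u1, i00, i01, i11 = _score_and_information(counts, populations, b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: (d0, d1) = I^-1 U for the 2x2 information matrix I
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Var(b1) is the (1,1) element of the inverse information at the estimate.
    _, _, i00, i01, i11 = _score_and_information(counts, populations, b0, b1)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))

    def to_apc(b: float) -> float:
        return 100.0 * (math.exp(b) - 1.0)

    return to_apc(b1), to_apc(b1 - z * se_b1), to_apc(b1 + z * se_b1)
```

As a sanity check, six yearly counts that grow by exactly 10 % per year against a constant population return an APC of 10 %.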

General characteristics
Both the explicit code and combination code cohorts were predominantly male (Table 1), and mean age was similar (71 years, p = 0.120). In contrast, hospitalizations identified by explicit codes had a lower comorbidity burden (mean Charlson index 2 vs. 2.2, p < 0.001) and a lower percentage of cases in the higher-severity categories. Further, with the exception of neoplasms, each of the comorbidities included in the Charlson index was less frequent in this cohort than in the combination code cohort. There were more surgical cases among the patients assigned explicit codes.
Microorganisms were recorded in a significantly lower proportion of cases in the explicit code cohort. Gram-negative bacteria were the most frequently reported. As potential sources of sepsis, the abdomen, respiratory tract and soft tissues were recorded significantly more often in the explicit code cohort than in the combination code cohort. The possible source was not specified in 38 % of the explicit code cohort and in 23 % of the combination code cohort.
The extent of organ dysfunction differed significantly between the two cohorts (Table 1). In the explicit code group, roughly a third of episodes each featured dysfunction of one, two, or more than two organs, whereas in the combination code cohort 74 % of cases involved single-organ dysfunction. Cardiovascular, respiratory, kidney, haematological and metabolic dysfunction were recorded significantly more often in the explicit code cohort, as was the use of invasive mechanical ventilation. Note that no data were available on the number or type of organ dysfunction in 7 % of cases in the explicit code cohort. Table 2 shows the characteristics of the cases in the explicit code cohort according to whether they were coded 785.52 (septic shock) or 995.92 (severe sepsis). In this cohort, 67 % of cases were coded 785.52; these patients were younger, had fewer comorbidities and a greater number of organ dysfunctions (cardiovascular, haematological, metabolic and respiratory), and more often required mechanical ventilation. However, in a large proportion of cases coded 995.92, data on the number or type of dysfunctional organs were not available. Gram-negative microorganisms and respiratory sources of infection predominated in both groups.

Incidence
The episodes of severe sepsis identified amounted to 1.1 % of all adult hospitalizations over the 6-year period, giving an overall incidence of 97.5 cases per 100,000 inhabitants. By coding strategy, crude incidence rates were 36.9 cases per 100,000 inhabitants for the combination codes and 60.6 cases per 100,000 inhabitants for the explicit codes. Within the explicit code cohort, incidence rates were 19.75 cases per 100,000 inhabitants for code 995.92 (severe sepsis) and 40.85 cases per 100,000 inhabitants for code 785.52 (septic shock). From 2006 to 2011, the overall number of captured cases increased from 25 808 to 46 774, representing an annual increase of 13.5 %. Figure 1 shows the number of cases of severe sepsis identified using explicit and combination codes across the study interval. In the explicit code cohort, adjusted incidence rates (Fig. 2) increased with an APC of 12.3 % (95 % CI 4.4, 20.8), compared with an APC of 3.8 % (95 % CI 1.3, 6.3) in the combination code cohort.

Mortality
Overall in-hospital mortality was 45 % (n = 100 253 deaths): 29.4 % in the combination code cohort and 54.5 % in the explicit code cohort. In the explicit code cohort, 39 % of deaths (n = 29 310) occurred in critical care units, versus 20 % (n = 4962) in the combination code cohort.
As shown in Table 2, within the explicit code cohort, mortality rates were similar for cases coded 995.92 and 785.52 (54.3 % vs. 54.6 %).
From 2006 to 2011, the adjusted in-hospital mortality rate decreased significantly in both cohorts. The decline was steepest in the combination code cohort, with an APC of −8.1 % (95 % CI −10.4, −5.7 %), whereas in the explicit code cohort it was significant but less pronounced, with an APC of −3.5 % (95 % CI −3.9, −3.2). Figure 3 shows the changes detected in each cohort and in the explicit sub-cohorts. In the code 785.52 group, mortality rates decreased with an APC of −3.5 % (95 % CI −4.0, −3.0). The 995.92 sub-cohort showed a similar decrease, with an APC of −3.5 % (95 % CI −4.4, −2.6). Figure 4 shows that the extent of organ dysfunction remained stable over the study period in both cohorts. However, the percentage of cases in which the number of organ dysfunctions was not recorded increased in the explicit code cohort.
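To put the mortality APCs in perspective, a constant APC can be compounded over the five annual intervals between 2006 and 2011. This is only an illustration, assuming that each APC held constant across the whole period (a single log-linear segment); it is not a result reported by the study.

```python
def cumulative_change_pct(apc_pct: float, intervals: int) -> float:
    """Total % change implied by a constant APC compounded over `intervals` years."""
    return 100.0 * ((1.0 + apc_pct / 100.0) ** intervals - 1.0)


if __name__ == "__main__":
    # 2006 -> 2011 spans five annual intervals.
    for label, apc in [("combination codes", -8.1), ("explicit codes", -3.5)]:
        print(f"{label}: {cumulative_change_pct(apc, 5):.1f} %")
```

This prints about −34.4 % for the combination code cohort and −16.3 % for the explicit code cohort. Under the constant-APC assumption, that corresponds to roughly a one-third versus one-sixth reduction in adjusted mortality over the period.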

Discussion
The results of this study reveal the extensive use of explicit codes to define severe sepsis in Spain and an upward trend in their use. Compared with combination codes for severe sepsis, these ICD-9-CM codes captured a case profile featuring greater organ dysfunction, healthcare effort and in-hospital mortality. These differences will have a substantial impact on estimates of disease burden. Our findings also indicate that cases coded with the explicit 785.52 (septic shock) or 995.92 (severe sepsis) codes, though differing in their characteristics, had similar outcomes and practically the same in-hospital mortality.
Our findings contrast with recent reports from the USA, such as that by Gaieski et al. [9], who, in a retrospective study based on nationwide in-patient data for 2004 to 2009, found that only a minority of cases of severe sepsis (between 14 % and 36.9 % according to the capture algorithm used) were assigned explicit discharge codes. In another retrospective study, conducted at a single tertiary hospital, Whittaker et al. [13] observed that of 1735 cases of severe sepsis between 2005 and 2009, only 21.5 % had explicit severe sepsis/septic shock discharge codes (995.92, 785.52), and that this proportion remained stable over the period examined. In contrast, in our study over 60 % of cases in the national health network were documented with such codes. In addition, the use of these explicit codes, especially code 995.92, increased over the six-year study period. Although the reasons for this difference are unknown, given Spain's universal health system, coding practices are unlikely to be driven by financial incentives. As for other factors, efforts to improve coding for hospital discharge registries, together with the education programmes and campaigns carried out in Spain in recent years, such as the Edusepsis campaign [32], have likely improved the awareness and training of healthcare professionals in the identification and diagnosis of severe sepsis, leading to the observed increase in the use of explicit codes. In line with prior findings [15,16,22], we detected a marked increase in the incidence of severe sepsis; notably, this increase was mainly accounted for by the explicit code cohort, especially code 995.92.
Although this increase may be partly explained by factors such as better diagnosis of sepsis, improved coding practices or other methodological issues [33], the specificity of these codes [9] suggests that the rising incidence of severe sepsis in adults in Spain may be real rather than a consequence of excessive coding of infection and organ dysfunction [23,34].
In addition, although the populations in our two cohorts should have been similar, since the codes assigned define the same disease, the patients in each cohort had different outcomes and were well matched only in terms of age and sex.
In agreement with another report [13], our data indicate that the more severe cases were assigned explicit codes, and that these codes defined a cohort of patients who, despite having fewer comorbidities, had a higher rate of pulmonary and abdominal infection, more Gram-negative pathogens and, above all, more affected organs and a higher mortality rate. In-patient mortality in the explicit code cohort was 54 %, nearly double the rate recorded in the combination code cohort. Notably, mortality in the latter cohort was similar to the rates reported in other population studies, such as those of Angus [15] and Martin [16], which used combination strategies exclusively to identify cases, while our rate for the explicit code cohort was consistent with the mortality rates reported for intensive care units [35,36]. Furthermore, within the explicit code cohort, mortality was practically identical for cases coded 995.92 and 785.52. Although these cases differed appreciably in demographics, comorbidities and potential infection sources, both groups featured similar multiple organ dysfunction, which likely accounts for the high mortality observed in both sub-cohorts [37].
Recent data indicate great variability in mortality due to severe sepsis and septic shock, which, among other factors, seems to be related to the different definitions used in each study [38]. Moreover, few studies have assessed the use of codes 785.52 (septic shock) and 995.92 (severe sepsis). On reanalysis, Gaieski [9] observed that hospital mortality among adults with explicit codes was 36.9 % for those coded 995.92 and 42.2 % for those coded 785.52. Among 373 patients coded as having severe sepsis/septic shock (995.92, 785.52), Whittaker [13] observed a 28-day mortality of 41 %. However, no study has compared the epidemiological characteristics and course of cases coded 995.92 and 785.52. The novelty of our work lies in performing this comparison using a national population database with a large number of cases and a representative case-mix. Because the new sepsis definitions [39] recommend the use of explicit codes, new studies with which to compare our findings are likely to emerge soon. The high percentage of cases assigned explicit codes and their high mortality rate largely explain the differences in in-hospital mortality between the present study and other population studies in which a lower percentage of cases was captured by explicit severe sepsis codes. Further, although mortality in our study declined overall from 2006 to 2011, the decline was significantly smaller in the explicit code cohort. Thus, although our data reflect the decreasing trend in hospital mortality due to severe sepsis in adult patients observed in other studies [9,16,40,41], they indicate that this trend is more limited in cases of greater disease severity.
Interestingly, the extent of organ dysfunction remained stable in both our cohorts. Although this does not confirm recently published data from the USA [34], it could explain, at least partly, the observed downward trend in mortality.
The high proportion of hospitalizations assigned explicit codes in our study is consistent with the new definitions of sepsis and the recommendations for the use of ICD-9 codes 995.92 and 785.52 for such cases [39]. Although these explicit codes are not yet fully implemented in Spain, their use is already high and increasing. Consistent with these recommendations [39], we predict that a better understanding of the concept and definition of severe sepsis, through continued education programmes, will improve its description in clinical records and thus allow a more consistent measure of the burden of severe sepsis and its trends.
The limitations of our study are those inherent to investigations based on retrospectively collected clinical-administrative data. Although there are national directives for the use of the ICD-9-CM coding system, its application may not have been uniform across all hospitals of the national health network. We cannot rule out coding errors, although regular audits make major errors unlikely. We are also aware that, because of its confidential nature, the database lacks detailed clinical information and the data do not allow causal inferences. However, the use of such databases is well established in severe sepsis epidemiology, and the results of a recent meta-analysis clearly support the use of administrative data to monitor mortality trends in severe sepsis [41], confirming the essential role of the consistent use of national administrative data for epidemiological monitoring of incidence and outcomes [9,10].
Our country has a large national population-based database covering over 90 % of all hospitalizations each year. Its potential benefits include representativeness, identification of systemic problems and precision of estimation in statistical analysis. Additionally, this study followed the publication guidelines for observational studies laid down in the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) initiative [42].
Another limitation of our study was that non-hospitalized patients with severe sepsis were not included, so our incidence figures are estimates of treated sepsis [43]. Similarly, our mortality estimates were conservative in that they did not include mortality after hospital discharge [3,5,44].

Conclusions
Our study reveals extensive and increasing use of explicit codes for adult severe sepsis in Spain. This trend will have substantial impacts on epidemiological and disease burden estimates, because the cases registered with these codes feature greater severity, care intensity and in-hospital mortality. The variation detected in severe sepsis coding and its effects on population estimates call for improved continuing education for physicians and the introduction of standardized measures aimed at reducing heterogeneity in coding practices.

Key messages
In Spain over the period 2006-2011, some 62 % of adult severe sepsis cases in this population-based study were assigned explicit ICD-9-CM codes. This elevated and increasing use of explicit codes for adult severe sepsis has substantial impacts on epidemiological estimates, because these codes capture a case profile featuring extensive organ dysfunction, care effort and in-hospital mortality. Variability in severe sepsis coding practices must be taken into account when interpreting epidemiological estimates. Inconsistencies in medical coding practices call for the implementation of sepsis education programmes for health professionals.

Additional files
Additional file 1: Table S1. ICD-9-CM codes used to identify hospitalizations for severe sepsis in the combination codes cohort.