Validity of administrative data in recording sepsis: a systematic review

Introduction: Administrative health data have been used to study sepsis in large population-based studies. The validity of these study findings depends largely on the quality of the administrative data source and the validity of the case definition used. We systematically reviewed the literature to assess the validity of case definitions of sepsis used with administrative data.

Methods: Embase and MEDLINE were searched for published articles with International Classification of Diseases (ICD) coded data used to define sepsis. Abstracts and full-text articles were reviewed in duplicate. Data were abstracted from all eligible full-text articles, including ICD-9- and/or ICD-10-based case definitions, sensitivity (Sn), specificity (Sp), positive predictive value (PPV) and negative predictive value (NPV).

Results: Of 2,317 individual studies identified, 12 full-text articles met all eligibility criteria. A total of 38 sepsis case definitions were tested, which included over 130 different ICD codes. The most common ICD-9 codes were 038.x, 790.7 and 995.92, and the most common ICD-10 codes were A40.x and A41.x. The PPV was reported in ten studies and ranged from 5.6% to 100%, with a median of 50%. Other tests of diagnostic accuracy were reported only in some studies. Sn ranged from 5.9% to 82.3%; Sp ranged from 78.3% to 100%; and NPV ranged from 62.1% to 99.7%.

Conclusions: The validity of administrative data in recording sepsis varied substantially across individual studies and ICD definitions. Our work may serve as a reference point for consensus towards an improved and harmonized ICD-coded definition of sepsis.

Electronic supplementary material: The online version of this article (doi:10.1186/s13054-015-0847-3) contains supplementary material, which is available to authorized users.


Introduction
Sepsis is a life-threatening condition associated with a high mortality rate, significant health care costs and long-term consequences [1][2][3]. It is characterized by a spectrum of severity, from mild acute organ dysfunction to multiorgan failure, and by complex pathophysiologic processes. Differentiating sepsis as a cause of multiple organ dysfunction syndrome from other acute systemic inflammatory conditions can be difficult [4].
Many large-scale studies have relied on administrative data to identify patients with sepsis [1,2]. Examples of administrative data include hospital discharge data, emergency department visit data, physician claims and hospital insurance claims data. These data are advantageous because they are readily available and relatively inexpensive, can include large cohorts of patients, allow control for some confounders such as chronic disease [5] and include individual-level outcomes [6]. These data frequently code diseases using the World Health Organization International Classification of Diseases (ICD) [7]. The most recent version of the ICD in use is the tenth revision (ICD-10), which exists alongside country modifications such as ICD-10-CA (the Canadian edition) and ICD-10-AM (the Australian Modification). In addition, a clinical modification of the ninth revision (ICD-9-CM) is still used in a number of countries, such as the United States and Italy [8].
Prior to 1992, there was no consensus on clinical criteria and definitions for sepsis and related conditions. The Centers for Disease Control and Prevention (CDC) reported sepsis admissions from administrative data using septicemia, the presence and spread of microorganisms in the circulating blood [9], as the clinical case definition. This definition did not fully capture the spectrum of illness that was later defined in more detail by the clinical definitions of the 1992 American College of Chest Physicians and Society of Critical Care Medicine (ACCP/SCCM) Consensus Conference [10].
Angus et al. [1] performed a large-scale, multi-centre epidemiological study in which they identified patients with severe sepsis using an ICD-9-CM-based algorithm that required evidence of both an infection and new-onset organ dysfunction during a single hospitalization; this coding scheme is hereafter referred to as the Angus implementation. The Angus implementation is one of the best-known and most highly cited ICD-coded case definitions for sepsis. The authors originally validated it by comparing aggregate data, namely the hospital incidence rates and patient characteristics of the cohort captured through the ICD-9-CM algorithm, with those of a cohort from a previous prospective study of patients with sepsis by Sands et al. [11]. A recent study [12] validated the Angus implementation and another well-known algorithm, the Martin implementation [2], against a reference standard of medical chart review by physicians. The Angus implementation had a low to moderate sensitivity (Sn) of 50.3% and a positive predictive value (PPV) of 70.7%, whereas the Martin implementation had a very low Sn of 16.8% but a high PPV of 97.6%. The authors therefore concluded that a population of patients with severe sepsis could be captured through administrative data using the Angus case definition, but that the number of cases would be underestimated. Studies examining the performance of ICD coding algorithms for other conditions have likewise highlighted the great variability that arises when multiple codes are used to define a specific condition [13,14].
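The two-part logic of the Angus implementation (an infection code AND an organ dysfunction code in the same hospitalization) can be sketched as follows. This is a minimal illustration, not the published algorithm: the code prefixes below are a small, hypothetical subset chosen for demonstration, the real Angus lists are much longer, and codes are assumed to be stored as dotted ICD-9-CM strings.

```python
from typing import Iterable

# ILLUSTRATIVE subset of ICD-9-CM prefixes only; NOT the published Angus lists.
# Codes are assumed to be stored as dotted strings such as "038.9".
INFECTION_PREFIXES = ("038", "790.7", "481")
ORGAN_DYSFUNCTION_PREFIXES = ("785.5", "458", "518.81", "584", "570", "286.6", "348.3")


def has_any_prefix(codes: Iterable[str], prefixes: tuple[str, ...]) -> bool:
    """Return True if any code in the hospitalization starts with one of the prefixes."""
    return any(code.startswith(prefixes) for code in codes)


def angus_style_flag(codes: Iterable[str]) -> bool:
    """Flag a hospitalization carrying BOTH an infection code and an organ dysfunction code."""
    codes = list(codes)  # materialize so the codes can be scanned twice
    return (has_any_prefix(codes, INFECTION_PREFIXES)
            and has_any_prefix(codes, ORGAN_DYSFUNCTION_PREFIXES))


print(angus_style_flag(["038.9", "584.9"]))  # infection + acute kidney failure
print(angus_style_flag(["038.9", "401.9"]))  # infection without organ dysfunction
```

Requiring both components is what makes such a definition specific for severe sepsis; it also means that any omission of the organ dysfunction code in the record removes the case from the cohort.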
Accurate identification of sepsis cases in ICD-coded administrative data is paramount for health services research, particularly for a complex condition such as sepsis, for which the burden of disease and costs of care are very high. There is currently no consensus regarding which ICD-9 or ICD-10 codes should be used to define sepsis in administrative data. A reasonable step towards harmonizing an ICD-based definition of sepsis is to review the literature and report the validity of published ICD-coded case definitions in administrative data.

Search strategy
We applied a modification of the search strategy methodology of St Germaine-Smith et al. [14]. Using the Ovid interface, we searched MEDLINE and Embase for articles published between 1992 (the year in which the ACCP/SCCM established definition criteria for sepsis and severe sepsis) and 15 September 2014, applying 'humans' and 'English language' filters. To identify studies assessing the diagnostic accuracy of ICD codes for identifying sepsis, the Boolean operator 'AND' was used to combine three search concepts: sepsis, coding and validity. Articles concerning sepsis were sought by using the Boolean operator 'OR' to combine the Medical Subject Headings (MeSH) term 'sepsis' with Emtree terms relevant to sepsis, including 'severe sepsis' and 'septic shock'. Articles concerning coding were sought by using 'OR' to combine MeSH terms and keyword searches for the following terms: 'administrative data', 'hospital discharge data', 'ICD-9', 'ICD-10', 'ICD-9xM' or 'ICD-10xM' (country versions), 'medical record', 'health information', 'surveillance', 'physician claims', 'claims', 'hospital discharge', 'coding' and 'codes'. Articles concerning validity were sought by using 'OR' to combine MeSH terms and keyword searches for the terms 'validity', 'validation', 'case definition', 'algorithm', 'agreement', 'accuracy', 'sensitivity', 'specificity', 'positive predictive value' and 'negative predictive value' (Additional file 1).
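The structure of this search (synonyms joined with OR inside each concept, concepts joined with AND) can be sketched as a query builder. This is a hedged illustration only: it emits a generic Boolean string, not Ovid syntax (no MeSH or Emtree explosion, no field tags), and the term lists are abbreviated from the description above.

```python
# Abbreviated term lists taken from the search description; generic Boolean
# syntax only, NOT the Ovid command language actually used.
SEPSIS = ["sepsis", "severe sepsis", "septic shock"]
CODING = ["administrative data", "hospital discharge data", "ICD-9", "ICD-10",
          "physician claims", "coding", "codes"]
VALIDITY = ["validity", "validation", "sensitivity", "specificity",
            "positive predictive value", "negative predictive value"]


def or_block(terms: list[str]) -> str:
    """Combine the synonyms for one concept with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concepts: list[str]) -> str:
    """Combine concept blocks with AND: a record must match every concept."""
    return " AND ".join(or_block(terms) for terms in concepts)


print(build_query(SEPSIS, CODING, VALIDITY))
```

Adding a synonym widens a concept (more records), whereas adding a concept narrows the result, which is why the three-concept AND keeps the yield manageable.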

Study inclusion
To be eligible for inclusion, articles had to compare the accuracy of ICD-9 or ICD-10 codes for sepsis, severe sepsis or septic shock in an administrative database against a reference standard and report at least one of Sn, specificity (Sp), PPV or negative predictive value (NPV). To allow comparison, studies that validated an ICD-coded definition without reporting any diagnostic accuracy measure were excluded. Where provided, Sn, Sp, PPV and NPV were abstracted from each study. All bibliographic references were imported into a custom-written Java software application [15], called Synthesis, for reference management and data collection; the software is described in more detail elsewhere [16]. The title and abstract of each citation were screened in duplicate for eligibility by two reviewers (RJJ and KJS). Any article judged eligible by either or both reviewers was retrieved, and its full text was reviewed against the eligibility criteria by the same two authors. Articles excluded on the basis of title and abstract, with the reasons for exclusion, are listed in Additional file 2. Inter-rater agreement was measured with Cohen's κ statistic at both the title and abstract review stage and the full-text review stage. All articles on which the reviewers disagreed at the abstract review stage proceeded to full-text review. Full-text articles on which the reviewers disagreed were reviewed a second time, and any remaining disagreements about eligibility were resolved through discussion until full consensus was reached.
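Cohen's κ compares the observed agreement between two reviewers with the agreement expected by chance from each reviewer's marginal proportions, κ = (p_o − p_e) / (1 − p_e). A minimal sketch follows; the screening decisions are hypothetical and do not come from this review.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical include/exclude decisions for ten abstracts.
reviewer_1 = ["in", "in", "out", "out", "out", "out", "in", "out", "out", "out"]
reviewer_2 = ["in", "out", "out", "out", "out", "out", "in", "out", "out", "in"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

Here the reviewers agree on 8 of 10 abstracts (p_o = 0.80), but because most abstracts are excluded by both, chance agreement is already 0.58, so κ is only about 0.52. This is why κ rather than raw percent agreement is reported for screening.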

Data extraction and quality assessment
One author (RJJ) abstracted data from the included studies using a standardized abstraction form, recording the country in which the study was conducted, years of data collection, validation database, sample size and type of sample population. The validated ICD codes and algorithms, the diagnostic field position and the ICD version used in each study were recorded along with Sn, Sp, PPV and NPV. Where Sn or Sp was not reported but the raw data were available, the authors calculated these values.
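Calculating the measures from raw data uses the standard 2 × 2 table of coded status against the reference standard. The sketch below shows those definitions; the counts are hypothetical, and returning NaN for an empty denominator is a design assumption for measures that cannot be estimated.

```python
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    """Proportion, or NaN when the denominator is zero (measure not estimable)."""
    return num / den if den else float("nan")


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # coded as sepsis, sepsis by reference standard
    fp: int  # coded as sepsis, no sepsis by reference standard
    fn: int  # not coded, sepsis by reference standard
    tn: int  # not coded, no sepsis by reference standard

    @property
    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def npv(self) -> float:
        return _ratio(self.tn, self.tn + self.fn)


table = TwoByTwo(tp=40, fp=10, fn=60, tn=890)  # hypothetical counts
print(f"Sn={table.sensitivity:.3f} Sp={table.specificity:.3f} "
      f"PPV={table.ppv:.3f} NPV={table.npv:.3f}")
```

Note that a study sampling only coded records (tp and fp) can report PPV but not Sn, Sp or NPV, which is one reason PPV was the most frequently reported measure in the included studies.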
The included studies were assessed for quality by two reviewers (KJS and RJJ) using a standardized validation study quality checklist adapted from Benchimol et al. [17]. Where it was unclear whether a study fulfilled a checklist item, the item was marked as uncertain. Any discrepancies between the two reviewers were resolved through discussion. Because the included studies were published in peer-reviewed journals, it was not necessary to obtain patient consent. This study was reviewed and approved by the Conjoint Health Research Ethics Board at the University of Calgary.

Study characteristics
Of 2,317 abstracts reviewed, 96 fulfilled the eligibility criteria for full-text review. For these articles, the κ statistic for inter-rater agreement was 0.87, indicating near-perfect agreement [18]. Twelve articles met all eligibility criteria and were included in the review [12,19–29] (Figure 1). The characteristics of the studies are shown in Table 1. All 12 studies examined hospital discharge abstract data (also called 'inpatient administrative health data' or 'inpatient claims administrative dataset'). Eight of the twelve studies were performed in the United States [12,19,21,23,25,27–29], one in Australia [22], one in Denmark [24], one in Sweden [20] and one in Canada [26]. Publication dates ranged from 1998 to 2014. Seven studies examined ICD-9-CM codes, one examined only ICD-9, one examined both ICD-9 and ICD-10 codes, one examined ICD-10, one examined the Danish version of ICD-10 and one examined ICD-10-AM (Australian Modification) codes. The studies varied considerably in sample size (from 34 to 4,181) and were heterogeneous in the patients studied, ranging from highly selected populations (patients with rheumatoid arthritis or sepsis clinical trial patients) to intensive care unit (ICU), general medical or surgical patients. The clinical definition of sepsis varied across studies but generally followed the clinical criteria of the ACCP/SCCM consensus conference definition closely [30].
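Verbal labels for κ values usually come from a fixed banding. The sketch below assumes the widely used Landis and Koch (1977) bands; we have not confirmed that these are the exact categories used in [18].

```python
def interpret_kappa(kappa: float) -> str:
    """Descriptive label for Cohen's kappa using the Landis and Koch (1977) bands.

    Assumption: this is the conventional scale, not necessarily the one in [18].
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


print(interpret_kappa(0.87))  # the kappa reported in the text
```

On this scale the reported κ of 0.87 falls in the top band (0.81 to 1.00).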

Performance characteristics
Reference standard definitions included medical chart review, ICU registry database (both validated and not validated by ICU physicians), bacteraemia-specific registry database, surgical inpatient database and a cohort of patients who had been entered into severe sepsis clinical trials based on specified and defined inclusion criteria. A total of 38 ICD sepsis case definitions were tested with over 130 different ICD codes (see Table 2 for codes used in each study). The most commonly used codes were the ICD-9 codes 038.x (septicaemia, not otherwise specified (NOS)), 790.7 (bacteraemia, NOS) and 995.92 (severe sepsis) and the ICD-10 codes A40.x (streptococcal sepsis) and A41.x (other sepsis). The validity of the ICD sepsis definitions varied greatly among studies. Seven of the twelve studies calculated Sn, and five studies calculated Sp. Sn ranged from 5.9% to 82.3% (median: 42.4%), and Sp ranged from 78.3% to 100% (median: 98.5%). The PPV was calculated in 10 of the 12 studies and ranged from 5.6% to 100% (median: 50%); NPV was provided in four studies and ranged from 62.1% to 99.7% (median: 97.4%) (Table 1).
After applying the standardized quality assessment checklist to each of the 12 included studies, the tallied scores ranged from 10 to 30, indicating variable quality among the studies (Table 3).

Discussion
In this review, we identified and summarized the published literature evaluating and validating ICD-9 and ICD-10 codes used to identify sepsis in administrative databases. We identified 12 studies that met all eligibility criteria and found large variations among studies in the scope of ICD codes used and in the estimates of validity. All studies validated inpatient data, and most showed that ICD codes defining a diagnosis of sepsis in administrative data are highly specific but lack Sn. In 10 of the 12 studies, Sn was low (<53%), even when study characteristics were altered [20]. A reasonable conclusion is that sepsis is largely undercoded in administrative data using ICD-9- or ICD-10-coded case definitions, regardless of study characteristics. However, the high Sp means that relatively few patients without sepsis would be falsely coded as having it.
The heterogeneity in coding accuracy among the studies, especially in Sn and PPV, may be due to multiple factors, including the number of codes used, the ICD version, the sample population, the reference standard and the type of administrative data. For instance, Gedeborg et al. [20] applied the same ICD-9 and ICD-10 coding algorithms to different patient populations, including ICU patients with community-acquired sepsis and patients from an infectious disease department, and tested them against two different reference standards (sepsis clinical trial patients and patients from an ICU-specific coded database). Data accuracy varied widely depending on the patient population studied and the reference standard used. Notably, restricting the sample population to patients for whom an infectious disease service was consulted during the hospital stay decreased Sn by 28.5% while increasing Sp by only 1.9%. It has also been reported that severe sepsis is poorly documented outside the ICU, although in one study sepsis was commonly found on non-ICU medical wards [31]; this suggests that the accuracy of diagnostic codes may depend substantially on the population selected or the criteria used to define it.
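One mechanism behind population-dependent accuracy is that PPV and NPV are not properties of a code set alone: by Bayes' theorem they also depend on the prevalence of sepsis in the sample. The sketch below applies one hypothetical code set (Sn 50%, Sp 98%) to two hypothetical populations; the values are illustrative and are not taken from any included study.

```python
def ppv(sn: float, sp: float, prevalence: float) -> float:
    """Positive predictive value from Sn, Sp and prevalence (Bayes' theorem)."""
    true_pos = sn * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sn: float, sp: float, prevalence: float) -> float:
    """Negative predictive value from Sn, Sp and prevalence."""
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - sn) * prevalence
    return true_neg / (true_neg + false_neg)


# Same hypothetical code set applied to two hypothetical populations.
for label, prev in (("general wards, 2% sepsis", 0.02), ("ICU cohort, 30% sepsis", 0.30)):
    print(f"{label}: PPV={ppv(0.5, 0.98, prev):.2f}, NPV={npv(0.5, 0.98, prev):.3f}")
```

With identical Sn and Sp, PPV rises from about 0.34 to about 0.91 as prevalence increases, while NPV falls, so PPVs from ICU-based and general-population samples are not directly comparable.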
Validity also depends on the diagnostic coding field position (primary, secondary or any). Cevasco et al. [19] examined a population-based inpatient database but restricted the sepsis diagnostic code to a secondary coding field position in two separate populations, which yielded lower PPVs (43% for Veterans Affairs patients and 51% for community hospital patients). Grijalva et al. [21] restricted the population to a highly specific patient sample (patients with rheumatoid arthritis) and examined only five ICD-9-CM codes; however, they allowed the code to appear in either the primary or a secondary position, which resulted in a PPV of 80%. Gedeborg et al. [20] performed multiple comparisons using either the primary position alone or both primary and secondary positions, and reported consistently high Sn estimates when both positions were included. The primary coding field is normally designated for the condition that contributed most to the patient's length of stay or was the main reason for admission (depending on the country). Thus, sicker patients presenting with severe sepsis or septic shock are more likely to be captured using the primary diagnosis alone. A further limitation of severity-level coding lies in the organ dysfunction codes used to identify severe sepsis, as these codes would most likely be recorded in secondary positions. None of the studies validated specific organ dysfunction codes or examined the coding field positions in which they were recorded.
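Restricting a case definition by field position amounts to choosing which slice of a record's diagnosis list is searched. In the sketch below, the record layout (position 0 holding the primary diagnosis) and the sepsis prefixes are hypothetical assumptions for illustration; real discharge abstracts encode position and diagnosis type in database-specific ways.

```python
from dataclasses import dataclass

SEPSIS_PREFIXES = ("038", "995.91", "995.92")  # illustrative ICD-9-CM subset


@dataclass
class DischargeRecord:
    diagnoses: list[str]  # assumed layout: position 0 = primary diagnosis field


def matches_position(record: DischargeRecord, prefixes: tuple[str, ...],
                     position: str = "any") -> bool:
    """Check for a matching code in the primary, secondary or any position."""
    if position == "primary":
        candidates = record.diagnoses[:1]
    elif position == "secondary":
        candidates = record.diagnoses[1:]
    elif position == "any":
        candidates = record.diagnoses
    else:
        raise ValueError(f"unknown position: {position!r}")
    return any(code.startswith(prefixes) for code in candidates)


# Pneumonia coded as primary, septicaemia and acute kidney failure as secondary.
record = DischargeRecord(diagnoses=["486", "038.9", "584.9"])
for pos in ("primary", "secondary", "any"):
    print(pos, matches_position(record, SEPSIS_PREFIXES, pos))
```

A primary-only definition misses this patient, whereas "any" position captures them, which mirrors the higher Sn reported when secondary positions were included.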
Variation in the clinical diagnosis of sepsis itself translates into variable recording of the diagnosis in the medical record. O'Malley et al. [32] described the patient trajectory from admission to discharge and the process from recording the admitting diagnosis to assigning an ICD code after discharge. One suggested source of error when a physician records a diagnosis is variation in the terms and language used to describe the disease, and/or the reporting of an infection without concomitant reporting of systemic inflammation or associated organ dysfunction. Poeze et al. [33] examined how physicians' awareness of and attitudes towards the diagnosis of sepsis affected its recording. They reported that, in the case of sepsis, the cause of death was incorrectly recorded as another disease 46% of the time. Assunção et al. [34] found that sepsis was most frequently misdiagnosed, in up to 66.5% of cases, as infection without clinical and laboratory signs of an inflammatory response. Therefore, low case capture of sepsis may also reflect the capacity of practicing physicians to recognize and document systemic inflammatory response syndrome, sepsis, severe sepsis and septic shock in the medical record. No study examined the expertise of the coders or the impact of physician documentation on the codes selected.
The results of this systematic review raise the question of whether reliable research on sepsis can be performed using administrative data. On the basis of our findings, hospital discharge abstract data alone are insufficient for researchers to estimate sepsis incidence accurately or to conduct surveillance. However, administrative data and ICD coding algorithms could still be used to examine risk factors for the development of sepsis or its outcomes. In such studies, a high Sp may suffice despite a reduced Sn, because it minimizes the number of false-positive cases; the caveat is that these studies may include only a subset of more easily defined or recognized cases, or of more severe forms of sepsis.
The complexity of sepsis as a clinical entity has led to significant efforts over the past 20-plus years to standardize clinical and laboratory diagnostic criteria and definitions [10,30,35,36]. Although designed primarily for clinical use, these definitions have found practical applications in other areas of research, including health care quality and utilization improvement initiatives and surveillance. One purpose of surveillance in particular is to monitor disease prevalence over a span of years and to forecast future trends. Trend estimates therefore depend on the stability of data validity over the observation period, regardless of the absolute level of validity. That said, administrative data remain an invaluable resource for monitoring sepsis, although they do not capture the same clinical detail as an Electronic Medical Record (EMR). Other advantages, such as wide geographical coverage, population-based capture of nearly every contact with the health care system and overall cost-effectiveness [6], make administrative data a valuable source of health information. Because administrative data cannot replicate the complex array of clinical criteria that define sepsis, translating the clinical definition into coded data and evaluating the validity of sepsis coding in administrative data are crucial.
Although a definition with Sn and Sp of 100% would be ideal, the ultimate goal should be to modify and optimize the data definition to capture sepsis as accurately as possible, with Sn above 75%, similar to that achieved for other hospital-acquired infections internationally [37] and for non-communicable diseases such as hypertension [38] and diabetes [39]. The quality of administrative health data, and thereby the case capture and validity of sepsis coding, could be improved through a number of simple strategies: (1) improved physician documentation, including documenting sepsis on the front pages of the chart to draw coders' attention; (2) a specialized coding procedure for ICU patients, perhaps including specific training for health care coders to improve their familiarity with the case mix of patients and conditions prevalent in the ICU, to increase Sn and case capture; and (3) in countries with a limited number of diagnostic coding fields, at least eight diagnosis coding fields to capture conditions such as sepsis [40]. These strategies can be combined with linkage to other data sources, such as laboratory, pharmacy or microbiology data and EMRs, and with clinical factors such as heart rate, respiratory rate, body temperature, white blood cell count and markers of organ dysfunction, to incorporate the key characteristics of sepsis defined in the ACCP/SCCM definitions [30]. Both improving the definition of sepsis and making it comparable across national and international jurisdictions are of the utmost importance for improving our understanding of how the quality of sepsis care affects the incidence and outcomes of the disease.
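As one illustration of linking coded data with clinical factors, the sketch below combines an infection code from administrative data with the systemic inflammatory response syndrome (SIRS) thresholds of the 1992 ACCP/SCCM consensus (at least two of: temperature above 38 °C or below 36 °C; heart rate above 90/min; respiratory rate above 20/min or PaCO2 below 32 mmHg; white cell count above 12 or below 4 ×10^9/L, or more than 10% immature forms). The linked-data fields, the use of worst values per hospitalization and the treatment of missing values as "criterion not met" are all hypothetical assumptions, not a validated definition.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LinkedVitals:
    """Hypothetical worst values per hospitalization from linked EMR/laboratory data."""
    temp_c: Optional[float] = None
    heart_rate: Optional[float] = None      # beats/min
    resp_rate: Optional[float] = None       # breaths/min
    paco2_mmhg: Optional[float] = None
    wbc_10e9_per_l: Optional[float] = None  # x10^9 cells/L
    band_pct: Optional[float] = None        # immature neutrophils, %


def _met(value: Optional[float], test: Callable[[float], bool]) -> bool:
    """A missing value counts as 'criterion not met' (an assumption)."""
    return value is not None and test(value)


def sirs_count(v: LinkedVitals) -> int:
    """Number of the four 1992 SIRS criteria that are met."""
    criteria = (
        _met(v.temp_c, lambda t: t > 38.0 or t < 36.0),
        _met(v.heart_rate, lambda hr: hr > 90),
        _met(v.resp_rate, lambda rr: rr > 20) or _met(v.paco2_mmhg, lambda p: p < 32),
        _met(v.wbc_10e9_per_l, lambda w: w > 12 or w < 4) or _met(v.band_pct, lambda b: b > 10),
    )
    return sum(criteria)


def linked_sepsis_flag(has_infection_code: bool, v: LinkedVitals) -> bool:
    """Infection code in administrative data plus >= 2 SIRS criteria from linked data."""
    return has_infection_code and sirs_count(v) >= 2


vitals = LinkedVitals(temp_c=38.6, heart_rate=112, resp_rate=18, wbc_10e9_per_l=9.5)
print(sirs_count(vitals), linked_sepsis_flag(True, vitals))
```

The design keeps the coded and clinical components separate, so each could be validated on its own against a chart-review reference standard.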
This systematic review has limitations. The search strategy was limited to studies published in English, and no grey literature search was conducted. The review targeted ICD codes used for sepsis specifically. Because sepsis is difficult to diagnose and has a range of clinical presentations, validation studies that examined only related conditions, and not sepsis specifically, may have been missed. Publication bias in validation studies may also be a concern, as authors may report only better-performing case definitions and may not publish case definitions with very low diagnostic accuracy. However, our review included studies reporting very low values for some case definitions, which suggests that publication bias is unlikely to be a major concern.