
Validity of administrative data in recording sepsis: a systematic review



Administrative health data have been used to study sepsis in large population-based studies. The validity of these study findings depends largely on the quality of the administrative data source and the validity of the case definition used. We systematically reviewed the literature to assess the validity of case definitions of sepsis used with administrative data.


Embase and MEDLINE were searched for published articles with International Classification of Diseases (ICD) coded data used to define sepsis. Abstracts and full-text articles were reviewed in duplicate. Data were abstracted from all eligible full-text articles, including ICD-9- and/or ICD-10-based case definitions, sensitivity (Sn), specificity (Sp), positive predictive value (PPV) and negative predictive value (NPV).


Of 2,317 individual studies identified, 12 full-text articles met all eligibility criteria. A total of 38 sepsis case definitions were tested, which included over 130 different ICD codes. The most common ICD-9 codes were 038.x, 790.7 and 995.92, and the most common ICD-10 codes were A40.x and A41.x. The PPV was reported in ten studies and ranged from 5.6% to 100%, with a median of 50%. Other tests of diagnostic accuracy were reported only in some studies. Sn ranged from 5.9% to 82.3%; Sp ranged from 78.3% to 100%; and NPV ranged from 62.1% to 99.7%.


The validity of administrative data in recording sepsis varied substantially across individual studies and ICD definitions. Our work may serve as a reference point for consensus towards an improved and harmonized ICD-coded definition of sepsis.

Introduction


Sepsis is a life-threatening condition associated with a high mortality rate, significant health care costs and long-term consequences [1-3]. It is characterized by a spectrum of severity from mild acute organ dysfunction to multi-organ failure with complex pathophysiologic processes. Differentiating sepsis as a cause of multiple organ dysfunction syndrome from other acute systemic inflammatory conditions can be difficult [4].

Many large-scale studies have relied on administrative data to identify patients with sepsis [1,2]. Examples of administrative data include hospital discharge data, emergency visit data, physician claims and hospital insurance claims data. These data are advantageous because they are readily available and reasonably inexpensive, can include a large cohort of patients, allow control for some confounders such as chronic disease [5] and include individual outcomes [6]. These data often code diseases using the World Health Organization International Classification of Diseases (ICD) [7]. The most recent version of the ICD manual in use is the tenth revision, or ICD-10. This manual exists alongside country modifications such as ICD-10-CA (the Canadian edition) and ICD-10-AM (the Australian Modification). In addition, a modification of the ninth revision (ICD-9-CM) is still being used in a number of countries, such as the United States and Italy [8].

Prior to 1992, there was a lack of consensus regarding clinical criteria and definitions for sepsis and related conditions. The Centers for Disease Control and Prevention (CDC) reported sepsis admissions using administrative data in which the term septicemia, referring to the presence and spread of microorganisms via circulating blood [9], was used as a clinical case definition and did not fully incorporate the spectrum of illness that was later defined in more detail by the 1992 American College of Chest Physicians and Society of Critical Care Medicine (ACCP/SCCM) Consensus Conference clinical definitions [10].

Angus et al. [1] performed a large-scale, multi-centre epidemiological study in which they identified patients with severe sepsis using an ICD-9-based algorithm that required evidence of both an infection and new-onset organ dysfunction during a single hospitalization, hereafter described as the Angus implementation coding scheme. The Angus implementation is one of the most well-known and highly cited implementations of an ICD-coded case definition for sepsis. This definition was originally validated by the authors through a comparison of aggregate data showing hospital incidence rates and patient characteristics of the cohorts captured through the ICD-9-CM algorithm versus a previous cohort captured through a prospective study of patients with sepsis by Sands et al. [11]. A recent study [12] validated the Angus implementation and another well-known algorithm known as the Martin implementation [2] using a reference standard based on physician-based medical chart review. The Angus implementation was reported as having a moderate to low sensitivity (Sn) of 50.3% and a positive predictive value (PPV) of 70.7%, whereas the Martin implementation had a very low Sn of 16.8% but a high PPV of 97.6%. The authors of that study therefore concluded that a population of patients with severe sepsis could be captured through administrative data using the Angus case definition, but that the number of cases would be underestimated. Studies that examined the performance of ICD coding algorithms to identify other conditions have also highlighted the great variability that exists when multiple codes are used to define a specific condition [13,14].
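The two-component logic described above (an infection code plus an organ dysfunction code within one hospitalization) can be sketched as follows. This is a minimal illustration only: the code prefixes are a small subset chosen for the example and are not the published Angus code list, and the function names are hypothetical.

```python
# Illustrative sketch of an "infection AND organ dysfunction" case definition.
# The prefixes below are a small example subset of ICD-9-CM codes chosen for
# demonstration; they are NOT the full published Angus code list.
INFECTION_PREFIXES = ("038", "790.7")  # septicaemia, bacteraemia
ORGAN_DYSFUNCTION_PREFIXES = ("584", "518.81", "785.5")  # renal failure, respiratory failure, shock


def has_code(codes, prefixes):
    """Return True if any code in the record starts with one of the prefixes."""
    return any(code.startswith(prefixes) for code in codes)


def flags_severe_sepsis(discharge_codes):
    """Flag a hospitalization only when both components are present."""
    return has_code(discharge_codes, INFECTION_PREFIXES) and has_code(
        discharge_codes, ORGAN_DYSFUNCTION_PREFIXES
    )
```

Under these assumptions, a record coded `["038.9", "584.9"]` would be flagged, whereas a record with an infection code alone would not, which is one reason such definitions tend to trade sensitivity for specificity.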

Accurate identification of sepsis cases in ICD-coded administrative data is paramount for health services research, especially for a complex disease such as sepsis, where the burden of disease and costs of care are very high. There is currently no consensus regarding which ICD-9 or ICD-10 codes should be used to define sepsis in administrative data. A reasonable step towards the harmonization of an ICD-based definition for sepsis is to examine the literature and report the validity of published ICD-coded case definitions in administrative data.

Material and methods

Search strategy

We applied a modification of the search strategy methodology of St Germaine-Smith et al. [14]. Using the Ovid interface, we searched MEDLINE and Embase for articles published between 1992 (the year in which the ACCP/SCCM definition criteria for sepsis and severe sepsis were published) and 15 September 2014, applying ‘humans’ and ‘English language’ filters. In order to identify studies assessing the diagnostic accuracy of ICD codes for identifying sepsis, the Boolean operator ‘AND’ was used to combine three search concepts: sepsis, coding and validity. Articles concerning sepsis were sought using the Boolean operator ‘OR’ to combine the Medical Subject Headings (MeSH) term ‘sepsis’ and Emtree terms relevant to the condition of sepsis, including ‘severe sepsis’ and ‘septic shock’. Articles concerning the concept of coding were sought using the Boolean operator ‘OR’ to combine the MeSH terms and keyword searches for the following terms: ‘administrative data’, ‘hospital discharge data’, ‘ICD-9’, ‘ICD-10’, ‘ICD-9xM’ or ‘ICD-10xM’ (country versions), ‘medical record’, ‘health information’, ‘surveillance’, ‘physician claims’, ‘claims’, ‘hospital discharge’, ‘coding’ and ‘codes’. Articles concerning validity were sought using the Boolean operator ‘OR’ to combine the MeSH and keyword searches for the terms ‘validity’, ‘validation’, ‘case definition’, ‘algorithm’, ‘agreement’, ‘accuracy’, ‘sensitivity’, ‘specificity’, ‘positive predictive value’ and ‘negative predictive value’ (Additional file 1).
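The structure of this search (terms joined by ‘OR’ within each concept, concepts joined by ‘AND’) can be sketched as below. This is an illustration of the Boolean logic only: the output is not valid Ovid syntax, the country-version wildcard terms are omitted, the helper names are hypothetical, and the actual strategy is the one given in Additional file 1.

```python
# Sketch of the three-concept Boolean search structure described in the text.
# The generated string is illustrative and is not Ovid query syntax.
SEPSIS = ["sepsis", "severe sepsis", "septic shock"]
CODING = [
    "administrative data", "hospital discharge data", "ICD-9", "ICD-10",
    "medical record", "health information", "surveillance",
    "physician claims", "claims", "hospital discharge", "coding", "codes",
]
VALIDITY = [
    "validity", "validation", "case definition", "algorithm", "agreement",
    "accuracy", "sensitivity", "specificity",
    "positive predictive value", "negative predictive value",
]


def or_block(terms):
    """Join the terms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concepts):
    """Join the per-concept OR blocks with AND."""
    return " AND ".join(or_block(concept) for concept in concepts)


query = build_query(SEPSIS, CODING, VALIDITY)
```

A citation has to match at least one term from each of the three blocks to be retrieved, which is what narrows the result set to validation studies of coded sepsis definitions.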

Study inclusion

To be eligible for inclusion, articles had to compare the accuracy of ICD-9 or ICD-10 codes for sepsis, severe sepsis or septic shock in an administrative database to a reference standard and report at least one of Sn, specificity (Sp), PPV or negative predictive value (NPV). For comparison purposes, studies identified in the search that validated an ICD-coded definition without reporting any diagnostic accuracy measures were excluded. The following diagnostic accuracy measures were abstracted, if provided, from each study: Sn, Sp, PPV and NPV. All bibliographical references were imported into a custom-written Java software application [15] for improved reference management and data collection. This software, called Synthesis, is described in more detail elsewhere [16]. The title and abstract of each citation identified were screened in duplicate for eligibility by two reviewers (RJJ and KJS). Any article selected as meeting eligibility criteria by either or both reviewers was then retrieved and reviewed in full by the same two authors. Articles excluded on the basis of title and abstract are listed, with reasons for exclusion, in Additional file 2. To determine inter-rater agreement, Cohen’s κ statistic was calculated at both the title and abstract review stage and the full-text review stage. All articles for which there was inter-rater discord at the abstract review stage went on to full-text review. Any full-text articles for which there was inter-rater discord were reviewed a second time, and further disagreements about study eligibility at the full-text review stage were resolved through discussion until full consensus was obtained.
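For readers unfamiliar with the statistic, Cohen’s κ for two reviewers making include/exclude decisions can be computed from the four cells of their agreement table, as in the minimal sketch below. The function name and the example counts are hypothetical and are not the screening counts of this review.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two raters making a binary include/exclude decision.

    Arguments are the four cell counts of the 2x2 agreement table.
    Raises ZeroDivisionError if chance agreement equals 1
    (for example, when both raters exclude every article).
    """
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n
    a_include = (both_include + only_a) / n  # rater A's inclusion rate
    b_include = (both_include + only_b) / n  # rater B's inclusion rate
    # Agreement expected by chance if the raters decided independently
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 40 articles included by both reviewers, 5 by each
# reviewer alone and 50 excluded by both.
kappa = cohens_kappa(40, 5, 5, 50)
```

κ corrects the raw proportion of agreement for the agreement expected by chance, so it is lower than simple percentage agreement whenever the raters' decisions are unbalanced.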

Data extraction and quality assessment

One author (RJJ) abstracted data from included studies using a standardized abstraction form, including country location of study, years of data collection, validation database, sample size and type of sample population. The validated ICD codes and algorithms, diagnostic field position and ICD version used in each study were recorded along with Sn, Sp, NPV and PPV. The authors calculated Sn or Sp in cases where these values were not reported but raw data were available to calculate them.
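Where raw counts are available, the four measures follow directly from the 2×2 table comparing the coded case definition against the reference standard. The sketch below shows the standard formulas; the function name and the example counts are hypothetical and do not come from any included study.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute Sn, Sp, PPV and NPV from the cells of a 2x2 validation table.

    tp: coded as sepsis and sepsis by the reference standard
    fp: coded as sepsis but not sepsis by the reference standard
    fn: not coded as sepsis but sepsis by the reference standard
    tn: not coded as sepsis and not sepsis by the reference standard
    A measure is returned as None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "Sn": ratio(tp, tp + fn),
        "Sp": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


# Hypothetical counts for illustration only
measures = diagnostic_accuracy(tp=50, fp=20, fn=50, tn=880)
```

Studies that sample only coded cases for chart review can report the PPV but not Sn or Sp, because the `fn` and `tn` cells are never observed; this may explain why PPV was the most frequently reported measure.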

The included studies were assessed for quality by two reviewers (KJS and RJJ) using a standardized validation study quality checklist adapted from Benchimol et al. [17]. In instances where it was unclear whether a checklist item was fulfilled by the study, it was marked as uncertain. Any discrepancies between the two reviewers were resolved through discussion. Studies included were published in peer-reviewed journals; therefore, it was not necessary to obtain patient consent. This study was reviewed and approved by the Conjoint Health Research Ethics Board at the University of Calgary.


Results

Study characteristics

Of 2,317 abstracts reviewed, 96 fulfilled eligibility criteria for full-text review. Amongst these articles, the κ score for inter-rater agreement was 0.87, indicating near-perfect agreement [18]. Twelve articles met all eligibility criteria and were included in the study [12,19-29] (Figure 1). The characteristics of the studies are shown in Table 1. All 12 studies examined hospital discharge abstract data (also called ‘inpatient administrative health data’ or ‘inpatient claims administrative dataset’). Eight of the twelve studies were performed in the United States [12,19,21,23,25,27-29], one in Australia [22], one in Denmark [24], one in Sweden [20] and one in Canada [26]. Publication dates ranged from 1998 to 2014. Seven studies examined ICD-9-CM codes, one examined only ICD-9, one examined both ICD-9 and ICD-10 codes, one study examined ICD-10, one study examined the ICD-10 Danish version and one study examined ICD-10-AM (Australian Modification) codes. The studies varied considerably in sample size (ranging from 34 to 4,181) and in the patients studied, who ranged from highly selected populations (patients with rheumatoid arthritis or sepsis clinical trial patients) to intensive care unit (ICU)-specific, general medical or surgical patients. The clinical definition of sepsis varied across studies but generally followed the clinical criteria of the ACCP/SCCM consensus conference definition closely [30].

Figure 1 Flow diagram for study screening and article inclusion. ICD, International Classification of Diseases.

Table 1 Characteristics of studies included and summary of measures reported in validation studies

Performance characteristics

Reference standard definitions included medical chart review, ICU registry database (both validated and not validated by ICU physicians), bacteraemia-specific registry database, surgical inpatient database and a cohort of patients who had been entered into severe sepsis clinical trials based on specified and defined inclusion criteria. A total of 38 ICD sepsis case definitions were tested with over 130 different ICD codes (see Table 2 for codes used in each study). The most commonly used codes were the ICD-9 codes 038.x (septicaemia, not otherwise specified (NOS)), 790.7 (bacteraemia, NOS) and 995.92 (severe sepsis) and the ICD-10 codes A40.x (streptococcal sepsis) and A41.x (other sepsis).

Table 2 ICD version and ICD codes used in included studies

The validity of the ICD sepsis definitions varied greatly among studies. Seven of the twelve studies calculated Sn, and five studies calculated Sp. Sn ranged from 5.9% to 82.3% (median: 42.4%), and Sp ranged from 78.3% to 100% (median: 98.5%). The PPV was calculated in 10 of the 12 studies and ranged from 5.6% to 100% (median: 50%); NPV was provided in four studies and ranged from 62.1% to 99.7% (median: 97.4%) (Table 1).

One study [20] examined eighteen different case definitions using a ‘sepsis wide’ coded definition and a ‘sepsis narrow’ coded definition for both ICD-9 and ICD-10 codes. These coding algorithms were then compared. Among these case definitions, Sn varied from 17.2% to 52.5% (median: 37.0%) and Sp ranged from 92.6% to 99.8% (median: 98.5%) (Table 1).

After applying the standardized quality assessment checklist to each of the 12 included studies, the tallied scores ranged from 10 to 30, indicating variable quality among the studies (Table 3).

Table 3 Quality assessment checklist of reporting criteria for validation studies of health administrative data

Discussion


In this review, we identified and summarized the published literature evaluating and validating ICD-9 and ICD-10 codes used to identify sepsis in administrative databases. We identified 12 studies that met all eligibility criteria for this systematic review and found large variations in the scope of ICD codes used and in the estimates of validity among studies. All studies validated inpatient data, and the majority of the studies showed that ICD codes defining a diagnosis of sepsis in administrative data are highly specific but lack Sn. In 10 of the 12 studies, Sn was low (<53%), even when study characteristics were varied [20]. A reasonable conclusion is that sepsis is largely undercoded in administrative data using ICD-9 or ICD-10 coded case definitions, regardless of study characteristics. However, the high Sp and NPV do mean that few false positives would be present in such a dataset.

The heterogeneity seen among the studies in coding accuracy, especially with respect to Sn and PPV, may be due to multiple factors, including the number of codes used, the version of ICD used, the sample population, the reference standard comparison used and the type of administrative data. For instance, Gedeborg et al. [20] applied the same ICD-9 and ICD-10 coding algorithms to different patient populations, including ICU patients with community-acquired sepsis and infectious disease department patients, and tested these against two different reference standard definitions (sepsis clinical trial patients and patients from an ICU-specific coded database). They showed that data accuracy varied widely depending on the patient population being studied and the reference standard used. Not surprisingly, limiting the sample population to one in which an infectious disease service was consulted during the patient stay decreased the Sn by 28.5% while increasing the Sp by only 1.9%. It has also been reported that severe sepsis is poorly documented outside the ICU, although in one study sepsis was commonly found on non-ICU medical wards [31], suggesting that the accuracy of diagnostic codes may be substantially affected by the population selected or the criteria used to define the population.
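Part of the dependence of PPV on the sample population follows directly from Bayes' theorem: for fixed Sn and Sp, PPV rises with the prevalence p of sepsis in the population studied. The relationship is shown below; the numerical values in the example are hypothetical and are chosen only to illustrate the effect.

```latex
\[
\mathrm{PPV} \;=\; \frac{\mathrm{Sn}\cdot p}{\mathrm{Sn}\cdot p + (1-\mathrm{Sp})\,(1-p)}
\]
% Hypothetical example with Sn = 0.50 and Sp = 0.98:
%   p = 0.02:  PPV = 0.010 / (0.010 + 0.0196) \approx 0.34
%   p = 0.30:  PPV = 0.150 / (0.150 + 0.0140) \approx 0.91
```

A case definition with identical Sn and Sp could therefore show a low PPV in a general inpatient population and a high PPV in an ICU or clinical trial cohort, which is consistent with the wide range of PPVs observed across the included studies.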

Validity is also dependent on diagnostic coding field location (primary or secondary or all). Cevasco et al. [19] examined a population-based inpatient database but restricted the sepsis diagnostic code to a secondary coding field position in two separate populations, resulting in lower PPV values (43% for Veterans Affairs patients and 51% for community hospital patients). Grijalva et al. [21] restricted the population to a highly specific patient sample (rheumatoid arthritis patients) and examined only five ICD-9-CM codes; however, they allowed the coding field position to be either primary or secondary, which resulted in a PPV of 80%. Gedeborg et al. [20] performed multiple comparisons using primary or both primary and secondary code field positions. They reported consistently higher Sn estimates when both the primary and secondary coding field positions were included. The primary coding field is normally designated for the condition that contributed the most to a patient’s length of stay or was the main reason for admission (depending on country). Thus, sicker patients presenting with severe sepsis or septic shock are more likely to be captured using the primary diagnosis alone. A further limitation of severity level coding is reflected in the organ dysfunction codes used to identify severe sepsis, as these diagnostic codes would most likely be recorded in the secondary code field positions. In none of the studies were any particular organ dysfunction codes validated or the coding field positions examined.
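The effect of coding field position can be made concrete with a small sketch that applies the most commonly used codes from the included studies either to the primary field only or to all diagnosis fields. The record layout (an ordered list with the primary diagnosis first) and the function names are assumptions made for illustration.

```python
# Most commonly used sepsis codes reported in the included studies:
# ICD-9 038.x, 790.7, 995.92 and ICD-10 A40.x, A41.x (matched by prefix).
SEPSIS_PREFIXES = ("038", "790.7", "995.92", "A40", "A41")


def is_sepsis_code(code):
    """Return True if the code falls under one of the sepsis prefixes."""
    return code.startswith(SEPSIS_PREFIXES)


def flags_sepsis(diagnoses, primary_only=False):
    """Apply the case definition to one discharge record.

    diagnoses: ordered list of codes, primary diagnosis first (assumed layout).
    primary_only: if True, only the first (primary) field is searched.
    """
    fields = diagnoses[:1] if primary_only else diagnoses
    return any(is_sepsis_code(code) for code in fields)


# Hypothetical record: pneumonia coded as primary, sepsis as secondary
record = ["J18.9", "A41.9"]
```

For the hypothetical record above, `flags_sepsis(record)` captures the case, whereas `flags_sepsis(record, primary_only=True)` misses it, which illustrates how restricting the field position can lower Sn.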

The variation in diagnosing sepsis alone translates to variable recording of the diagnosis in the medical record. O’Malley et al. [32] described the patient trajectory from admission to discharge and the process from recording the admitting diagnosis to the assignment of an ICD code post-discharge. One suggested source of error when a physician records a diagnosis in the medical record is variation in the terms and language used to describe the disease, or reporting of an infection without concomitant reporting of systemic inflammation or associated organ dysfunction. Poeze et al. [33] examined how a physician’s awareness of and attitude towards the diagnosis of sepsis affected the recording of sepsis. They reported that, in cases of sepsis, the cause of death was incorrectly recorded as another disease 46% of the time. Assunção et al. [34] found that sepsis was most frequently misdiagnosed, up to 66.5% of the time, as infection without clinical and laboratory signs of inflammatory response. Therefore, low case capture of sepsis may also be due to the capacity of practicing physicians to recognize and report clinical cases of systemic inflammatory response syndrome, sepsis, severe sepsis and septic shock in the medical record. No study examined the expertise of the coders or the impact of physician documentation on the selected codes.

The results of this systematic review should raise a question about whether reliable research on sepsis can be performed using administrative data. On the basis of the findings of our review, hospital discharge abstract data alone are an insufficient source for researchers to examine sepsis incidence accurately or for surveillance. However, administrative data and ICD coding algorithms could still be used to examine risk factors for the development of sepsis or outcomes. In these studies, a high Sp with a reduced Sn may suffice to minimize the number of false-positive cases, with the caveat that these studies may include a subset of more easily defined and/or recognized cases or a more severe form of sepsis.
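The size of the undercount implied by a given Sn and PPV can be approximated with a simple identity, because the number of true positives can be written in two ways. The sketch below is illustrative only: it assumes that the Sn and PPV estimated in a validation sample apply unchanged to the population of interest, which the heterogeneity described above suggests may often not hold.

```latex
\[
\mathrm{TP} \;=\; \mathrm{PPV}\cdot N_{\text{coded}} \;=\; \mathrm{Sn}\cdot N_{\text{true}}
\quad\Longrightarrow\quad
N_{\text{true}} \;=\; N_{\text{coded}}\cdot\frac{\mathrm{PPV}}{\mathrm{Sn}}
\]
% Using the values reported for the Angus implementation [12]
% (Sn = 50.3%, PPV = 70.7%):
%   N_true / N_coded = 0.707 / 0.503 \approx 1.41
```

Under that assumption, the true number of severe sepsis cases would be roughly 1.4 times the number identified by the Angus implementation, consistent with the conclusion that coded data underestimate the number of cases.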

The complexity of sepsis as a clinical entity has led to a significant effort over the past 20-plus years to standardize clinical and laboratory diagnostic criteria and definitions [10,30,35,36]. Although designed primarily for clinical use, these definitions have led to practical applications in other areas, including health care quality and utilization improvement initiatives and surveillance. One purpose of surveillance in particular is to monitor disease prevalence over a span of years and forecast future trends. Thus, the reliability of an observed trend depends on how stable the validity of the data is over the observation period, regardless of the level of validity. That said, administrative data are still an invaluable resource for monitoring sepsis, although they do not capture the same amount of clinical detail that an Electronic Medical Record (EMR) does. Other advantages, such as wide geographical coverage, population-based capture of nearly every contact with the health care system and overall cost-effectiveness [6], make administrative data an attractive source of health information. Administrative data cannot replicate the complex array of clinical criteria that define sepsis; therefore, translating this clinical definition into coded data and evaluating the validity of the coding of sepsis in administrative data are crucial.

Although a case definition with Sn and Sp of 100% would be ideal, the ultimate goal should be to modify and optimize the data definition to capture sepsis as accurately as possible, with Sn above 75%, similar to that reported for other hospital-acquired infections internationally [37] and for non-communicable diseases such as hypertension [38] and diabetes [39]. Improving the quality of administrative health data and increasing the case capture and validity of sepsis could be accomplished through a number of simple strategies, such as (1) improving physician documentation, including documenting sepsis in the front pages of the chart to get the attention of coders; (2) having a specialized coding procedure for ICU patients, perhaps including specific training of health care coders to improve familiarity with the case mix of patients and conditions that are more prevalent in the ICU, to increase Sn and case capture; and (3) providing at least eight coding fields for diagnosis, in those countries in which only a limited number of diagnostic coding fields exist, to capture conditions such as sepsis [40]. These strategies can be used in combination with data linkage to other data sources such as laboratory, pharmacy or microbiology data and EMRs, and with clinical factors such as heart rate, respiratory rate, body temperature, white blood cell count and markers of organ dysfunction, to try to incorporate the key characteristics of sepsis defined and listed in the ACCP/SCCM definitions [30]. Both improving the definition of sepsis and making it comparable across national and international jurisdictions are of the utmost importance to continue improving the understanding of how quality of sepsis care is affecting the incidence and outcomes of the disease.

There are limitations to this systematic review. The search strategy was limited to studies published in English, and a grey literature search was not conducted. The target of the study was ICD codes used for sepsis specifically. Because sepsis itself is difficult to diagnose and has a range of clinical presentations, there is a possibility that validation studies examining only related conditions, and not sepsis specifically, may have been missed. Publication bias in validation studies may also be a concern, as authors may report only better-performing case definitions and may not publish case definitions with very low diagnostic accuracy. However, our systematic review included studies reporting very low validity estimates for some case definitions, and therefore there is little concern that publication bias has occurred.

Conclusions


Validated case definitions for sepsis have been reported with varying degrees of accuracy in studies using administrative data. Sepsis remains one of the top causes of death, specifically in the ICU, and as more researchers are utilizing administrative data to study sepsis outcomes and health services associated with care, an accurate ICD-coded case definition is needed. Future studies are warranted to optimize the ascertainment of sepsis in administrative data, whether by testing new enhanced definitions, by optimizing physician documentation and/or by considering data linkage.

Key messages

  • Sepsis is undercoded in administrative data using ICD-9- and ICD-10-based case definitions.

  • There is high heterogeneity across studies for coding sepsis in administrative data, which depends on, among other factors, the ICD codes used, the population studied, the criteria used to define sepsis and the diagnostic coding position.

  • To improve the capture of true sepsis cases in administrative data, strategies should be considered that include data linkage, improving physician documentation, implementing specialized coding procedures for ICU patients and the use of at least eight coding fields for diagnosis to capture complex conditions such as sepsis.



Abbreviations

ACCP: American College of Chest Physicians
ACS: American College of Surgeons
CAS: Community-acquired sepsis
DID: Department of Infectious Disease
ED: Emergency department
EMR: Electronic Medical Record
ICD: International Classification of Diseases
ICD-9: International Classification of Diseases, Ninth Revision
ICD-10: International Classification of Diseases, Tenth Revision
ICD-10-AM: International Classification of Diseases, Tenth Revision, Australian Modification
ICD-10-CA: International Classification of Diseases, Tenth Revision, Canadian edition
ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification
ICU: Intensive care unit
MeSH: Medical Subject Headings
NPV: Negative predictive value
NSQIP: National Surgical Quality Improvement Program
PPV: Positive predictive value
SCCM: Society of Critical Care Medicine






  1. 1.

    Angus DC, Linde-Zwirble WT, Lidicker J, Clermont G, Carcillo J, Pinsky MR. Epidemiology of severe sepsis in the United States: analysis of incidence, outcome, and associated costs of care. Crit Care Med. 2001;29:1303–10.

    CAS  Article  Google Scholar 

  2. 2.

    Martin GS, Mannino DM, Eaton S, Moss M. The epidemiology of sepsis in the United States from 1979 through 2000. N Engl J Med. 2003;348:1546–54.

    Article  Google Scholar 

  3. 3.

    Iwashyna TJ, Ely EW, Smith DM, Langa KM. Long-term cognitive impairment and functional disability among survivors of severe sepsis. JAMA. 2010;304:1787–94.

    CAS  Article  Google Scholar 

  4. 4.

    Lynn LA. The diagnosis of sepsis revisited – a challenge for young medical scientists in the 21st century. Patient Saf Surg. 2014;8:1.

    Article  Google Scholar 

  5. 5.

    Quach S, Hennessy DA, Faris P, Fong A, Quan H, Doig C. A comparison between the APACHE II and Charlson Index Score for predicting hospital mortality in critically ill patients. BMC Health Serv Res. 2009;9:129.

    Article  Google Scholar 

  6. 6.

    Zhan C, Miller MR. Administrative data based patient safety research: a critical review. Qual Saf Health Care. 2003;12:ii58–63.

    Article  Google Scholar 

  7. 7.

    World Health Organization. International Classification of Diseases (ICD). Accessed 25 Mar 2015.

  8. 8.

    Jetté N, Quan H, Hemmelgarn B, Drosler S, Maass C, Moskal L, et al. The development, evolution, and modifications of ICD-10: challenges to the international comparability of mortality data. Med Care. 2010;48:1105–10.

    Article  Google Scholar 

  9. 9.

    Odeh M. Sepsis, septicaemia, sepsis syndrome, and septic shock: the correct definition and use. Postgrad Med J. 1996;72:66.

    CAS  Article  Google Scholar 

  10. 10.

    Bone RC, Balk RA, Cerra FB, Dellinger RP, Fein AM, Knaus WA, et al. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee. American College of Chest Physicians/Society of Critical Care Medicine. Chest. 1992;101:1644–55.

    CAS  Article  Google Scholar 

  11. 11.

    Sands KE, Bates DW, Lanken PN, Graman PS, Hibberd PL, Kahn KL, et al. Epidemiology of sepsis syndrome in 8 academic medical centers. JAMA. 1997;278:234–40.

    CAS  Article  Google Scholar 

  12. 12.

    Iwashyna TJ, Odden A, Rohde J, Bonham C, Kuhn L, Malani P, et al. Identifying patients with severe sepsis using administrative claims: patient-level validation of the angus implementation of the international consensus conference definition of severe sepsis. Med Care. 2014;526:e39–43.

    Article  Google Scholar 

  13. 13.

    Quach S, Blais C, Quan H. Administrative data have high variation in validity for recording heart failure. Can J Cardiol. 2010;26:306–12.

    Article  Google Scholar 

  14. 14.

    St Germaine-Smith CS, Metcalfe A, Pringsheim T, Roberts JI, Beck CA, Hemmelgarn BR, et al. Recommendations for optimal ICD codes to study neurologic conditions a systematic review. Neurology. 2012;79:1049–55.

    Article  Google Scholar 

  15. 15.

    Synthesis. Updated 2013. Accessed 25 Mar 2015.

  16. 16.

    Yergens DW, Dutton DJ, Patten SB. An overview of the statistical methods reported by studies using the canadian community health survey. BMC Med Res Methodol. 2014;14:15.

    Article  Google Scholar 

  17. 17.

    Benchimol EI, Manuel DG, To T, Griffiths AM, Rabeneck L, Guttmann A. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol. 2011;64:821–9.

    Article  Google Scholar 

  18. 18.

    Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.

    CAS  Article  Google Scholar 

  19. 19.

    Cevasco M, Borzecki AM, Chen Q, Zrelak PA, Shin M, Romano PS, et al. Positive predictive value of the AHRQ patient safety indicator “postoperative sepsis”: implications for practice and policy. J Am Coll Surg. 2011;212:954–61.

    Article  Google Scholar 

  20. 20.

    Gedeborg R, Furebring M, Michaëlsson K. Diagnosis-dependent misclassification of infections using administrative data variably affected the incidence and mortality estimates of ICU patients. J Clin Epidemiol. 2007;60:155–62.

    CAS  Article  Google Scholar 

  21. 21.

    Grijalva CG, Chung CP, Stein CM, Gideon PS, Dyer SM, Mitchel EF, et al. Computerized definitions showed high positive predictive values for identifying hospitalizations for congestive heart failure and selected infections in Medicaid enrollees with rheumatoid arthritis. Pharmacoepidemiol Drug Saf. 2008;17:890–5.

    Article  Google Scholar 

  22. 22.

    Ibrahim I, Jacobs IG, Webb SAR, Finn J. Accuracy of International Classification of Diseases, 10th Revision codes for identifying severe sepsis in patients admitted from the emergency department. Crit Care Resusc. 2012;14:112–8.

    PubMed  Google Scholar 

  23. 23.

    Lawson EH, Louie R, Zingmond DS, Brook RH, Hall BL, Han L, et al. A comparison of clinical registry versus administrative claims data for reporting of 30-day surgical complications. Ann Surg. 2012;256:973–81.

    Article  Google Scholar 

  24. 24.

    Madsen KM, Schonheyder HC, Kristensen B, Nielsen GL, Sorensen HT. Can hospital discharge diagnosis be used for surveillance of bacteremia? A data quality study of a Danish hospital discharge registry. Infect Control Hosp Epidemiol. 1998;19:175–80.

    CAS  Article  Google Scholar 

  25. 25.

    Ollendorf DA, Fendrick AM, Massey K, Williams GR, Oster G. Is sepsis accurately coded on hospital bills? Value Health. 2002;5:79–81.

    Article  Google Scholar 

  26. 26.

    Quan H, Eastwood C, Cunningham CT, Liu M, Flemons W, De Coster C, et al. Validity of AHRQ patient safety indicators derived from ICD-10 hospital discharge abstract data (chart review study). BMJ Open. 2013;3:e003716.

    Article  Google Scholar 

  27. 27.

    Ramanathan R, Leavell P, Stockslager G, Mays C, Harvey D, Duane TM. Validity of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) screening for sepsis in surgical mortalities. Surg Infect (Larchmt). 2014;15:513–6.

  28.

    Schneeweiss S, Robicsek A, Scranton R, Zuckerman D, Solomon DH. Veteran’s Affairs hospital discharge database coded serious bacterial infections accurately. J Clin Epidemiol. 2007;60:397–409.

  29.

    Whittaker SA, Mikkelsen ME, Gaieski DF, Koshy S, Kean C, Fuchs BD. Severe sepsis cohorts derived from claims-based strategies appear to be biased toward a more severely ill patient population. Crit Care Med. 2013;41:945–53.

  30.

    Levy MM, Fink MP, Marshall JC, Abraham E, Angus D, Cook D, et al. International Sepsis Definitions Conference. 2001 SCCM/ESICM/ACCP/ATS/SIS International Sepsis Definitions Conference. Intensive Care Med. 2003;29:530–8.

  31.

    Rohde JM, Odden AJ, Bonham C, Kuhn L, Malani PN, Chen LM, et al. The epidemiology of acute organ system dysfunction from severe sepsis outside of the intensive care unit. J Hosp Med. 2013;8:243–7.

  32.

    O’Malley KJ, Cook KF, Price MD, Wildes KR, Hurdle JF, Ashton CM. Measuring diagnoses: ICD code accuracy. Health Serv Res. 2005;40:1620–39.

  33.

    Poeze M, Ramsay G, Gerlach H, Rubulotta F, Levy M. An international sepsis survey: a study of doctors’ knowledge and perception about sepsis. Crit Care. 2004;8:R409–13.

  34.

    Assunção M, Akamine N, Cardoso GS, Mello PV, Teles JM, Nunes AL, et al. Survey on physician’s knowledge of sepsis: do they recognize it promptly? J Crit Care. 2010;25:545–52.

  35.

    Bone RC. Why new definitions of sepsis and organ failure are needed. Am J Med. 1993;95:348–50.

  36.

    Trzeciak S, Zanotti-Cavazzoni S, Parrillo JE, Dellinger RP. Inclusion criteria for clinical trials in sepsis: did the American College of Chest Physicians/Society of Critical Care Medicine consensus conference definitions of sepsis have an impact? Chest. 2005;127:242–5.

  37.

    Goto M, Ohl ME, Schweizer ML, Perencevich EN. Accuracy of administrative code data for the surveillance of healthcare-associated infections: a systematic review and meta-analysis. Clin Infect Dis. 2014;58:688–96.

  38.

    Quan H, Khan N, Hemmelgarn BR, Tu K, Chen G, Campbell N, et al. Validation of a case definition to define hypertension using administrative data. Hypertension. 2009;54:1423–8.

  39.

    Southern DA, Roberts B, Edwards A, Dean S, Norton P, Svenson LW, et al. Validity of administrative data claim-based methods for identifying individuals with diabetes at a population level. Can J Public Health. 2010;101:61–4.

  40.

    Drösler SE, Romano PS, Sundararajan V, Burnand B, Colin C, Pincus H, et al. How many diagnosis fields are needed to capture safety events in administrative data? Findings and recommendations from the WHO ICD-11 Topic Advisory Group on Quality and Safety. Int J Qual Health Care. 2014;26:16–25.


We acknowledge the Alberta Sepsis Network Research grant provided by Alberta Innovates: Health Solutions (AIHS) for this research.

Author information



Corresponding author

Correspondence to Christopher J Doig.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

RJJ participated in the conception, design, acquisition of data and assessment of articles, analysis, and interpretation of the data, drafted the manuscript, and has given final approval for the version to be published. She takes responsibility for data accuracy and analysis and reporting integrity of the study. KJS participated in the design, acquisition of data and assessment of articles, and interpretation of the data. She was involved in revising the manuscript critically for important intellectual content, and has given final approval for the version to be published. DWY participated in the acquisition and interpretation of the data, provided literature review software and software analysis support. He was involved in revising the manuscript critically for important intellectual content, and has given final approval for the version to be published. HQ participated in the conception, design, and interpretation of the data. He was involved in revising the manuscript critically for important intellectual content, and has given final approval for the version to be published. NJ participated in the conception, design, and interpretation of the data. She was involved in revising the manuscript critically for important intellectual content, and has given final approval for the version to be published. CJD participated in the conception, design, and interpretation of the data. He was involved in revising the manuscript critically for important intellectual content, and has given final approval for the version to be published.

Authors’ information

NJ holds a Canada Research Chair in Neurological Health Services Research from the Canadian Institutes of Health Research and a Population Health Investigator Award from Alberta Innovates: Health Solutions (AIHS). HQ holds a Population Health Scholar Award from AIHS. DWY is a doctoral candidate.

Additional files

Additional file 1: Table S1.

Search strategy terms used in Ovid, MEDLINE and Embase databases.

Additional file 2: Table S2.

List of excluded articles and reason for exclusion.

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Jolley, R.J., Sawka, K.J., Yergens, D.W. et al. Validity of administrative data in recording sepsis: a systematic review. Crit Care 19, 139 (2015).

  • Severe Sepsis
  • Positive Predictive Value
  • Negative Predictive Value
  • Administrative Data
  • Case Definition