In this study, we compared organ dysfunction coding practices to patients’ objective clinical data using 9 years of data from a large electronic clinical database. We found that the sensitivity of hospital discharge codes for identifying hospitalizations with objective signs of acute organ dysfunction increased steadily over time for all organ dysfunction types, with the sole exception of thrombocytopenia. This trend was most striking for patients with acute kidney failure and respiratory failure codes. There was a simultaneous decrease in the positive predictive value for several types of organ dysfunction codes. Most notably, we observed a steady decrease in the average change in creatinine associated with acute kidney failure codes. We also observed a decrease in the proportion of patients assigned respiratory failure codes who required 2 or more days of mechanical ventilation.
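The two measures discussed above can be made concrete with a minimal sketch. It assumes each hospitalization has been reduced to two flags: whether a discharge code for a given organ dysfunction was assigned, and whether the objective clinical criterion was met. The `Hospitalization` class, the function name, and the toy counts are hypothetical illustrations, not the study's data or analysis code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hospitalization:
    coded: bool      # discharge code for the organ dysfunction was assigned
    objective: bool  # objective clinical criterion (e.g., lab threshold) was met


def sensitivity_and_ppv(
    records: List[Hospitalization],
) -> Tuple[Optional[float], Optional[float]]:
    """Treat the objective clinical criterion as the reference standard."""
    tp = sum(1 for r in records if r.coded and r.objective)
    fn = sum(1 for r in records if not r.coded and r.objective)
    fp = sum(1 for r in records if r.coded and not r.objective)
    # Sensitivity: share of objectively dysfunctional cases that were coded.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    # PPV: share of coded cases that met the objective criterion.
    ppv = tp / (tp + fp) if (tp + fp) else None
    return sensitivity, ppv


# Made-up "early" year: 10 cases meet the criterion, only 3 are coded.
early = [Hospitalization(True, True)] * 3 + [Hospitalization(False, True)] * 7
# Made-up "late" year: 6 of 10 are coded, plus 2 coded cases below the threshold.
late = (
    [Hospitalization(True, True)] * 6
    + [Hospitalization(False, True)] * 4
    + [Hospitalization(True, False)] * 2
)

print(sensitivity_and_ppv(early))  # (0.3, 1.0)
print(sensitivity_and_ppv(late))   # (0.6, 0.75)
```

In this toy example, more liberal coding raises sensitivity while lowering positive predictive value, which is the pattern described above for several organ dysfunction types.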
Documentation and coding of acute organ failure is known to be imperfect; for example, in one retrospective study, appropriate documentation of acute kidney injury occurred in only 43 % of patients who had a doubling of baseline creatinine. To our knowledge, however, this is the first study to show changes in coding practices relative to objective clinical criteria over an extended period of time. It is likely that coding for organ dysfunction is increasing over time both because of the inherent desire to better document patients’ clinical states and because hospitals are eligible for higher reimbursements for caring for more complex patients. Changes in the reimbursement and policy landscape support this trend. For example, in the United States, the Centers for Medicare and Medicaid Services (CMS) transitioned from diagnosis-related group (DRG) reimbursements to the current Medical Severity DRG (MS-DRG) system in 2007. The MS-DRG system explicitly ties reimbursement to severity of illness and spurred hospitals to make significant efforts to improve documentation and coding.
We found that the apparent rate of rise over time in patients with suspected infection and at least one kind of organ dysfunction was markedly higher when measured using claims data than when measured using objective clinical markers. This suggests that imputing severe sepsis incidence using infection codes and organ dysfunction codes (without necessarily requiring explicit sepsis codes) can be misleading because physicians and hospitals are changing the ways they code for organ dysfunction. The largest contributor to this discrepant increase was a decrease in the threshold for coding for acute kidney failure over time, combined with rising sensitivity for capturing significant changes in baseline creatinine. In addition to financial pressures, the increase in coding for acute kidney injury over time may also be a result of changes in classifications by multidisciplinary collaborative groups that now include smaller changes in baseline serum creatinine. For example, the Acute Kidney Injury Network definition published in 2007 defined a rise in serum creatinine of ≥0.3 mg/dL as the first stage of acute kidney injury; previously, the Risk, Injury, Failure, Loss of kidney function, and End-stage kidney disease (RIFLE) consensus criteria defined a 1.5-fold increase in serum creatinine as the earliest stage of acute kidney injury [25, 26]. Interestingly, thrombocytopenia codes were the only type of organ dysfunction code that did not increase in sensitivity in our study. This may be because, in contrast to most of the other types of organ dysfunction, thrombocytopenia is not on CMS’s list of major complications and comorbid conditions that factor most heavily into severity of illness assessment and reimbursements.
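The effect of the definitional shift described above can be illustrated with a short sketch. It implements only the two creatinine thresholds quoted in the text (an absolute rise of ≥0.3 mg/dL versus a 1.5-fold increase) and deliberately ignores the other elements of the published criteria, such as urine output and time windows. The function names and example values are hypothetical.

```python
def meets_absolute_rise_threshold(baseline_mg_dl: float, peak_mg_dl: float) -> bool:
    """Earliest-stage threshold quoted above for the 2007 definition:
    serum creatinine rise of at least 0.3 mg/dL over baseline."""
    return (peak_mg_dl - baseline_mg_dl) >= 0.3


def meets_fold_increase_threshold(baseline_mg_dl: float, peak_mg_dl: float) -> bool:
    """Earliest-stage threshold quoted above for RIFLE:
    serum creatinine at least 1.5 times baseline."""
    return (peak_mg_dl / baseline_mg_dl) >= 1.5


# Illustrative (made-up) baseline and peak creatinine pairs, in mg/dL.
examples = [(1.0, 1.2), (1.0, 1.4), (1.0, 1.6)]
for baseline, peak in examples:
    print(
        baseline,
        peak,
        meets_absolute_rise_threshold(baseline, peak),
        meets_fold_increase_threshold(baseline, peak),
    )
```

A patient whose creatinine rises from 1.0 to 1.4 mg/dL meets the absolute-rise threshold but not the 1.5-fold threshold, so the same laboratory values can qualify as acute kidney injury under the newer definition only.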
The mortality decline in patients with suspected infection and objective markers of organ dysfunction was less pronounced than the mortality decrease associated with organ dysfunction codes. This suggests that part of the apparent decline in severe sepsis mortality imputed from claims is likely due to the increasing inclusion of patients with milder organ dysfunction over time. We also found that the increase in mean number of dysfunctional organs was greater when using codes versus clinical data, and in fact the mean number of dysfunctional organs was decreasing in patients coded with severe sepsis (ICD-9-CM code 995.92). This indicates that estimating changes in the severity of sepsis based on codes alone is subject to bias, and also supports the notion that the threshold for assigning the explicit severe sepsis code is decreasing. These conclusions are in line with a prior trend analysis of data from the Nationwide Inpatient Sample from 2003 to 2007 that showed a paradoxical increase in the number of coded dysfunctional organ systems in patients with severe sepsis but decreasing in-hospital mortality rates and mean costs per case. A similar phenomenon may account for findings from the National Hospital Discharge Survey that demonstrated an increase in the proportion of patients with sepsis who had any organ failure from 19.1 % in 1979–1984 to 30.2 % in 1995–2000.
Importantly, even in 2013, the sensitivity for most organ dysfunction codes was relatively low (60 % or less in most cases), indicating that claims still substantially underestimate the true occurrence of infection-related organ dysfunction. This suggests that coding sensitivity still has considerable room to rise, and that such increases will continue to bias future surveillance efforts that rely on claims data. Conversely, if incentives are reversed, it is conceivable that the sensitivity of coding could decrease. A potential example where incentives might change is the new sepsis bundle mandated by CMS in the US, which proposes to monitor adherence through retrospective review of patients with ICD-10 discharge codes for sepsis, severe sepsis, and septic shock. Measuring changes in any type of disease burden and associated outcomes is centrally dependent on having uniform definitions that are applied consistently over time. Because claims do not live up to this standard in many cases, there is a pressing need to develop objective and efficient surveillance strategies that are more resistant to changes in external forces. The increasing implementation and use of electronic medical record systems worldwide allows for the possibility of shifting surveillance from claims to clinical data, including patients’ laboratory values. These are less prone (although not entirely immune) to changes in use and interpretation over time.
Our findings also have implications beyond severe sepsis epidemiology. Several studies unrelated to sepsis have used administrative databases to examine trends in organ dysfunction and also found increasing incidences over time. For example, claims for acute kidney failure in Medicare data rose steadily from 1992 to 2001 while the associated mortality decreased. Likewise, Stefan et al. examined trends in acute respiratory failure using ICD-9-CM codes from the Nationwide Inpatient Sample and found a significant increase in incidence and total costs, but a decline in mortality and length of stay.
Our study has several limitations. First, we only used data from two academic hospitals in one city; further studies should explore the generalizability of our findings. Notably, however, our estimated incidence of organ dysfunction and trends in severe sepsis rates as ascertained via ICD-9-CM codes mirror national and international trends [5, 30]. Second, we used blood culture orders as our marker for suspected infection, but it is unclear if this captures the entire cohort of patients with sepsis. However, our findings were identical when using hospitalizations with infection or sepsis diagnoses at discharge, suggesting that these patterns of changing organ dysfunction coding are not unique to patients with blood culture orders. Third, it is possible that patients coded as having acute respiratory failure are increasingly receiving noninvasive positive pressure ventilation over time, and that we therefore underestimated the sensitivity and overestimated the decline in positive predictive value of claims codes for respiratory failure. However, if this is the case, it also underscores the changing and variable use of the term “respiratory failure” and the need for a more uniform definition. Fourth, we did not evaluate changes in coding for altered mental status since we did not have an objective measure for comparison. Lastly, our estimates of baseline laboratory values were derived from the “best” values during or 30 days prior to hospitalization, and these may not be accurate in some cases. However, we applied the same definitions for baseline values over the entire study period, minimizing the risk of any systematic bias.