Becoming a high reliability organization
Critical Care volume 15, Article number: 314 (2011)
Aircraft carriers, electrical power grids, and wildland firefighting, though seemingly different, are exemplars of high reliability organizations (HROs) - organizations that have the potential for catastrophic failure yet engage in nearly error-free performance. HROs commit to safety at the highest level and adopt a special approach to its pursuit. High reliability organizing has been studied and discussed for some time in other industries and is receiving increasing attention in health care, particularly in high-risk settings like the intensive care unit (ICU). The essence of high reliability organizing is a set of principles that enable organizations to focus attention on emergent problems and to deploy the right set of resources to address those problems. HROs behave in ways that sometimes seem counterintuitive - they do not try to hide failures but rather celebrate them as windows into the health of the system, they seek out problems, they avoid focusing on just one aspect of work and are able to see how all the parts of work fit together, they expect unexpected events and develop the capability to manage them, and they defer decision making to local frontline experts who are empowered to solve problems. Given the complexity of patient care in the ICU, the potential for medical error, and the particular sensitivity of critically ill patients to harm, high reliability organizing principles hold promise for improving ICU patient care.
Primum non nocere, or 'first, do no harm', is a central concept in medicine, and yet patients are harmed every day. The Institute of Medicine (IOM) general definition of a medical error - 'the failure of a planned action to be completed as intended or the use of a wrong plan to achieve an aim' - is most useful when thinking about harm and potential harm to patients. Over the last decade, significant effort has been devoted to reducing medical error and improving patient safety. Initiatives based on concepts from the airline industry, such as the use of checklists and bundled interventions, have been popularized in order to decrease some discrete types of medical errors [2, 3]. Crucially, the IOM definition of medical error is more inclusive and brings to attention the great risk of missed diagnoses and suboptimal therapeutic plans - higher-order tasks that lie at the core of intensive care medicine. Yet accurate diagnoses and effective delivery of therapies are complicated by rapidly changing conditions, situations of high uncertainty, and incomplete knowledge - situations that are ubiquitous among the critically ill. To improve their practice, intensive care unit (ICU) clinicians may be able to learn from scholars who study industries that have an extremely low tolerance for error yet must maintain exceptionally high performance in quickly changing conditions. There are a number of different models to explain reliable and safe performance in high-risk organizations (including high reliability organizing [4, 5], the study of organizational accidents, resilience engineering, and normal accident theory). In this paper, we explore how high reliability organizing, which is receiving increasing attention in health care, may inform the care of the critically ill patient.
High reliability organizing
'So you want to understand an aircraft carrier? Well, just imagine that it's a busy day, and you shrink San Francisco Airport to only one short runway and one ramp and one gate. Make planes take off and land at the same time, at half the present time interval, rock the runway from side to side, and require that everyone who leaves in the morning returns that same day. Make sure the equipment is so close to the edge of the envelope that it's fragile. Then turn off the radar to avoid detection, impose strict controls on radios, fuel the aircraft in place with their engines running, put an enemy in the air, and scatter live bombs and rockets around. Now wet the whole thing down with sea water and oil, and man it with 20-year-olds, half of whom have never seen an airplane up-close. Oh, and by the way, try not to kill anyone'.
Senior officer, Air Division
Aircraft carriers are fraught with potential accidents, yet they engage in nearly error-free operations and are a classic example of high reliability organizations (HROs) - organizations in which accidents rarely occur despite the error-prone nature of the work. Other examples of HROs are nuclear power plants, electrical power grids, air traffic control systems, commercial aviation, and wildland firefighting. Organizational scholars have studied these types of organizations for more than 20 years to understand how they are able to maintain such high performance under such challenging circumstances. As work in conventional organizations has become increasingly fast-paced and the margin for error has become ever smaller, the notion of 'high reliability' has received widespread attention in many contexts, including health care [11–18].
Despite everything we know about HROs, there is no recipe for transforming an organization into an HRO. Put another way, there is no easy path to achieving safe and reliable performance. Some HRO scholars emphasize the idea of high reliability organizing rather than high reliability organizations to highlight two issues. First, high reliability is not a state that an organization can ever fully achieve; rather, it is something the organization seeks or continually aspires to. Second, reliability is fundamentally a dynamic set of properties, activities, and responses.
Reliability and safety are difficult to observe and accomplish because, on the surface, it is often easier to appreciate what is not happening (catastrophic error) rather than what is happening (timely human adjustments). Consequently, reliability and safety have been described as 'dynamic non-events': 'dynamic' in the sense that reliability and safety result from managing continuous change and 'non-events' in the sense that we recognize reliability and safety by the absence of other things (errors, mishaps, and accidents). As dynamic non-events, reliability and safety must be recurrently re-accomplished [10, 20] - just because there were no accidents yesterday does not mean the organization is safe today.
At the core of high reliability organizing is a set of principles embodied in processes and practices that enable organizations to focus attention on emergent problems and to deploy the right set of resources to address those problems. Noticing and responding to small disturbances and vulnerabilities allow the organization to take action to correct those small problems before they escalate into crisis or catastrophe. The advantage of catching small problems before they escalate is that there are more options to deal with them; however, the disadvantage is that small problems are also harder to spot.
High reliability organizing is characterized by five key principles that facilitate both problem detection and problem management. For problem detection, high reliability organizing involves (a) preoccupation with failure: using failure and near failure as ways to gain insight into the strengths and weaknesses of the system; (b) reluctance to simplify: avoiding the tendency to minimize or explain away problems; and (c) sensitivity to operations: being aware of the 'big picture', specifically how all the components of work fit together and how problems in one area can spread to other areas. For problem management, high reliability organizing involves (d) resilience: developing the capability to cope with unexpected events, and (e) deference to expertise: understanding where the expertise is in the organization and ensuring that decisions about how to deal with problems are made by those experts. By enacting these principles in a set of daily processes and practices, HROs repeatedly and continually shape and reshape a binding safety culture.
While HROs value these five key principles, the processes and practices that enact those principles differ depending on the organization's unique context and the set of resources and constraints that it faces. This is important because, although health care shares many similarities with conventional HROs, it also is a setting with particular constraints that make it hard to enact (and embed) high reliability organizing. Health care resembles conventional HROs in that patient care involves complex and ambiguous tasks, a fast-paced environment, and highly hazardous and interdependent work in which error has potentially catastrophic consequences. Yet health care also differs in important ways: first, there is much less social dread and regulatory oversight associated with safety and reliability in health care (in contrast to other high-risk industries, like nuclear power). In part, this is because medical harm is individualized, distributed, and insidious - that is to say, medical harm occurs one patient at a time and is, therefore, sometimes overlooked as a serious societal problem. As a result, pressures for efficiency and cost containment in health care can crowd out an emphasis on safety. Second, health care lacks overall system coordination and is further complicated by frequent personnel changes that result in a shifting workforce of temporary teams that assemble at the patient's bedside. For example, many units in teaching hospitals have biweekly or monthly rotation of both trainees and staff. Third, there can also be variability related to information; health-care providers face situations in which they must take action, even with incomplete information or - paradoxically - too much information. Taken together, these factors contribute to significant variability in health care, both in terms of the nature of the work and in the accomplishment of the work.
High levels of variability can make it difficult to enact high reliability organizing principles. For instance, when work is accomplished by temporary teams, many of the 'taken for granted' aspects of high reliability organizing become more challenging. If people do not know each other - and, as a result, do not know who the experts on the team are - it becomes very complicated to create flexible decision structures and delegate problems to frontline experts in accordance with the fifth HRO principle, deference to expertise. To offset some of this variability, elements that support high reliability organizing can be incorporated into the more stable aspects of the organizational infrastructure. For example, in situations in which there is high turnover in staff, it makes sense to embed high reliability organizing principles not only in specific individuals but also in particular organizational roles and routines. We present two examples of how other industries - electricity grids and wildland firefighting - use roles and routines to help support high reliability organizing.
Example 1: Electrical power grids
The California Independent System Operator (CAISO) is the organization that manages the California high-voltage electrical power grid, one of the world's most important electricity systems. Electrical power grids are extraordinarily complicated to operate: electricity cannot be stored and must be generated as needed, and this makes balancing fluctuations in supply and demand very challenging. In addition, electricity grids require the coordination of multiple competing demands, and knowledge about how to best ensure safety and reliability is difficult to acquire. Roe and Schulman studied CAISO from 2001 to 2006 and found that there were a handful of personnel - controllers, dispatchers, technical supervisors, and department heads - who acted as 'reliability professionals'. Reliability professionals were responsible for making the difference between 'catastrophic failure of services we all depend on for life and livelihood and the routine functioning we have come to expect'.
Reliability professionals incorporate into their roles a deep commitment to the real-time reliability of their systems and develop particular skills around recognizing emergent problems and formulating responses to address those problems. They have lengthy and diverse work experience - they have typically worked in different parts of the organization and, as a result, have a deep understanding of how the organization functions as a whole. They are usually found in midlevel positions because this vantage point gives them insight into the day-to-day operational details as well as the 'big picture' of the organization. They also network with other reliability professionals to share knowledge. Designating specific individuals or a set of individuals in a specific role to act as reliability professionals does not mean that the rest of the organization should ignore HRO principles. Instead, reliability professionals act as an additional safeguard and repository of knowledge, especially in situations in which the rest of the system has significant variability.
Example 2: Wildland firefighting
Massive wildland fires require that many diverse resources be brought together - a recent fire in the US involved more than 7,000 firefighters from over 458 fire agencies across 12 states. To ensure high reliability, the wildland fire community has developed two approaches to help manage these large numbers of personnel and equipment as well as the lack of familiarity. First, they use a highly structured method of organizing - called an Incident Command System - that has a hierarchical reporting system consisting of highly specified roles and associated responsibilities. Second, team leaders adopt a routine communication protocol to ensure that they are able to maintain an understanding of the unfolding situation (the third HRO principle, sensitivity to operations). They use an acronym - STICC, which stands for situation, task, intent, concern, and calibrate - to organize briefings about emerging problems [10, 24]. The STICC protocol is a commonly understood template for structuring conversations, and leaders follow each of the five steps in order (Table 1). By creating a strong structure through shared expectations about roles and routines, wildland firefighters are able to respond flexibly to the needs of the situation without having the temporary teams devolve into chaos.
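As an illustration only, the idea of a fixed five-step briefing template can be sketched in a few lines of Python. The class name `SticcBriefing`, its methods, and the sample ICU wording are hypothetical and are not drawn from the wildland fire community or the cited sources. The sketch shows just two properties described above: the steps are always presented in the same order, and a briefing with a skipped step is treated as incomplete.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SticcBriefing:
    """Hypothetical container for one STICC briefing (names are illustrative)."""

    # Field order mirrors the acronym: situation, task, intent, concern, calibrate.
    situation: str
    task: str
    intent: str
    concern: str
    calibrate: str

    def missing_steps(self) -> list[str]:
        """Return the names of any steps left blank, in STICC order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Format the briefing in STICC order; refuse to render if a step is blank."""
        missing = self.missing_steps()
        if missing:
            raise ValueError(f"incomplete briefing, missing: {', '.join(missing)}")
        return "\n".join(
            f"{f.name.capitalize()}: {getattr(self, f.name)}" for f in fields(self)
        )


# Invented example text, for illustration only.
briefing = SticcBriefing(
    situation="Bed 4 blood pressure has been falling for the past hour.",
    task="Reassess the patient and review the current infusions.",
    intent="Catch a deterioration early, before it becomes an emergency.",
    concern="Two admissions are expected and attention may be split.",
    calibrate="What am I missing? Does anyone see this differently?",
)
print(briefing.render())
```

Keeping the steps as named fields rather than free text is one way to embed the routine in a role-independent template, which matters when team membership changes frequently.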
These are only two small examples of the types of practices that HROs enact and that contribute to safe and reliable performance. HROs are distinguished by the principles to which they are committed (for example, pursuing safety as a priority objective and being preoccupied with failure) and the organizing processes and practices that they repeatedly enact on a daily basis. To summarize, high reliability organizing is not a prescription or a road map for success. It is one lens through which the pursuit of safe and reliable performance under trying conditions can be understood.
Applying high reliability organizing to critical care
The study of high reliability organizing in critical care is still in its early stages [12, 25]. As with any model derived in other settings, there are challenges in thinking about and studying how high reliability organizing translates to critical care. Yet ICUs are just the type of high-risk, high-hazard setting that benefits from high reliability organizing: the opportunity for error in the ICU is ubiquitous, and critically ill patients are especially vulnerable to harm. High reliability organizing may help prevent both failures in the organization of care and failures in a particular patient's care. However, as scholars begin to investigate how high reliability organizing can be implemented in critical care, it is important to reiterate that the processes and practices for enacting the five key principles of high reliability organizing will need to be tailored to the ICU context. (In Table 2, we suggest some potential ICU applications of each of these principles.)
A few general comments can be made about applying the concepts of HROs to ICUs. In the face of substantial mortality rates even among patients who get superb care, it can be difficult to 'embrace failure'. Embracing near failure means moving from a mindset of 'no harm, no foul' to searching out and reviewing near failures specifically to address areas of potential risk in an effort to prevent future catastrophe. Flexibility in response to unexpected events and awareness of system impacts are hallmarks of patient care in the ICU. In highly differentiated multidisciplinary teams such as in an ICU, it is easy to find examples in which specific expertise may lie outside of the traditional hierarchy and may vary from shift to shift: a nurse's knowledge of a patient's prior responses to therapy or patient and family concerns and goals of care, a specific attending physician's unique expertise in an uncommon procedure, or a pharmacist's knowledge of medication interactions. But local cultures that identify and defer to this expertise may be more challenging to develop. Most important is the concept that high reliability is an ongoing process, not a state of achievement.
To appreciate how HRO principles might be enacted in the ICU, consider the common problem of failure to meet the timeliness targets for early goal-directed therapy of septic shock. More specifically, imagine an incident in which a patient with severe sepsis did not have rapid placement of a central venous catheter and assessment of hemodynamics. Whereas the conventional approach to this problem might involve waiting to address this issue until several examples of delayed resuscitation occur and result in at least one death, the HRO approach emphasizes preoccupation with failure and reluctance to simplify. An HRO approach means that this incident should be investigated immediately, even if the patient had excellent clinical outcomes. As the case is reviewed, particular attention should be devoted to understanding what else was going on in the ICU that might have led to delays, not to excuse them but to develop resilient systems that allow appropriately rapid resuscitation even in the face of such other factors. Furthermore, the review should be carried out by a collegial multidisciplinary team, likely including nurses and interns (incorporating both deference to expertise and sensitivity to operations). The result of the review would be an approach to improving resuscitation of patients in septic shock, likely with a practice run and ongoing 'drills' (further developing resilience) prior to the arrival of the next patient in septic shock.
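The contrast between the two review triggers in this scenario can be made concrete with a small Python sketch. Everything named here is an assumption made for illustration: the `ResuscitationCase` record, the function names, the reading of 'several' as three cases, and the 120-minute value used in the example, which is a placeholder and not a clinical target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResuscitationCase:
    """Hypothetical record of one septic shock resuscitation."""

    minutes_to_central_line: float
    patient_died: bool


def is_delayed(case: ResuscitationCase, target_minutes: float) -> bool:
    """A case is delayed if catheter placement took longer than the target."""
    return case.minutes_to_central_line > target_minutes


def hro_review_needed(case: ResuscitationCase, target_minutes: float) -> bool:
    """HRO-style trigger: any missed target is reviewed, whatever the outcome."""
    return is_delayed(case, target_minutes)


def conventional_review_needed(
    cases: list[ResuscitationCase],
    target_minutes: float,
    min_delayed_cases: int = 3,  # assumed reading of 'several'
) -> bool:
    """Conventional trigger: wait for several delays, at least one of them fatal."""
    delayed = [c for c in cases if is_delayed(c, target_minutes)]
    return len(delayed) >= min_delayed_cases and any(c.patient_died for c in delayed)


# Placeholder target for illustration only; not a clinical recommendation.
TARGET_MINUTES = 120.0

near_miss = ResuscitationCase(minutes_to_central_line=200.0, patient_died=False)
print(hro_review_needed(near_miss, TARGET_MINUTES))
print(conventional_review_needed([near_miss], TARGET_MINUTES))
```

Under these assumptions, a single delayed case with a good outcome triggers a review under the HRO rule but not under the conventional one, which is the difference the scenario describes.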
A recent focus on patient safety has catalyzed an awareness of how much room for improvement there is in most ICUs, even great ones. Some health-care scholars claim that the only realistic goal of safety management in complex health-care systems is to develop an intrinsic resistance to operational hazards. High reliability organizing is one way to foster this intrinsic resistance. Embracing HRO concepts will not necessarily be easy in the ICU, where there are simultaneous pressures for cost containment as well as often-changing team members, and ongoing evaluation will be needed as HRO processes and practices from non-ICU contexts are implemented in ICUs. Nonetheless, designing resilient health-care systems by thoughtfully embracing the central principles and philosophy of HROs holds great promise for improving the safety and reliability of critical care in the ICU.
This article is part of a series on Healthcare Delivery, edited by Dr Andre Amaral and Dr Gordon Rubenfeld.
CAISO: California Independent System Operator
HRO: high reliability organization
ICU: intensive care unit
IOM: Institute of Medicine
STICC: situation, task, intent, concern, and calibrate.
Kohn LT, Corrigan JM, Donaldson MS, (Eds): To Err is Human: Building a Safer Health System. Washington, DC: National Academy Press; 2000.
Goeschel CA, Holzmueller CG, Cosgrove SE, Ristaino P, Pronovost PJ: Infection preventionist checklist to improve culture and reduce central line-associated bloodstream infections. Jt Comm J Qual Patient Saf 2010, 36: 571-575.
Carthey J, de Leval MR, Reason JT: Institutional resilience in healthcare systems. Qual Health Care 2001, 10: 29-32. 10.1136/qhc.10.1.29
Roberts KH: Some characteristics of one type of high reliability organization. Organization Science 1990, 1: 160-176. 10.1287/orsc.1.2.160
Weick KE, Sutcliffe KM, Obstfeld D: Organizing for high reliability: processes of collective mindfulness. In Research in Organizational Behavior. Volume 21. Edited by: Sutton RI, Staw BM. Greenwich, CT: JAI Press; 1999:81-124.
Reason J: Managing the Risks of Organizational Accidents. Aldershot, UK: Ashgate; 1997.
Hollnagel E, Woods DD, Leveson N, (Eds): Resilience Engineering: Concepts and Precepts. Burlington, VT: Ashgate; 2006.
Perrow C: Normal Accidents. Princeton, NJ: Princeton University Press; 1999.
Rochlin GI, La Porte TR, Roberts KH: The self-designing high reliability organization: aircraft carrier flight operations at sea. Naval War College Review 1987, Autumn: 76-90.
Weick KE, Sutcliffe KM: Managing the Unexpected: Resilient Performance in an Age of Uncertainty. 2nd edition. San Francisco, CA: Jossey-Bass; 2007.
Wilson KA, Burke CS, Priest HA, Salas E: Promoting health care safety through training high reliability teams. Qual Saf Health Care 2005, 14: 303-309. 10.1136/qshc.2004.010090
Madsen P, Desai V, Roberts K, Wong D: Mitigating hazards through continuing design: the birth and evolution of a pediatric intensive care unit. Organization Science 2006, 17: 239-248. 10.1287/orsc.1060.0185
Shapiro MJ, Jay GD: High reliability organizational change for hospitals: translating tenets for medical professionals. Qual Saf Health Care 2003, 12: 238-239. 10.1136/qhc.12.4.238
Vogus TJ, Sutcliffe KM: The impact of safety organizing, trusted leadership, and care pathways on reported medication errors in hospital nursing units. Med Care 2007, 45: 997-1002. 10.1097/MLR.0b013e318053674f
Vogus TJ, Sutcliffe KM, Weick KE: Doing no harm: enabling, enacting, and elaborating a culture of safety in health care. Academy of Management Perspectives 2010, 24: 60-77.
Blatt R, Christianson MK, Sutcliffe KM, Rosenthal MM: A sensemaking lens on reliability. Journal of Organizational Behavior 2006, 27: 897-917. 10.1002/job.392
Christianson MK, Sutcliffe KM: Sensemaking, high-reliability organizing, and resilience. In Patient Safety in Emergency Medicine. Edited by: Croskerry P, Cosby KS, Schenkel SM, Wears RL. Philadelphia, PA: Lippincott Williams & Wilkins; 2009:27-33.
Patterson ES: Communication strategies from high-reliability organizations: translation is hard work. Ann Surg 2007, 245: 170-172. 10.1097/01.sla.0000253331.27897.fe
Weick KE: Organizational culture as a source of high reliability. California Management Review 1987, 29: 112-127.
Weick KE: Organizing for transient reliability: the production of dynamic non-events. Journal of Contingencies and Crisis Management 2011, 19: 21-27. 10.1111/j.1468-5973.2010.00627.x
Schulman PR: The analysis of high reliability organizations: a comparative framework. In New Challenges to Understanding Organizations. Edited by: Roberts KH. New York: Macmillan; 1993:33-55.
Roe E, Schulman PR: High Reliability Management: Operating on the Edge. Stanford, CA: Stanford University Press; 2008.
Bigley GA, Roberts KH: The incident command system: high-reliability organizing for complex and volatile task environments. Academy of Management Journal 2001, 44: 1281-1299.
Klein G: Intuition at Work. New York, NY: Doubleday; 2002.
Iwashyna TJ, Kramer AA, Kahn JM: Intensive care unit occupancy and patient outcomes. Crit Care Med 2009, 37: 1545-1557. 10.1097/CCM.0b013e31819fe8f8
This work was supported by the National Institutes of Health via K08 HL091249 (TJI) and by the NIH/NHLBI T32: HL 07749-17 (MAM). We thank Andre Amaral and two anonymous reviewers for constructive comments on previous drafts.
The authors declare that they have no competing interests.