Becoming a high reliability organization

Aircraft carriers, electrical power grids, and wildland firefighting, though seemingly different, are exemplars of high reliability organizations (HROs) - organizations that have the potential for catastrophic failure yet engage in nearly error-free performance. HROs commit to safety at the highest level and adopt a special approach to its pursuit. High reliability organizing has been studied and discussed for some time in other industries and is receiving increasing attention in health care, particularly in high-risk settings like the intensive care unit (ICU). The essence of high reliability organizing is a set of principles that enable organizations to focus attention on emergent problems and to deploy the right set of resources to address those problems. HROs behave in ways that sometimes seem counterintuitive - they do not try to hide failures but rather celebrate them as windows into the health of the system, they seek out problems, they avoid focusing on just one aspect of work and are able to see how all the parts of work fit together, they expect unexpected events and develop the capability to manage them, and they defer decision making to local frontline experts who are empowered to solve problems. Given the complexity of patient care in the ICU, the potential for medical error, and the particular sensitivity of critically ill patients to harm, high reliability organizing principles hold promise for improving ICU patient care.

decade, significant effort has been devoted to reducing medical error and improving patient safety. Initiatives based on concepts from the airline industry, such as the use of checklists and bundled interventions, have been popularized in order to decrease some discrete types of medical errors [2,3]. Crucially, the IOM definition of medical error is more inclusive and brings to attention the great risk of missed diagnoses and suboptimal therapeutic plans - higher-order tasks that lie at the core of intensive care medicine. Yet accurate diagnoses and effective delivery of therapies are complicated by rapidly changing conditions, situations of high uncertainty, and incomplete knowledge - situations that are ubiquitous among the critically ill. To improve their practice, intensive care unit (ICU) clinicians may be able to learn from scholars who study industries that have an extremely low tolerance for error yet must maintain exceptionally high performance in quickly changing conditions. There are a number of different models to explain reliable and safe performance in high-risk organizations (including high reliability organizing [4,5], the study of organizational accidents [6], resilience engineering [7], and normal accident theory [8]). In this paper, we explore how high reliability organizing, which is receiving increasing attention in health care, may inform the care of the critically ill patient.

High reliability organizing
'So you want to understand an aircraft carrier? Well, just imagine that it's a busy day, and you shrink San Francisco Airport to only one short runway and one ramp and one gate. Make planes take off and land at the same time, at half the present time interval, rock the runway from side to side, and require that everyone who leaves in the morning returns that same day. Make sure the equipment is so close to the edge of the envelope that it's fragile. Then turn off the radar to avoid detection, impose strict controls on radios, fuel the aircraft in place with their engines running, put an enemy in the air, and scatter live bombs and rockets around. Now wet the whole thing down with sea water and oil, and man it with 20-year-olds, half of whom have never seen an airplane up-close. Oh, and by the way, try not to kill anyone' [9].
Senior officer, Air Division

Aircraft carriers are fraught with potential accidents, yet they engage in nearly error-free operations and are a classic example of high reliability organizations (HROs) - organizations in which accidents rarely occur despite the error-prone nature of the work [4]. Other examples of HROs are nuclear power plants, electrical power grids, air traffic control systems, commercial aviation, and wildland firefighting. Organizational scholars have studied these types of organizations for more than 20 years to understand how they are able to maintain such high performance under such challenging circumstances. As work in conventional organizations has become increasingly fast-paced and the margin for error has become ever smaller, the notion of 'high reliability' has received widespread attention in many contexts [10], including health care [11-18].
Despite everything we know about HROs, there is no recipe for transforming an organization into an HRO. Put another way, there is no easy path to achieving safe and reliable performance. Some HRO scholars emphasize the idea of high reliability organizing rather than high reliability organizations to highlight two issues. First, high reliability is not a state that an organization can ever fully achieve; rather, it is something the organization seeks or continually aspires to. Second, reliability is fundamentally a dynamic set of properties, activities, and responses.
Reliability and safety are difficult to observe and accomplish because, on the surface, it is often easier to appreciate what is not happening (catastrophic error) rather than what is happening (timely human adjustments). Consequently, reliability and safety have been described as 'dynamic non-events' [19]: 'dynamic' in the sense that reliability and safety result from managing continuous change and 'non-events' in the sense that we recognize reliability and safety by the absence of other things (errors, mishaps, and accidents). As dynamic non-events, reliability and safety must be recurrently re-accomplished [10,20] - just because there were no accidents yesterday does not mean the organization is safe today.
At the core of high reliability organizing is a set of principles embodied in processes and practices that enable organizations to focus attention on emergent problems and to deploy the right set of resources to address those problems. Noticing and responding to small disturbances and vulnerabilities allow the organization to take action to correct those small problems before they escalate into crisis or catastrophe. The advantage of catching small problems before they escalate is that there are more options to deal with them; however, the disadvantage is that small problems are also harder to spot.
High reliability organizing is characterized by five key principles that facilitate both problem detection and problem management [5]. For problem detection, high reliability organizing involves (a) preoccupation with failure: using failure and near failure as ways to gain insight into the strengths and weaknesses of the system; (b) reluctance to simplify: avoiding the tendency to minimize or explain away problems; and (c) sensitivity to operations: being aware of the 'big picture', specifically how all the components of work fit together and how problems in one area can spread to other areas. For problem management, high reliability organizing involves (d) resilience: developing the capability to cope with unexpected events, and (e) deference to expertise: understanding where the expertise is in the organization and ensuring that decisions about how to deal with problems are made by those experts. By enacting these principles in a set of daily processes and practices, HROs repeatedly and continually shape and reshape a binding safety culture.
While HROs value these five key principles, the processes and practices that enact those principles differ depending on the organization's unique context and the set of resources and constraints that it faces [21]. This is important because, although health care shares many similarities with conventional HROs, it also is a setting with particular constraints that make it hard to enact (and embed) high reliability organizing [15]. Health care resembles conventional HROs in that patient care involves complex and ambiguous tasks, a fast-paced environment, and highly hazardous and interdependent work in which error has potentially catastrophic consequences. Yet health care also differs in important ways: first, there is much less social dread and regulatory oversight associated with safety and reliability in health care (in contrast to other high-risk industries, like nuclear power). In part, this is because medical harm is individualized, distributed, and insidious - that is to say, medical harm occurs one patient at a time and is, therefore, sometimes overlooked as a serious societal problem. As a result, pressures for efficiency and cost containment in health care can crowd out an emphasis on safety. Second, health care lacks overall system coordination and is further complicated by frequent personnel changes that result in a shifting workforce of temporary teams that assemble at the patient's bedside. For example, many units in teaching hospitals have biweekly or monthly rotation of both trainees and staff. Third, there can also be variability related to information; health-care providers face situations in which they must take action, even with incomplete information or - paradoxically - too much information. Taken together, these factors contribute to significant variability in health care, both in terms of the nature of the work and in the accomplishment of the work.
High levels of variability can make it difficult to enact high reliability organizing principles. For instance, when work is accomplished by temporary teams, many of the 'taken for granted' aspects of high reliability organizing become more challenging. If people do not know each other - and, as a result, do not know who the experts on the team are - it becomes very complicated to create flexible decision structures and delegate problems to frontline experts in accordance with the fifth HRO principle, deference to expertise. To offset some of this variability, elements that support high reliability organizing can be incorporated into the more stable aspects of the organizational infrastructure. For example, in situations in which there is high turnover in staff, it makes sense to embed high reliability organizing principles not only in specific individuals but also in particular organizational roles and routines. We present two examples of how other industries - electricity grids and wildland firefighting - use roles and routines to help support high reliability organizing.

Example 1: Electrical power grids
The California Independent System Operator (CAISO) is the organization that manages the California high-voltage electrical power grid, one of the world's most important electricity systems. Electrical power grids are extraordinarily complicated to operate: electricity cannot be stored and must be generated as needed, and this makes balancing fluctuations in supply and demand very challenging. In addition, electricity grids require the coordination of multiple competing demands, and knowledge about how to best ensure safety and reliability is difficult to acquire. Roe and Schulman [22] studied CAISO from 2001 to 2006 and found that there were a handful of personnel - controllers, dispatchers, technical supervisors, and department heads - who acted as 'reliability professionals'. Reliability professionals were responsible for making the difference between 'catastrophic failure of services we all depend on for life and livelihood and the routine functioning we have come to expect' [22].
Reliability professionals incorporate into their roles a deep commitment to the real-time reliability of their systems and develop particular skills around recognizing emergent problems and formulating responses to address those problems. They have lengthy and diverse work experience - they have typically worked in different parts of the organization and, as a result, have a deep understanding of how the organization functions as a whole. They are usually found in midlevel positions because this vantage point gives them insight into the day-to-day operational details as well as the 'big picture' of the organization. They also network with other reliability professionals to share knowledge. Designating specific individuals or a set of individuals in a specific role to act as reliability professionals does not mean that the rest of the organization should ignore HRO principles. Instead, reliability professionals act as an additional safeguard and repository of knowledge, especially in situations in which the rest of the system has significant variability.

Example 2: Wildland firefighting
Massive wildland fires require that many diverse resources be brought together - a recent fire in the US involved more than 7,000 firefighters from over 458 fire agencies across 12 states [23]. To ensure high reliability, the wildland fire community has developed two approaches to help manage these large numbers of personnel and equipment as well as the lack of familiarity. First, they use a highly structured method of organizing - called an Incident Command System - that has a hierarchical reporting system consisting of highly specified roles and associated responsibilities [23]. Second, team leaders adopt a routine communication protocol to ensure that they are able to maintain an understanding of the unfolding situation (the third HRO principle, sensitivity to operations). They use an acronym - STICC, which stands for situation, task, intent, concern, and calibrate - to organize briefings about emerging problems [10,24]. The STICC protocol is a commonly understood template for structuring conversations, and leaders follow each of the five steps in order (Table 1). By creating a strong structure through shared expectations about roles and routines, wildland firefighters are able to respond flexibly to the needs of the situation without having the temporary teams devolve into chaos.
These are only two small examples of the types of practices that HROs enact and that contribute to safe and reliable performance. HROs are distinguished by the principles to which they are committed (for example, pursuing safety as a priority objective and being preoccupied with failure) and the organizing processes and practices that they repeatedly enact on a daily basis. To summarize, high reliability organizing is not a prescription or a road map for success. It is one lens through which the pursuit of safe and reliable performance under trying conditions can be understood.

Applying high reliability organizing to critical care
The study of high reliability organizing in critical care is still in its early stages [12,25]. As with any model derived in other settings, there are challenges in thinking about and studying how high reliability organizing translates to critical care. Yet ICUs are just the type of high-risk, high-hazard setting that benefits from high reliability organizing: the opportunity for error in the ICU is ubiquitous, and critically ill patients are especially vulnerable to harm. High reliability organizing may help prevent both failures in the organization of care and failures in a particular patient's care. However, as scholars begin to investigate how high reliability organizing can be implemented in critical care, it is important to reiterate that the processes and practices for enacting the five key principles of high reliability organizing will need to be tailored to the ICU context. (In Table 2, we suggest some potential ICU applications of each of these principles.) A few general comments can be made about applying the concepts of HROs to ICUs. In the face of substantial mortality rates even among patients who get superb care, it can be difficult to 'embrace failure'. Embracing near failure means moving from a mindset of 'no harm, no foul' to searching out and reviewing near failures specifically to address areas of potential risk in an effort to prevent future catastrophe. Flexibility in response to unexpected events and awareness of system impacts are hallmarks of patient care in the ICU. In highly differentiated multidisciplinary teams such as in an ICU, it is easy to find examples in which specific expertise may lie outside of the traditional hierarchy and may vary from shift to shift: a nurse's knowledge of a patient's prior responses to therapy or patient and family concerns and goals of care, a specific attending physician's unique expertise in an uncommon procedure, or a pharmacist's knowledge of medication interactions.
But local cultures that identify and defer to this expertise may be more challenging to develop. Most important is the concept that high reliability is an ongoing process, not a state of achievement.
To appreciate how HRO principles might be enacted in the ICU, consider the common problem of failure to meet the timeliness targets for early goal-directed therapy of septic shock. More specifically, imagine an incident in which a patient with severe sepsis did not have rapid placement of a central venous catheter and assessment of hemodynamics. Whereas the conventional approach to this problem might involve waiting to address this issue until several examples of delayed resuscitation occur and result in at least one death, the HRO approach emphasizes preoccupation with failure and reluctance to simplify. An HRO approach means that this incident should be investigated immediately, even if the patient had excellent clinical outcomes. As the case is reviewed, particular attention should be devoted to understanding what else was going on in the ICU that might have led to delays - not to excuse the delays but to develop resilient systems that will allow appropriately rapid resuscitation even in the face of such other factors. Furthermore, the review should be carried out by a collegial multidisciplinary team, likely including nurses and interns (incorporating both deference to expertise and sensitivity to operations). The result of the review would be an approach to improving resuscitation of patients in septic shock, likely with a practice run and ongoing 'drills' (further developing resilience) prior to the arrival of the next patient in septic shock.

Table 2. Potential ICU applications of the principles of high reliability organizing
Reluctance to simplify: Be aware of cognitive bias in diagnosis and work to avoid premature diagnostic closure. Maintain and revisit broad differential diagnoses. Use multidisciplinary analyses as a basis for decision making. Resist the tendency to ascribe only one cause to incidents and errors.
Sensitivity to operations: Maintain awareness of the patient's overall condition rather than focus on one particular problem or organ system. Use tools that facilitate information sharing between team members (that is, electronic medical records). Monitor unit-wide and hospital-wide conditions, such as bed availability, personnel shortages, and unit acuity fluctuations.
Resilience: Emphasize the importance of working together in multidisciplinary teams. Encourage flexibility in team members to accommodate changes in unit acuity or hospital resources. Explicitly include training around how to manage unexpected events in ICU staff educational training.
Deference to expertise: Foster knowledge of team members' particular strengths and weaknesses, including specialized services (that is, ability to manage a balloon pump). Use appropriate clinical pathways and protocols (that is, nursing-driven sedation and respiratory therapist-led weaning protocols). Institute multidisciplinary rounds on which nursing, respiratory therapy, pharmacy, and families have active voices and full participation.

Conclusions
A recent focus on patient safety has catalyzed an awareness of how much room for improvement there is in most ICUs, even great ones. Some health-care scholars claim that the only realistic goal of safety management in complex health-care systems is to develop an intrinsic resistance to operational hazards [3]. High reliability organizing is one way to foster this intrinsic resistance.
Embracing HRO concepts will not necessarily be easy in the ICU, where there are simultaneous pressures for cost containment as well as often-changing team members, and ongoing evaluation will be needed as HRO processes and practices from non-ICU contexts are implemented in ICUs. Nonetheless, designing resilient health-care systems by thoughtfully embracing the central principles and philosophy of HROs holds great promise for improving the safety and reliability of critical care in the ICU.