The methods that are used to manage safety have always closely matched the thinking about safety. This is actually inevitable, since the methods represent the tacit assumptions about how things happen in the world around us, particularly about how things go wrong.
Safety thinking cannot be static, but must undergo continuous development and revision so that it corresponds to the ‘real’ world. One development in safety thinking has to do with how things happen in the world, especially the cases where things go wrong. In order for safety management to be effective, our theories and models about the ‘inner workings’ of technical systems, socio-technical systems, and organisations must have a reasonable resemblance to what actually happens. Although the correspondence can never be perfect, it must be good enough to enable us to control what we do. This control is needed both to prevent unwanted outcomes from occurring and to ensure that wanted outcomes do occur.
Looking at the history of safety management – if not from the beginning of historical records then at least from the beginning of the industrialised era – the dominating trend has been that systems have become gradually more difficult to understand and control. This goes for technical systems, socio-technical systems and organisations alike. These developments have on the whole been matched by a corresponding development in methods, although usually with a considerable delay. That history is well-known and has been described in many publications. Briefly told, the thinking has gone from single factor models (such as ‘error proneness’), to simple linear models (such as the Domino model), to composite linear models (such as the Swiss cheese model), and to complicated multi-linear models (such as STAMP).
Common to all these methods is the unspoken assumption that outcomes can be understood as effects that follow from prior causes. This assumption (the causality credo) can be expressed as follows: adverse outcomes happen because something has gone wrong; adverse outcomes therefore have causes, which can be found and treated; and once the causes have been eliminated or neutralised, similar outcomes will no longer occur.
The development in methods has been mirrored by the invention of new types of causes whenever the ‘usual suspects’ turned out to be insufficient – generally in the aftermath of a major accident or disaster. The genealogy of causes goes from ‘acts of God’, to technical malfunctions and failures, to ‘human errors’, to organisational failures, to safety culture, and to complex systems – which for the time being represent the pinnacle of safety thinking. But in each case the new causes have been introduced without challenging the underlying assumption of causality. We have therefore become so used to explaining accidents in terms of (linear) cause-effect relations that we no longer notice it.
The development of ever more detailed and intricate linear models to explain accidents can in principle continue forever – or at least until we reach the point when we must abandon the idea that explanations can be linear. The predilection for linear explanations is easy to understand: it enables us to decompose problems into smaller parts (steps, actions, components) that can be dealt with individually. And since each part only interacts with its immediate neighbours, there is no compelling need to consider the system as a whole. Or so we assume.
There is, however, a second development in safety thinking which is conceptually orthogonal to the first. The established approach to safety implies what one may call a hypothesis of different causes, namely that the causes or ‘mechanisms’ of adverse outcomes are different from those of successful outcomes. (If that were not the case, the elimination of such causes and the neutralisation of such ‘mechanisms’ would also reduce the likelihood that things could go right, and hence hinder the system from achieving its purpose.) But while this hypothesis is convenient, its theoretical and empirical basis is highly questionable. Fortunately, there is a simple – indeed even simpler – alternative, namely that things go right and go wrong in the same way. This idea is far from new and has been expressed in a number of ways. Abraham Lincoln, for instance, noted that “if the end brings me out all right what is said against me won’t amount to anything. If the end brings me out wrong, ten angels swearing I was right would make no difference.” Ernst Mach (1905) was more direct when he wrote that “knowledge and error flow from the same mental sources, only success can tell one from the other”.
The hypothesis that successes and failures are ‘two sides of the same coin’ has been one of the cornerstones of resilience engineering from its very beginning some 10-15 years ago. But it was described at least as early as 1983, in the following way:
“Since errors are not intentional, and since we do not need a particular theory of errors, it is meaningless to talk about mechanisms that produce errors. Instead, we must be concerned with the mechanisms that are behind normal action. If we are going to use the term psychological mechanisms at all, we should refer to ‘faults’ in the functioning of psychological mechanisms rather than ‘error producing mechanisms’. We must not forget that in a theory of action, the very same mechanisms must also account for the correct performance which is the rule rather than the exception.” (Hollnagel, 1983).
While the purpose of the first (methodological) development is to make sure that models and methods are powerful enough to match the ever more complicated work environments (and accidents), the purpose of the second (conceptual or even ontological) development is to make sure that the assumptions underlying safety thinking are realistic and – if possible – correct. This second development has two important consequences. One is a rejection of the hypothesis of different causes. Because of that, safety studies and safety management should focus on what happens when things go well rather than when things go badly, corresponding to the change in thinking from Safety-I to Safety-II. The other is that models and methods are needed to understand how socio-technical systems and organisations work, and not just how they fail. The traditional approaches mentioned above clearly cannot do the job, both because they focus on failures and because they are linear.
The two developments are shown graphically in the following figure. The methodological development can be characterised by the ‘accident models’ that have been used to manage safety. The starting point was single-factor models, where the single factor characteristically was the human and the explanation was that some people were accident prone (Greenwood & Woods, 1919; Schulzinger, 1956). This was followed by the multi-factor linear model (Heinrich, 1931), the multi-factor composite (or multi-linear) model (Reason, 1990), and finally the complicated hierarchical model (Leveson, 2004; Svedung & Rasmussen, 2002). The conceptual development is basically a change from a Safety-I to a Safety-II perspective, hence a change from a focus on failures to a focus on everyday activities.
The methodological developments shadow the developments in technologies. But whereas technologies develop continuously, and seemingly with a constant acceleration, methodologies develop in steps or jumps. The precipitating event is usually one or more accidents that cannot be analysed and/or explained satisfactorily with the existing methods; this prompts a change in methods as well as a change in causes, as illustrated by the vertical dimension in the figure. The conceptual developments have so far only resulted in one major change (although a more fine-grained account could be given as well). The change was brought about by resilience engineering and the insistence that “... an unsafe state may arise because system adjustments are insufficient or inappropriate rather than because something fails. In this view failure is the flip side of success, and therefore a normal phenomenon” (Hollnagel, Woods & Leveson, 2006). Adopting this view requires methods that can describe how everyday activity takes place. Since socio-technical systems, with the possible exception of the most trivial kind, must be able to adjust their performance to the conditions, linear (cause-consequence) models are ruled out. Non-linear models and methods are required to account for a non-linear reality. But since ‘failures are the flip side of successes’, the same model can be used to elucidate both a Safety-I and a Safety-II perspective. The need for specialised and detailed accident models is therefore passé.
References
Greenwood, M. and Woods, H. M. (1919). A report on the incidence of industrial accidents with special reference to multiple accidents (Industrial Fatigue Research Board Report No. 4). London: HMSO.
Heinrich, H. W. (1931). Industrial accident prevention: A scientific approach. New York: McGraw-Hill.
Hollnagel, E. (1983). Position paper on human error. Responses to queries from the Program Committee, NATO Conference on Human Error, Bellagio, Italy, September 5-9.
Hollnagel, E., Woods, D. D. and Leveson, N. C. (2006). Resilience engineering: Concepts and precepts. Aldershot, UK: Ashgate.
Leveson, N. G. (2004). A new accident model for engineering safer systems. Safety Science, 42(4), 237-270.
Mach, E. (1905). Knowledge and error (new edition, 1976). Springer.
Reason, J. (1990). Human error. Cambridge: Cambridge University Press.
Schulzinger, M. S. (1956). The accident syndrome. Springfield, IL: Charles C. Thomas.
Svedung, I. and Rasmussen, J. (2002). Graphic representation of accident scenarios: Mapping system structure and the causation of accidents. Safety Science, 40, 397-417.
According to the conventional interpretation of safety, here called Safety-I, safety denotes a condition where as little as possible goes wrong. The focus of practical efforts, whether in management or analysis, is therefore on the occurrence of unacceptable outcomes and on how to reduce their number to an acceptable level, ideally zero. The emphasis is on how to manage safety eo ipso, as seen in the ubiquitous safety management systems (SMS).
This approach, however, leads to something of a paradox, since safety in this way is defined and measured more by its absence than by its presence, as noted by Reason (2000). According to a Safety-I perspective, an accident thus represents a situation or a condition where there is, or was, a lack of safety. This immediately raises the obvious question of how it is possible to learn about something if it is only studied in situations where it is not there. No known science can do that – except, it seems, safety science. And furthermore, how is it possible to manage something that is not there? The simple answer is that it is impossible. The unacceptable outcomes that safety management focuses on are the results of something that happened in the past and that does not happen any longer; they can therefore not be managed. While you can manage a process, you cannot manage a product.
This paradox fortunately disappears in the view proposed by Safety-II, where safety is defined as a condition where as much as possible goes well. An acceptable outcome therefore represents a condition where safety is present rather than absent, and efforts are accordingly directed at understanding how this happens and how one can ensure that it will also happen in the future. Logically, if as much as possible goes well, then as little as possible goes wrong, since in practice something cannot go well and go wrong at the same time. A Safety-II approach therefore achieves the same objective as a Safety-I approach, but does so in a completely different way. In Safety-II the concern is not to manage safety as a static outcome – using safety as a noun – but to manage system performance safely, as a dynamic process – using safely as an adverb. There is a crucial difference between managing safety and managing safely. The former represents a cost, since the purpose is to avoid something rather than to achieve something, while the latter represents an investment that directly contributes to productivity as well as increased revenue. It is therefore clearly more important and useful for a company to manage safely than to manage safety.
Since most work and most activities in practice go well, even though we fail to pay attention to them, there will also be many more cases to study and learn from. Best of all, perhaps, is that there is no need to wait for something to happen, i.e., for something to fail or go wrong. Something is happening all the time; all we need to do is to pay attention to it.
Reason, J. (2000). Safety paradoxes and safety culture. Injury Control & Safety Promotion, 7(1), 3-14.