Or, to be (slightly) less simplistic: PSA is a method for finding the dangerous combinations of failure events in complex systems that you didn’t consider when you designed them.
Most often when we think about designing safety systems we ask ourselves “what needs to succeed for this system to work, and is it likely enough?” PSA turns this on its head and instead asks “what needs to fail in order for this system NOT to work, and how likely is it?”
PSA also looks at the whole facility and the interactions and dependencies between different systems, under a broad set of circumstances including events beyond the design basis – i.e. what can be expected to happen when things happen that weren’t expected.
The inclusion of failure data as probabilities and frequencies then allows the PSA to quantify how often the system can be expected to fail – in other words, how often the designed safety margins will be insufficient or exceeded. This is why “PSA” (or “PRA”, as it is also known) is called “Probabilistic Safety (or Risk) Assessment”.
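To get a taste of what this quantification looks like, here is a minimal sketch in Python. All the numbers are hypothetical, invented purely for illustration: if two alarms must both fail for us to oversleep, and their failures are independent, the combined probability is simply the product of the individual ones (an AND gate, in fault-tree terms):

```python
# Illustrative sketch only: the per-demand failure probabilities
# below are made up, not real reliability data.

p_alarm_a = 0.05   # hypothetical: bedside alarm fails to wake us
p_alarm_b = 0.10   # hypothetical: back-up alarm fails to wake us

# If the failures are truly independent, both must fail for the
# wake-up "system" to fail (an AND gate in fault-tree terms):
p_system = p_alarm_a * p_alarm_b

print(f"P(oversleep) = {p_system:.3f}")  # 0.005, about 1 morning in 200
```

The product rule is what makes redundancy so attractive on paper – and, as the rest of this example shows, why hidden dependencies that break the independence assumption are exactly what a PSA goes hunting for.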
A simple example that most should be familiar with is the “system” of getting up and going to work in the morning. It is a far more common occurrence than the typical subject of a PSA, since it happens most days of the week, but otherwise it shares several features with the complex systems PSA is meant to analyze.
We may know from experience that we won’t wake up on time by ourselves. We have no statistical analysis to back this up, but we can still say with high confidence that oversleeping is very probable. So, we get an alarm clock. We set it to ring every morning, and if it is a good alarm clock it will have been designed for high reliability, so we can count on it working most days.
But maybe one alarm isn’t enough – perhaps we snooze way too long, or even turn it off in our sleep without waking up. So we set up several alarms on the clock, and even get a back-up clock to ring from the other room if we’re really running late. We have now started building our complex of interacting systems.
Individually, the systems are all good and reliable. The alarm clock with several alarms on the bedside is enough most days to wake us up, and on the rare occasion when it isn’t, the one in the next room gets the job done. They may even seem reasonably independent, so we don’t think much more on them, yet we rely on them almost every day.
We might stop at this point and say that our wake-up system is designed for everyday use and can be expected to work often enough. But if we continue and look for more cases that might cause the wake-up system to fail, and then make a judgement of how likely these are, we have started doing a very basic PSA.
Because one day comes the unexpected event. Perhaps the alarm clocks both run on batteries, and both run out while we’re away on vacation. When we come home we don’t think to check them before the next morning, when we find to our dismay that they’re not working: we’ve overslept, and now we’ll be late for the first day with the new boss! Or perhaps they’re both connected to the wall socket, there was a short power outage sometime during the night, and we wake up to both of them blinking 00:00 with no alarms set! Clearly, a disaster just waiting to happen.
These two cases are extremely simple, and can be solved in simple ways, through diversification of the safety design or by adding redundancies: make one of them run on battery and have the other one connected to the wall socket; or put up a post-it to remind yourself to check the alarms before the important morning meeting tomorrow (a half-measure, since it won’t help against the nighttime power outage); or get a cat that wakes you up in the morning for breakfast no matter what day it is.
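The dead batteries and the power outage are examples of common-cause failures: one shared cause defeats both “redundant” alarms at once. One standard way to represent this in a quantified model is the beta-factor approach, in which a fraction beta of each component’s failures is assumed to come from a shared cause. The sketch below uses invented numbers purely to illustrate the effect; it is not real reliability data:

```python
# Hedged sketch of the beta-factor common-cause model.
# All numbers are hypothetical.

p_fail = 0.05   # hypothetical per-clock failure probability
beta = 0.1      # hypothetical fraction of failures from a shared cause

p_independent = (1 - beta) * p_fail   # failures unique to each clock
p_common = beta * p_fail              # e.g. the outage hitting both at once

# The wake-up system fails if the common cause strikes, or if both
# clocks independently fail on the same morning:
p_system = p_common + (1 - p_common) * p_independent ** 2

print(f"naive independent estimate: {p_fail ** 2:.5f}")
print(f"with common cause:          {p_system:.5f}")
```

Even with only 10 % of failures shared, the common-cause term dominates the result, which is why diversification (one clock on battery, one on mains) buys far more reliability than simply adding another identical clock.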
And with that, we’ve pretty much performed the barest form of PSA: we have a complex of systems working together to perform a specific function, we’ve identified some reasonably possible events that will challenge the systems, found common failure modes or dependencies, and found ways to deal with these issues that will significantly improve the reliability of us getting up and going to work on time.
Now imagine that instead of using alarm clocks, we fill the bedroom with a multi-layered Rube Goldberg machine, built by different people at different times, with some parts meant to wake us up and others to start the coffee machine or turn on the news, and you start to get closer to the level of complexity PSA was made to deal with.
PSA gives us a systematic approach to analyzing these kinds of complex systems (with or without the intentional chaos of a Rube Goldberg machine) to find the unexpected failure combinations that are most important, the things we don’t know about and didn’t think about when we designed the systems, and even to quantify the probability or frequency of different outcomes.
This is why PSA is so extensively used in (but not limited to) the nuclear industry, and why it is a fundamental part of every nuclear power plant’s safety analysis and licence requirements.