Taming the Wild Alarm System, Part 6
Why did they have to call it “Philosophy?”
I loved my engineering courses back in the 70s! I loved the science and principles behind studies such as heat transfer, fluid flow and thermodynamics. Everything could be calculated and optimized. We could even do calculations by submitting decks of computer punch cards for overnight runs on a mainframe. And get back printouts the next day covered in “syntax error.” So, back to the slide rule… Actually, the first true engineering pocket calculator with trig, log and exponential functions was introduced when I was at Louisiana Tech. The HP-35, what a breakthrough!
What did I NOT love? The dreaded mandatory “non-technical elective.” I really disliked going from the rigor of the laws of thermodynamics to the pointless, definitional word-chasing that was philosophy and sociology. (OK, I exaggerate a bit here, but some stereotypes are true!)
So in 2003, I began learning all of the details of alarm management. Now, this is also a science with principles to follow and predictable outcomes. But I was surprised and disappointed when I first came across the term “alarm philosophy document.” I thought, “Oh great, here’s some corporate-level wishy-washy document with a lot of words and platitudes, but that actually says nothing of real meaning.” (A long career in a large company had made me very familiar with such documents.)
Well, thankfully I was wrong. The alarm philosophy document is essential. It is a working document that bridges the gap between the correct principles of alarm management and exactly how those principles are customized and applied in your own organization, with your own work practices. The whole alarm problem came about because there were no guiding principles for creating and maintaining an alarm system. This document supplies those principles, along with the rigorous knowledge needed to apply them successfully.
If you have not developed your alarm philosophy (AP) yet, here is some guidance. The AP describes a “to-be” state of an alarm system, not an “as-is.” It is a prescription for how to do alarms right, not a document about how you are dealing with them now. It is a comprehensive, detailed document, not a three-page overview.
The AP is also an alarm design guideline for both new systems and modifications to existing systems. It is intended for in-house staff and for contractors doing the initial alarm configuration on projects. The first thing it does is lay out what the alarm system is for, and what kinds of conditions are allowed to use it. This is necessary because distributed control system (DCS) vendors make creating alarms so easy that the alarm system is often used for all sorts of totally inappropriate things. For example, alarm systems are often the dumping ground for miscellaneous status indications.
The very definition of an alarm is important. Remember, the customer of the alarm system is the operator – not the staff, or engineers, or department heads. It is the person sitting at the console, often 24/7, who is responsible for running the process. So, the alarm system must be designed to be useful for that particular role. Remember that an alarm is an intentional interruption to the operator, and every alarm had better be important if we want the operator to take alarms seriously. (Many post-incident investigations have found that operators ignored the alarm system leading up to an incident because they felt it to be useless.)
ISA 18.2 defines an alarm as an “audible and/or visible means of indicating to the operator an equipment malfunction, process deviation, or abnormal condition requiring a timely response.” Please re-examine those very carefully chosen words. Unless the condition meets every aspect of that definition, it should not be allowed to use the alarm system as a means to inform the operator. (Note that the standard is weak about “audible and/or visible.” Since every alarm is going to be important, we want the operator to detect them all, and both aspects of annunciation are necessary to do that. It is common for unimproved alarm systems producing thousands of alarms a day to have the alarm sounds turned off. But once alarms have been rationalized, the sound needs to be back on.)
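One way to make "every aspect of that definition" concrete during rationalization is to treat it as a checklist. The Python sketch below is illustrative only: the `AlarmCandidate` class, its field names, both functions, and the example tag `PUMP P-101 RUNNING` are hypothetical, not taken from ISA 18.2 or from any DCS. It keeps two questions separate: does the condition qualify as an alarm at all, and, once rationalized, is it annunciated both audibly and visibly?

```python
from dataclasses import dataclass


@dataclass
class AlarmCandidate:
    """A proposed alarm, described by the parts of the ISA 18.2 definition.

    Hypothetical structure for illustration; field names are not from the standard.
    """
    tag: str
    indicates_abnormal: bool        # equipment malfunction, process deviation, or abnormal condition
    requires_timely_response: bool  # the operator must act, not merely take note


def qualifies_as_alarm(candidate: AlarmCandidate) -> bool:
    """A condition may use the alarm system only if it meets every part of the definition."""
    return candidate.indicates_abnormal and candidate.requires_timely_response


def annunciation_ok(audible: bool, visible: bool) -> bool:
    """This post's stricter reading: a rationalized alarm should be both heard and seen."""
    return audible and visible


if __name__ == "__main__":
    candidates = [
        AlarmCandidate("TANK 104 HIGH LEVEL", indicates_abnormal=True, requires_timely_response=True),
        # A pure status indication: normal condition, no action needed
        AlarmCandidate("PUMP P-101 RUNNING", indicates_abnormal=False, requires_timely_response=False),
    ]
    for c in candidates:
        print(f"{c.tag}: {'alarm' if qualifies_as_alarm(c) else 'not an alarm'}")
```

In practice the hard part is answering the two boolean questions honestly for each tag; the code only enforces that both answers must be "yes."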
If you did nothing in alarm management but ensure that every alarmed condition in your control system met this definition, then you would likely have no alarm problem at all. But if you go to the control room and look at the list of alarms in effect, or that occurred this day or week, you will certainly find dozens that come nowhere near meeting this definition.
In our recorded webinars and in The Alarm Management Handbook, Second Edition, we go into how to apply the definition in some detail. We provide many examples of common situations that should never be alarms in the first place (but often are).
Here is an easy one. Do you alarm something whenever it is off? Likely that is a mistake. There is almost nothing in a plant that is not supposed to be off at some time or another. Off is a normal condition in many circumstances, and only abnormal conditions can be alarms. We see alarm screens covered in alarms because some equipment has been intentionally and rightly turned off. The correct paradigm is to only alarm something that is off when it is supposed to be on. This requires a bit of thought and logic, but such an alarm is easily achievable.
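The difference between the two paradigms can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the `UnitMode` values and the rule "in RUN mode this pump must run unless its spare is running" are assumptions chosen to show the idea, not a design from this post or from any control system.

```python
from enum import Enum


class UnitMode(Enum):
    """Hypothetical operating modes for a process unit."""
    SHUTDOWN = "shutdown"
    STARTUP = "startup"
    RUN = "run"


def naive_off_alarm(is_running: bool) -> bool:
    """The common mistake: alarm whenever the equipment is off."""
    return not is_running


def pump_required(mode: UnitMode, spare_running: bool) -> bool:
    """Hypothetical rule: in RUN mode, this pump must run unless its spare is running."""
    return mode is UnitMode.RUN and not spare_running


def off_when_required_alarm(is_running: bool, mode: UnitMode, spare_running: bool) -> bool:
    """Alarm only when the pump is off while it is supposed to be on."""
    return pump_required(mode, spare_running) and not is_running
```

During a planned shutdown, `naive_off_alarm(False)` fires even though everything is as intended, while `off_when_required_alarm(False, UnitMode.SHUTDOWN, False)` stays quiet. The "bit of thought and logic" lives in `pump_required`, which each site would define for its own equipment.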
Here is another example. The definition requires operator action in response to an alarm. Well, what things constitute operator action, and what do not? Here are some well-accepted examples of “timely operator action.”
- Direct manipulation of the control system to effect a process change.
- Directing others (such as “outside operators”) to make process changes or take actions.
- Changing operating mode.
- Manual equipment changes (start pumps, manually operate valves, take samples, etc.).
- Begin troubleshooting/analysis of a situation. (This is a common operator response. An alarm such as “TANK 104 HIGH LEVEL” requires looking at many things that could have caused that condition, and addressing the ones that did. The direct operator actions will differ based on the cause.)
- Contacting other people or groups regarding a situation.
- Logging conditions for later examination, maintenance or repair, which can include initiating maintenance requests.
And some things that are definitely not “timely operator action” are:
- Thinking “OK, that’s nice to know.”
- Thinking “OK, the next shift can deal with that tomorrow.”
- Thinking “OK, the system is working normally!”
- Writing something down in a logbook (except for a maintenance work order).
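A rationalization database could record the documented response for each alarm and flag any alarm whose only "response" is a non-action. Here is a minimal Python sketch, assuming a hypothetical `Response` enum whose labels paraphrase the two lists above; the names are illustrative, not from any standard or tool.

```python
from enum import Enum, auto


class Response(Enum):
    """Possible documented responses to an alarm (hypothetical labels)."""
    # Accepted forms of timely operator action
    MANIPULATE_CONTROL_SYSTEM = auto()
    DIRECT_OTHERS = auto()
    CHANGE_OPERATING_MODE = auto()
    MANUAL_EQUIPMENT_CHANGE = auto()
    TROUBLESHOOT = auto()
    CONTACT_OTHERS = auto()
    LOG_FOR_MAINTENANCE = auto()  # includes initiating a maintenance request
    # Not operator actions
    NICE_TO_KNOW = auto()
    LEAVE_FOR_NEXT_SHIFT = auto()
    SYSTEM_WORKING_NORMALLY = auto()
    LOGBOOK_NOTE_ONLY = auto()


TIMELY_ACTIONS = frozenset({
    Response.MANIPULATE_CONTROL_SYSTEM,
    Response.DIRECT_OTHERS,
    Response.CHANGE_OPERATING_MODE,
    Response.MANUAL_EQUIPMENT_CHANGE,
    Response.TROUBLESHOOT,
    Response.CONTACT_OTHERS,
    Response.LOG_FOR_MAINTENANCE,
})


def has_timely_action(documented: set) -> bool:
    """True if at least one documented response is a genuine operator action."""
    return not documented.isdisjoint(TIMELY_ACTIONS)
```

An alarm documented only with `{Response.NICE_TO_KNOW}` fails the check and becomes a candidate for removal or for conversion to a non-alarm indication.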
Remember, alarms exist for the benefit of the operator. Besides hammering home this basic principle, the philosophy must cover many more items. That is why a decent one will run 50 pages or more. We are trying to avoid recreating, in our control systems, this bad example of a poorly implemented alarm:
The worst alarm design of all time!
Why the worst? Because this automotive “check engine” light can mean anything from a loose gas cap to imminent engine failure. And the “code” generated behind it requires special hardware to read. The US government mandated this particular poorly designed alarm in 1996. We can do better.
The more detailed you make your AP, the more time and money you will save in every later step of alarm management. For example, a section on specific alarm design considerations can save hundreds of hours of work and eliminate inconsistent and redundant redesign efforts.
A typical table of contents for a comprehensive alarm philosophy covers these topics:
- Alarm system characteristics: the capabilities and limitations of your particular system.
- Alarm design principles and configuration: how to use the capabilities of your control system to create an effective alarm.
- Alarm rationalization: how to go through existing alarm systems that were not designed from a comprehensive AP and change them to follow proper principles.
- Alarm priority determination: using the priority attribute correctly and effectively.
- Alarm documentation and operator training: creating the information that must be available for every alarm and ensuring the operator can easily access it.
- Alarm system roles and responsibilities: determining who is to accomplish each alarm management task.
- Alarm handling methods (basic and advanced): discussing the right way to handle alarm suppression, alarm shelving, state-based alarming, and similar system capabilities.
- Alarm system performance monitoring: determining key performance indicators, monitoring them regularly, and taking positive action based on the results.
- Nuisance alarm resolution: finding and fixing nuisance alarms (see the second and third posts in this series).
- Alarm detection, annunciation and depiction in the operator HMI: how to display alarms in a manner that ensures they can be seen and responded to in the correct order.
- Specific alarm design considerations: providing pre-designs for effective alarming of many common equipment conditions and situations.
- Operator response to alarms: defining and documenting what is and is not a response in terms of the definition of an alarm, and ensuring the operator is capable of making that response.
- Alarm system management of change: establishing the practice that any change to the alarm system must be actively managed, made by capable people, and communicated to all operators and others involved.
The typical size of an AP is 50 to 80 pages. Many plant sites use more than one type of control system. If so, it is a good idea for the body of the document to describe the principles. Then an appendix for each type of control system can deal with exactly how those principles are accomplished, given the capabilities and limitations of the different systems in use. For example, you may decide, as a matter of principle, to use four different alarm priorities, but one of your control systems offers only three. How do you adapt your principles to that? Or you may have one that offers more than 100 priorities. (Don’t use all of them!)
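A per-system appendix essentially records a mapping from the AP's principle-level priorities to each control system's native priorities. The Python sketch below shows one way to keep such mappings checkable. All names are hypothetical (`P1` through `P4`, `HIGH`/`MEDIUM`/`LOW`, the numeric 1 to 100 range), and folding P3 and P4 into one native level is purely an illustrative choice; each site would decide and document its own answer.

```python
# Hypothetical principle-level priorities from the body of the AP,
# ordered from most to least urgent.
PRINCIPLE_PRIORITIES = ("P1", "P2", "P3", "P4")

# Hypothetical appendix for a control system that offers only three priorities.
# Folding P3 and P4 into LOW is one illustrative choice, not a recommendation.
THREE_PRIORITY_SUPPORTED = {"HIGH", "MEDIUM", "LOW"}
THREE_PRIORITY_APPENDIX = {"P1": "HIGH", "P2": "MEDIUM", "P3": "LOW", "P4": "LOW"}

# Hypothetical appendix for a system offering 100 numeric priorities:
# only four of them are used, one per principle priority.
HUNDRED_PRIORITY_SUPPORTED = set(range(1, 101))
HUNDRED_PRIORITY_APPENDIX = {"P1": 1, "P2": 2, "P3": 3, "P4": 4}


def validate_appendix(mapping: dict, supported: set) -> list:
    """List problems with an appendix's principle-to-native priority mapping."""
    problems = []
    for priority in PRINCIPLE_PRIORITIES:
        if priority not in mapping:
            problems.append(f"{priority}: no native priority assigned")
        elif mapping[priority] not in supported:
            problems.append(f"{priority}: {mapping[priority]!r} is not offered by this system")
    return problems
```

An empty list means every principle priority has a home on that system; a mapping such as `{"P1": "CRITICAL"}` against the three-priority system reports four problems (one unsupported level and three missing assignments).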
Having an AP is a requirement of the ISA 18.2 and IEC 62682 alarm management standards. Those standards specify certain mandatory content for the AP and recommend additional content. Our white paper, Understanding & Applying the ANSI-ISA 18-2 Alarm Management Standard, explains these standards in detail.
The alarm system is the most important tool the operator uses to become aware of abnormal situations and malfunctions. The intent of the alarm philosophy is to ensure that tool always helps the operator take the correct action at the correct time. Developing your own site’s AP is one of the first steps in alarm management, and the document will be used for the entire life of the control system. The Alarm Management Handbook, Second Edition, contains full instructions on creating a comprehensive and detailed AP, along with many examples and common situations faced by industry in accomplishing effective alarming.
Review other Taming the Wild Alarm System topics in this blog series:
- How Did We Get In This Mess?
- The Most Important Alarm Improvement Technique in Existence
- SHUT UP! Fixing Chattering and Fleeting Alarms
- Just How Bad is Your Alarm System?
- Horrible Things We Find During Alarm Rationalization
- Why did they have to call it “Philosophy?”
- Beyond Alarm Management – Doing More with a Powerful Tool