
Taming the Wild Alarm System, Part 3

SHUT UP! Fixing Chattering and Fleeting Alarms

There are few things more annoying to operators than chattering and fleeting alarms. Chattering alarms have short durations. They occur, then quickly clear, then reoccur and clear, happening up to several times a minute. A few chattering alarms can ruin the overall performance of an alarm system. A good starting criterion for identifying a chattering alarm is one that occurs and clears three or more times in a minute.
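The three-per-minute criterion is easy to check against logged alarm times. The following is a minimal sketch, assuming you can export the activation timestamps of one alarm as seconds; the function name `is_chattering` and its parameters are illustrative, not part of any DCS or analysis product.

```python
def is_chattering(activation_times, window_s=60.0, min_count=3):
    """Return True if any `window_s`-second span holds `min_count` or more
    activations of a single alarm.

    activation_times: alarm occurrence timestamps in seconds (any order).
    """
    times = sorted(activation_times)
    # Compare each activation with the one (min_count - 1) positions later.
    for i in range(len(times) - min_count + 1):
        if times[i + min_count - 1] - times[i] <= window_s:
            return True
    return False


# Three activations in 25 seconds: flagged as chattering.
print(is_chattering([0, 10, 25]))
# Activations spread over 90 seconds: not flagged.
print(is_chattering([0, 40, 90]))
```

Treat the window and count as starting values and tune them to your own alarm philosophy.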

A fleeting alarm is also a short duration alarm. It appears then clears almost immediately (often within a few seconds), without the operator having to do anything. It does not immediately repeat. (If it did, it would be a chattering alarm.) Even so, it is still a distracting and pointless interruption for the operator, and on most systems has to be acknowledged to be removed from the alarm listing screens.

Both analog values (e.g., flows, pressures) and digital (on-off) signals, such as from switches or binary sensors, can and do chatter. Fleeting alarms can be addressed using the same methods for addressing chattering alarms, with minor differences. Chattering and fleeting alarms are very common. No alarm was intentionally designed to have these behaviors, and all of them CAN BE FIXED without a lot of trouble! Here’s how.

Start With Deadband
For chattering analog sensors, first look at the deadband of the alarm. So, what is deadband? Well, on-off control is the most basic form of control. A certain degree of deadband is placed around the process setpoint. If the process variable is lower or higher than the deadband, the control action is turned fully on (or fully off), like a basic air conditioner or home heater.
 

Figure 1: Deadband and On-Off Control

On-off control can be used on a pump to fill (or empty) a tank (on at 20%, off at 80%, similar to how a toilet tank works). But you would not want to use it on something like automobile speed cruise control—going on and off of full throttle for every hill!
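The tank-filling example can be sketched in a few lines. This is only an illustration, assuming a fill pump that turns on at 20% level and off at 80%; the function name `pump_command` and the sample levels are made up for the sketch.

```python
def pump_command(level_pct, pump_on, on_below=20.0, off_above=80.0):
    """On-off control of a fill pump with a deadband between two thresholds."""
    if level_pct <= on_below:
        return True       # tank is low: run the pump
    if level_pct >= off_above:
        return False      # tank is full: stop the pump
    return pump_on        # inside the deadband: hold the previous state


# Walk the tank level down and back up through the band.
pump_on = False
states = []
for level in [50, 25, 19, 40, 79, 81, 60, 21]:
    pump_on = pump_command(level, pump_on)
    states.append(pump_on)
print(states)
```

Notice that the pump state at 40% or 60% depends on where the level came from. That memory is the whole point of the deadband.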

With on-off control, the process variable is always cycling around the setpoint and through the deadband. To achieve “tighter” control, reduce the amount of the deadband. The side effect is that the frequency of the oscillations increases, which reduces the life of the final control element (e.g., relays or control valves).

Deadband and Alarms
Similar to deadband for setpoints in process control, all alarms on analog values should also have an alarm deadband specified. All process signals have noise. As a process value passes through an alarm setpoint, any noise or slight variation of the signal causes multiple alarms if the alarm deadband is too small.

Figure 2 shows how a proper deadband, larger than the noise in the signal, reduces alarm events as the process value moves above a high alarm setpoint. Most DCSs allow for alarm deadband with several options. You should check the DCS documentation to configure deadband properly. Note that deadband should always be applied to analog signals before the application of the following delay time techniques!
 

Figure 2: Deadband and Alarms
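The effect of an alarm deadband can be sketched with a handful of sample values. This is a simplified model, assuming a high alarm that activates at the setpoint and clears only once the value falls below the setpoint minus the deadband. The function name `count_high_alarms` and the data are illustrative, and the deadband here is in engineering units rather than percent.

```python
def count_high_alarms(values, setpoint, deadband=0.0):
    """Count high-alarm activations for a sequence of process values."""
    in_alarm = False
    activations = 0
    for v in values:
        if not in_alarm and v >= setpoint:
            in_alarm = True
            activations += 1
        elif in_alarm and v < setpoint - deadband:
            in_alarm = False   # clear only below (setpoint - deadband)
    return activations


# A noisy value drifting up through a high alarm setpoint of 100.
noisy = [98, 99, 101, 99.5, 100.5, 99.6, 100.4, 99.8, 101, 102, 103]

print(count_high_alarms(noisy, setpoint=100, deadband=0.0))  # 4 activations
print(count_high_alarms(noisy, setpoint=100, deadband=1.0))  # 1 activation
```

With a deadband larger than the noise, the same excursion produces one alarm instead of four.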

Rigorous calculation is not usually necessary; the following good starting values can be used. Trial and error can also be used; pick a small starting point (1%-2%) and increment based on the results.

Signal Type    Deadband
Flow           5%
Level          5%
Pressure       2%
Temperature    1%

Figure 3: Deadband Settings Based on Sensor Type

You may think that this is pretty basic information and everyone already knows this. We assure you, we have been solving this problem for decades and almost every system has hundreds of sensors with the alarm deadbands set at zero. Check yours!

Delay Time Analysis for Chattering and Fleeting Alarms
Deadband only applies to alarms on analog values. Often, the worst chattering or fleeting alarms are associated with on-off signals such as pressure and level switches. (Don’t get me started on the many reasons you should not be using such devices; that would be a different blog.) There is another powerful method to use for these that is probably already a capability of your DCS. This method applies to both analog and digital point types. The method requires a bit of explaining, but once understood, the technique itself is simple. And the results are so powerful that it is well worth the effort!

There are two types of alarm delays available in many DCSs, namely the ON-delay and the OFF-delay. (The OFF-delay is sometimes referred to as a “debounce timer.”) Each delay setting is specified as a number of seconds, and applies only to the point specified (not “globally” for the entire DCS). Some point or alarm types may have either delay available, and some have only one of them. ON and OFF delays work differently and have different implications when used. These settings provide powerful methods for fixing chattering and fleeting alarms. Here is exactly how they work.

Alarm ON-Delay
Use of the ON-delay time parameter can prevent a short-duration alarm from ever being seen by the operator. This is particularly useful for fixing a fleeting alarm that does not usually repeat. With an ON-delay, the alarm does NOT immediately annunciate. It must remain in effect and NOT clear for the value of the ON-delay timer BEFORE it is actually annunciated to the operator.

Figure 4: ON-Delay Alarm Processing

Choosing the ON-delay time correctly is quite important since, if an ON-delay is used, even a valid alarm is not immediately presented to the operator. This increases the overall time it takes for a proper response to be made. Such a delay could be a safety concern on some alarms. ON-delays of 30 seconds or less are generally not a problem for lower priority alarms. ON-delays longer than 30 seconds to a minute should be applied with much care, even for low priority alarms. You will often find that an ON-delay of only a few seconds can fix many chattering and fleeting alarms.
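To make the ON-delay behavior concrete, here is a minimal sketch that replays raw alarm episodes through an ON-delay. It assumes each episode is a `(start, end)` pair in seconds. The function name `apply_on_delay` is hypothetical, and the handling of an alarm lasting exactly the delay time is an assumption; your DCS may treat that boundary differently.

```python
def apply_on_delay(episodes, on_delay_s):
    """Return the alarm episodes the operator would actually see.

    episodes: list of (start, end) raw alarm intervals in seconds.
    An episode is annunciated only if it lasts longer than the ON-delay,
    and it is annunciated on_delay_s seconds after the raw alarm began.
    """
    return [
        (start + on_delay_s, end)
        for start, end in episodes
        if end - start > on_delay_s
    ]


raw = [(0, 3), (10, 14), (30, 120)]   # two fleeting alarms, one sustained
print(apply_on_delay(raw, on_delay_s=5))   # [(35, 120)]
```

The two fleeting episodes never reach the operator, and the sustained one arrives five seconds late. That lateness is the cost to weigh against alarm priority.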

Alarm OFF-Delay
Use of this powerful method can turn a string of repetitive, nuisance, chattering alarms into a single, longer-duration alarm event with NO initial delay. The alarm is immediately annunciated. When it clears, that clearing is NOT shown to the operator unless it remains clear for longer than the OFF-delay timer. Thus, an alarm that clears and quickly recurs is perceived by the operator as a single sustained alarm rather than as a series of recurring, short-duration ones.
 

Figure 5: OFF-Delay Alarm Processing

Using this technique, hundreds or thousands of nuisance, chattering alarm occurrences can become a single, longer-duration alarm occurrence with no initial annunciation delay. The key is to choose a delay time greater than the normal time-between-alarms.
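The OFF-delay behavior can be sketched the same way. This assumes raw alarm episodes as `(start, end)` pairs in seconds and a clear that is shown only after the alarm has stayed clear for the full OFF-delay. The function name `apply_off_delay` is hypothetical, and real DCS implementations may differ in detail.

```python
def apply_off_delay(episodes, off_delay_s):
    """Merge raw alarm episodes into what the operator would perceive.

    episodes: list of (start, end) raw alarm intervals in seconds.
    A clear is shown only off_delay_s seconds after the raw clear, so a
    recurrence within that time extends the alarm already on display.
    """
    shown = []
    for start, end in sorted(episodes):
        shown_clear = end + off_delay_s
        if shown and start <= shown[-1][1]:
            # Alarm recurred before the previous clear was shown: extend it.
            shown[-1] = (shown[-1][0], max(shown[-1][1], shown_clear))
        else:
            shown.append((start, shown_clear))
    return shown


raw = [(0, 2), (5, 7), (12, 13), (100, 104)]
print(apply_off_delay(raw, off_delay_s=10))   # [(0, 23), (100, 114)]
```

Three chattering occurrences collapse into one alarm with no annunciation delay, and each displayed clear lags the real clear by the delay time.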

The minor disadvantage to this technique also concerns the delay time. If the operator gets the alarm and takes a corrective action to eliminate it, they will not see a return-to-normal condition until after the delay time has expired, regardless of whether the action was immediately successful. In most cases, this is quite acceptable for OFF-delays of up to even a couple of minutes. The operator can see (for analogs) that the process value has moved below the alarm setpoint.

So, how do you choose the number of seconds of delay for each chattering or fleeting alarm? Guesswork is not advisable. Instead, use your alarm analysis software. To do this, you want to perform two frequency analyses on each chattering or fleeting alarm. These are analyses of the times-in-alarm (durations) and times-between-alarms (intervals). 

Time-in-Alarm and Time-Between-Alarms
DCSs produce time-stamped event records of at least three things: the alarm occurrence itself, the return-to-normal event (created when the condition causing the alarm to occur has cleared), and the operator acknowledgement event (created when the operator hits the acknowledge key for the alarm). Consider the first two.

In your alarm analysis software, you will have recorded thousands of occurrences from your nuisance alarms. For each specific nuisance alarm, take each pair of alarm occurrence and return-to-normal events, and then subtract the timestamps. The result is the time-in-alarm (duration) for the alarm occurrence. Similarly, the difference between an alarm occurrence timestamp and the prior alarm clear event timestamp is the time-between-alarms (interval). Round all of the timestamps to the nearest second.
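If you want to try this by hand on an exported event journal, the pairing logic can be sketched as follows. It assumes the events for one alarm are available as `(timestamp_seconds, kind)` pairs. The labels `"ALM"` and `"RTN"` and the function name are placeholders, not the event names of any particular DCS.

```python
def durations_and_intervals(events):
    """Compute time-in-alarm and time-between-alarms for ONE alarm.

    events: iterable of (timestamp_s, kind), kind being "ALM" (occurrence)
    or "RTN" (return to normal). Timestamps are rounded to whole seconds.
    """
    durations, intervals = [], []
    alarm_start = None   # time of the occurrence still awaiting its clear
    last_clear = None    # time of the most recent return-to-normal
    for ts, kind in sorted(events):
        ts = round(ts)
        if kind == "ALM" and alarm_start is None:
            if last_clear is not None:
                intervals.append(ts - last_clear)
            alarm_start = ts
        elif kind == "RTN" and alarm_start is not None:
            durations.append(ts - alarm_start)
            last_clear = ts
            alarm_start = None
    return durations, intervals


events = [(0.2, "ALM"), (4.1, "RTN"), (9.8, "ALM"),
          (12.0, "RTN"), (30.3, "ALM"), (33.9, "RTN")]
print(durations_and_intervals(events))   # ([4, 2, 4], [6, 18])
```

Unpaired events, such as a clear with no preceding occurrence in the export, are simply skipped in this sketch.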
 

Figure 6: Chattering and Fleeting Alarm Durations and Intervals

If you plot the results for thousands of events from a single alarm, you will likely see a graph similar to the following.

Figure 7: Alarm Delay Time Analysis Graph

The graph’s two curves are determined as follows:

  • Time-In-Alarm (Duration) graph: Y = count of alarm occurrences having the DURATION of X seconds. 
  • Time-Between-Alarms (Interval) graph: Y = count of alarm occurrences having the INTERVAL of X seconds. 
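Each curve is just a tally of how many occurrences share the same number of seconds. A minimal sketch, assuming the durations or intervals are already whole seconds (the function name `histogram` is illustrative):

```python
from collections import Counter


def histogram(seconds):
    """Return sorted (X seconds, Y count) points for one curve."""
    counts = Counter(seconds)
    return [(x, counts[x]) for x in sorted(counts)]


durations = [4, 2, 4, 7, 2, 4]
print(histogram(durations))   # [(2, 2), (4, 3), (7, 1)]
```

Run it once on the durations and once on the intervals to get the two curves.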

In the case shown, based on thousands of alarm-return pairs, most of the alarms from this point have durations (solid line) less than 10 seconds and the time-between-alarms (dotted line) is mostly less than 20 seconds. Obviously, an alarm that comes in, lasts less than 10 seconds, and then goes away all by itself does not meet the basic criterion for an alarm: something requiring operator action to resolve!

When you plot these durations, the area under the curve totals 100% of the alarm occurrences from the single alarm being analyzed. In the example, the alarm in question had thousands of activations lasting ten seconds or less. In fact, 93% of all activations of the alarm lasted 15 seconds or less. Those alarms did not return to normal because of responsive operator action. They indicated some sort of transient condition that did not require operator action to resolve. However, some of the alarms did remain valid for several minutes.
 

Figure 8: ON-Delay (Duration) Histogram Percentage Determination

This is very powerful information to use when coupled with the ON-delay and OFF-delay abilities of the DCS. For each alarm, it is straightforward to generate a table similar to the following one. This numerical analysis yields the exact percentage of how many alarms would be eliminated based on the choice and type of delay time. The table and charts let you locate the diminishing returns and pick your delay correctly. 

Delay in    % Reduction      % Reduction
Seconds     Time-In-Alarm    Time-Between-Alarms
            (ON-Delay)       (OFF-Delay)
5           77.7             19.7
10          87.6             37.8
15          93.0             48.7
20          95.4             58.4
25          96.1             62.4
30          96.5             64.1
35          97.6             66.5
40          97.8             68.7
45          97.9             69.6
50          98.2             70.6
55          98.5             71.6
60          98.5             72.2
65          98.6             72.4
70          98.7             73.2
75          98.7             73.6
80          98.7             74.1
85          98.7             74.6
90          98.9             75.1
95          99.0             75.7
100         99.0             75.8
105         99.0             76.0
110         99.2             76.4
115         99.2             76.9
120         99.2             77.2

Figure 9: Delay Time Alarm Reduction Table

For this alarm, an ON-delay of 30 seconds would eliminate over 96% of the alarm occurrences. An OFF-delay of one minute would eliminate 72% of the occurrences. It is typical for OFF-delay to be less powerful than ON-delay: for the same delay time, it will generally eliminate fewer alarm occurrences. Depending on the control system, there may be restrictions around the choices of delay types.
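A reduction table of this kind can be approximated from the duration and interval lists. The sketch below assumes an ON-delay of d seconds removes every occurrence lasting d seconds or less, and an OFF-delay of d seconds absorbs every occurrence that follows the previous clear by d seconds or less. Those boundary choices, the function name `reduction_table`, and the sample data are all illustrative and are not taken from any product.

```python
def reduction_table(durations, intervals, delays):
    """Return (delay_s, ON-delay % reduction, OFF-delay % reduction) rows.

    durations: time-in-alarm of each occurrence, in seconds.
    intervals: time-between-alarms preceding each recurrence, in seconds.
    Percentages are relative to the total number of occurrences.
    """
    total = len(durations)
    if total == 0:
        return []
    rows = []
    for d in delays:
        on_pct = 100.0 * sum(1 for x in durations if x <= d) / total
        off_pct = 100.0 * sum(1 for x in intervals if x <= d) / total
        rows.append((d, round(on_pct, 1), round(off_pct, 1)))
    return rows


durations = [2, 3, 4, 8, 12, 20, 45, 90, 3, 6]
intervals = [5, 7, 15, 30, 60, 8, 200, 400, 12]
for row in reduction_table(durations, intervals, delays=[5, 10, 30]):
    print(row)
# (5, 40.0, 10.0)
# (10, 60.0, 30.0)
# (30, 80.0, 60.0)
```

Scanning the rows for the point where the percentages stop climbing is how you locate the diminishing returns.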

These calculations do not determine “why” the chattering alarm behavior is occurring. A (time-consuming) root cause investigation of the process conditions and the sensing hardware that result in chattering and fleeting behavior might find installation or hardware problems. The implementation of delay times is more of a highly effective band-aid solution.

As an example, I was in a chemical plant control room that was (by design) slightly pressurized by the HVAC system. It had an alarm to indicate if that slight positive pressure was lost. The sensor for that was placed right above the exit door to the outside! So, the loss-of-pressurization alarm became a “someone-is-using-the-door” alarm, which was of no value. The answer was to move the sensor, although an ON-delay of about 20 seconds would have worked as well. 

Hexagon Alarm Mechanic
Hexagon’s Plantstate Integrity™ product (PSI) is a comprehensive alarm management solution. Its alarm analysis module automates this entire delay-time analysis with the Alarm Mechanic feature. Any alarm in a typical analysis list (such as most frequent, chattering or fleeting) can be selected, and a new analysis window opens. Thousands of that alarm’s past occurrences are automatically analyzed, and the effects of both ON and OFF-delay times are shown with both a graph and a table. The graph shows the point of diminishing returns, and the table provides the exact amount of alarm reduction for any specific delay time choice!

The Alarm Mechanic analysis can be saved as a PDF for documentation of management of change. Solving nuisance alarms has now become so easy that they should disappear from the face of the Earth!
 

Figure 10: Hexagon Alarm Mechanic Analysis Display

This method uses actual occurrence data to determine the proper delay-time value. When implementing new points, you initially have no such data to use. What should be the defaults? The answer requires some explanation.

Implementation of either ON-delay or OFF-delay is different from the implementation of deadband. In specifying deadband, the physics of the situation generally contraindicates the use of a zero default. But for many points, a zero ON or OFF-delay may be perfectly acceptable. Both EEMUA 191 and ISA-18.2 document some basic warnings about the use of delay time. Here is some more thorough guidance.

Figure 11: Recommended Delay Times Based on Signal Type

Summary
There are widespread problems with frequent and repetitive nuisance alarms. Chattering and fleeting alarms are often the worst offenders. The methods for fixing them are known and easily applied. There is no reason to have a poorly performing alarm system filled with nuisance alarms!

For much more detail, we recommend this free white paper: Making a Big Dent In Nuisance Alarms

And, of course, The Alarm Management Handbook, Second Edition

Review other "Taming the Wild Alarm System" topics in this blog series: 

  1. How Did We Get In This Mess?
  2. The Most Important Alarm Improvement Technique in Existence

In the next blog, we will address the fundamental document you must have if you want a good alarm system. This is the Alarm Philosophy document. We will cover why you need it and what is in it. And feel free to contact me, Bill Hollifield, at bill.hollifield@hexagon.com with questions.

About Bill Hollifield

Bill Hollifield is the Hexagon Principal Alarm Management and High Performance HMI consultant, with more than 25 years of experience in the process industry in engineering, operations, and control systems, and an additional 20 years in alarm management consulting and services for the petrochemical, power generation, pipeline, mining, and other industries. He is a member of the ISA-18.2 Alarm Management committee, the ISA SP101 HMI committee, the American Petroleum Institute’s API RP-1167 Alarm Management Recommended Practice committee, and the Engineering Equipment and Materials Users Association (EEMUA) Industry Review Group. In 2014, Bill was named an ISA Fellow for industry contributions in these areas.

Bill is also the co-author of The Alarm Management Handbook, First and Second Editions, © PAS 2010; The High Performance HMI Handbook, © PAS 2008; the ISA book Alarm Management: A Comprehensive Guide, Second Edition, © ISA 2011; and the Electric Power Research Institute (EPRI) guidelines on Alarm Management for Power Generation (2008) and Power Transmission (2016). He has authored several papers, articles and ISA technical reports on Alarm Management and High Performance HMI and is a regular presenter on such topics at API, ISA, and Electric Power symposiums. He has a BSME from Louisiana Tech University, an MBA from the University of Houston, and has built his own plane (an RV-12) with a High Performance HMI.