
Control systems

With respect to control systems, and the COMAH safety report assessment manual, the following Level 2 Criteria are of relevance:



An instrumented control system is an electrical, electronic, or programmable electronic system (E/E/PES) which may perform some or all of the following functions:

These functions are normally provided by alarm, protection (trip, interlocks and emergency shutdown), and process control systems.

These engineered systems are individually and collectively described as control systems, and may be independent, or share elements such as the human interface, plant interface, logic, utilities, environment and management systems.

The human interface may comprise a number of input and output components, such as controls, keyboard, mouse, indicators, annunciators, graphic terminals, mimics, audible alarms, and charts.

The plant interface comprises inputs (sensors), outputs (actuators), and communications (wiring, fibre optic, analogue/digital signals, pneumatics, fieldbus, signal conditioning, barriers, and trip amplifiers).

The logic elements may be distributed, and linked by communications, or marshalled together and may be in the form of relays, discrete controllers or logic (electronic, programmable or pneumatic), distributed control systems (DCS), supervisory control and data acquisition (SCADA), computers (including PCs), or programmable logic controllers (PLC). The logic elements may perform continuous control functions, or batch or change of state (e.g. start-up/shut-down) sequences. It should also be noted that logic functions may be distributed to be undertaken within smart sensors or actuators.

Utilities are the power supplies and physical elements required for the systems, such as electricity and instrument air.

Environment is the physical accommodation and surroundings in which the control systems (including the operator) are required to work, including physical accommodation or routings, environmental conditions (humidity, temperature, flammable atmospheres), and external influences such as electromagnetic radiation and hazards which might affect the operation of the control system during normal or abnormal conditions such as fire, explosion, chemical attack etc.

Modern instrumented control systems are generally electrical, electronic or programmable electronic systems (E/E/PES), but some purely pneumatic systems may still be in operation.

Safety related systems

A control system or device is deemed to be safety related if it provides functions which significantly reduce the risk of a hazard, and in combination with other risk reduction measures, reduces the overall risk to a tolerable level, or if it is required to function to maintain or achieve a safe state for the equipment under control (EUC).

These functions are known as the safety functions of the system or device and are the ability to prevent initiation of a hazard or detect the onset of a hazard, and to take the necessary actions to terminate the hazardous event, achieve a safe state, or mitigate the consequences of a hazard.

All elements of the system which are required to perform the safety function, including utilities, are safety related, and should be considered part of the safety related system.

Safety related control systems may operate in low demand mode, where they are required to carry out their safety function occasionally (not more than once/year) or in high demand (more than once/year) or continuous mode where failure to perform the required safety function will result in an unsafe state or place a demand on another protective system. The likelihood of failure of a low demand system is expressed as probability of failure on demand, and as failure rate per hour for high/continuous demand systems.
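The distinction between modes of operation can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the `continuous` flag are hypothetical, and the once-per-year threshold is taken directly from the paragraph above.

```python
def demand_mode(demands_per_year: float, continuous: bool = False) -> str:
    """Classify the mode of operation using the once-per-year threshold."""
    if continuous:
        return "continuous"
    return "low demand" if demands_per_year <= 1.0 else "high demand"


def failure_measure(mode: str) -> str:
    """Return the measure used to express the likelihood of failure."""
    if mode == "low demand":
        return "probability of failure on demand (PFD)"
    # High demand and continuous modes are expressed as a rate.
    return "dangerous failure rate per hour"


for demands in (0.1, 1.0, 12.0):
    mode = demand_mode(demands)
    print(f"{demands} demands/year: {mode}, expressed as {failure_measure(mode)}")
```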

Safety related control systems operating in continuous or high demand mode where the E/E/PES is the primary risk reduction measure have been known as HIPS (high integrity protective systems). However, use of such systems does not circumvent the need for a hierarchical approach to risk reduction measures such as inherent safety, and careful consideration of prevention of common mode failures by use of diverse technology and functionality (such as relief valves), independent utilities and maintenance and test procedures, physical separation, and external risk reduction (such as bunds). Measures should favour simple technological solutions rather than complex ones. The lowest failure rate which can be claimed for high integrity systems operating in continuous or high demand mode is 10⁻⁹ dangerous failures per hour.

It should be noted that control systems for equipment under control which are not safety related as defined above may also contribute to safety and should be properly designed, operated and maintained. Where their failure can raise the demand rate on the safety related system, and hence increase the overall probability of failure of the safety related system to perform its safety function, then the failure rates and failure modes of the non-safety systems should have been considered in the design, and they should be independent and separate from the safety related system.

A control system operating in continuous or high demand mode, for which a failure rate of less than 10⁻⁵/hr is claimed in order to demonstrate a tolerable risk, provides safety functions, and is safety related.

In some circumstances, the safety function may require the operator to take action, in which case the operator is part of the safety related system and will contribute significantly to the probability of failure on demand (PFD). Typically, in a well designed system, a figure of 10⁻¹ is assumed for the probability of an operator failing to take correct action on demand. Where exceptional care has been taken in design of human factors such as alarm management, instructions and training, and where such arrangements are monitored and reviewed, then a probability of failure on demand of not better than 10⁻² may be achievable. Any supporting hardware or software, such as alarm systems, would also need the requisite integrity level.

System                             Claimed failure rate or probability of failure on demand
Non-safety related system          Not better than 10⁻⁵/hr
Operator action                    10⁻¹/demand (typical); 10⁻²/demand (best)
High integrity protective system   Not better than 10⁻⁹/hr
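As a rough illustration, the limits in the table above can be used to screen the figures claimed in an assessment. The dictionary keys and the function name below are hypothetical; only the numerical limits come from the table.

```python
# Limits from the table above. "Not better than" means a claimed
# figure must not be lower than the limit.
LIMITS = {
    "non_safety_system_per_hr": 1e-5,
    "operator_pfd_typical": 1e-1,
    "operator_pfd_best": 1e-2,
    "hips_per_hr": 1e-9,
}


def claim_is_credible(kind: str, claimed: float) -> bool:
    """Return True if the claimed figure is no better than the limit."""
    return claimed >= LIMITS[kind]


# A non-safety control system claimed at 1e-6/hr is better than the
# limit allows, so the claim would need to be challenged.
print(claim_is_credible("non_safety_system_per_hr", 1e-6))
print(claim_is_credible("operator_pfd_typical", 1e-1))
```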

System integrity

The integrity required of a safety related system depends upon the level of risk reduction claimed for the safety function to be performed.

Safety integrity is the probability that a safety related system will satisfactorily perform the required safety function under all stated conditions within a stated period of time when required to do so.

Safety integrity is therefore a function of performance and availability.

Performance is the ability of the system or device to perform the required safety function in a timely manner under all relevant conditions so as to achieve the required state.

Availability is the measure of readiness of the system to perform the required safety function on demand, and is usually expressed in terms of probability of failure on demand.

Performance and availability depend on:

The following measures are required to ensure adequate performance and availability of the safety related system:

Integrity levels


Historically, little industry guidance has been available for qualifying or quantifying safety integrity levels to achieve a requisite risk reduction.

Guidance related to determination of suitable integrity of programmable electronic systems in terms of configuration, reliability (quantitative and qualitative), and quality has been available in the HSE document Programmable Electronic Systems in safety related applications (PES 1 and 2) since 1987.

Additional guidance has also been available in EEMUA 160 Safety related instrument systems for the process industries.

However, most major companies will have developed internal standards which relate safety related system integrity to required risk reduction. These standards are likely to address the design process, system configuration, and demonstration that the required risk reduction has been achieved by qualitative or quantitative analysis of the failure rate of the design. They will also have procedures to ensure that the integrity is maintained during commissioning, operation, maintenance, and modification.

The latest applicable standard is BS IEC 61508 ‘Functional safety of electrical/electronic/programmable electronic safety-related systems’ which is in 7 parts. Parts 1, 3, 4, are published as British Standards, Part 5 is issued as an international IEC standard, and Parts 2, 6 and 7 remain in draft form.

Underlying philosophy

Integrity levels for safety related systems may be determined from the hazard and risk analysis of the equipment under control. A number of different methodologies are available, but the process includes identification of hazards and the mechanisms which can initiate them, risk estimation (likelihood of occurrence), and risk evaluation (overall risk based on likelihood and consequences). The risk estimation provides a measure of the risk reduction required to reduce the risk to a tolerable level.

Hazard identification results in the identification of safety functions which are required to control the risk.

The safety functions may then be allocated to a number of different systems including E/E/PES, other technology and external measures.

For each system providing a safety function, a failure rate measure can be assigned which in turn determines the integrity required of the system. Alternatively, a qualitative approach (based on the likelihood and consequence of the hazard, and the frequency and level of exposure and avoidability) may be used to define the required integrity.
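For a low demand safety function, the quantitative route can be sketched as follows. This is a simplified illustration, not a prescribed method: the function names are hypothetical, and it assumes that a single safety function provides all of the required risk reduction.

```python
def required_pfd(unmitigated_per_year: float, tolerable_per_year: float) -> float:
    """Largest probability of failure on demand that still meets the target.

    unmitigated_per_year: hazard frequency with no protection in place.
    tolerable_per_year:   hazard frequency judged to be tolerable.
    """
    if unmitigated_per_year <= 0 or tolerable_per_year <= 0:
        raise ValueError("frequencies must be positive")
    # A value of 1.0 means no risk reduction is needed from this function.
    return min(1.0, tolerable_per_year / unmitigated_per_year)


def risk_reduction_factor(pfd: float) -> float:
    """Risk reduction factor is the reciprocal of the PFD."""
    return 1.0 / pfd


pfd = required_pfd(unmitigated_per_year=0.1, tolerable_per_year=1e-4)
print(f"required PFD: {pfd:.1e}, risk reduction factor: {risk_reduction_factor(pfd):.0f}")
```

Where the risk reduction is shared between several independent systems, the same target would be divided between them.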

Safety Integrity Levels

IEC 61508 assigns four software and hardware safety integrity levels (SILs) to required measures of risk reduction. Guidance is then provided on the system configuration, level of subsystem fault tolerance and diagnostic coverage, and safety life-cycle measures required to achieve the designated hardware SIL, and the software methods and life-cycle measures required to achieve the designated software SIL. It also provides guidance on qualitative methods for establishing the SIL required. Part 2 of the standard places architectural constraints on the hardware configuration by setting minimum fault tolerance and diagnostic coverage requirements for each element or subsystem. It should be noted that IEC 61508 limits the risk reductions which can be claimed for a safety related E/E/PES which operates in low demand mode or continuous mode to no better than 10⁻⁶ and 10⁻⁹ respectively for SIL4.

The requirement is more demanding for subsystems which do not have well defined failure modes or behaviour (e.g. programmable systems). The standard requires that a reliability model of the system architecture be created and the reliability predicted and compared with the target safety integrity level to confirm that the required risk reduction has been achieved.

It is necessary to demonstrate that the required level of integrity has been achieved in the design, installation, operation and maintenance of the system.

It should be noted that the integrity of a safety related system is critically dependent upon the detection and correction of dangerous failures. Where there is a low level of diagnostic coverage, as is usually the case with lower integrity systems, then the integrity is critically dependent upon the proof test interval. Where there is a high level of diagnostic coverage to automatically reveal failures on-line, for example for high demand high integrity systems, then the integrity is also heavily dependent upon the frequency of diagnostic checks, and the mean time to repair the equipment, which includes the diagnostic test interval.
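The dependence on the proof test interval can be illustrated with the widely used first-order approximation for a single channel, in which the average PFD is half the product of the dangerous undetected failure rate and the proof test interval. The sketch below assumes no diagnostic coverage, perfect proof tests, negligible repair time and no common cause failures. The function name and the example failure rate are hypothetical.

```python
def pfd_avg_single_channel(lambda_du_per_hr: float, proof_test_interval_hr: float) -> float:
    """First-order average PFD for one channel: lambda_DU * T / 2."""
    return lambda_du_per_hr * proof_test_interval_hr / 2.0


LAMBDA_DU = 2e-6  # dangerous undetected failures per hour (illustrative value)

for interval_hr in (8760.0, 4380.0):
    pfd = pfd_avg_single_channel(LAMBDA_DU, interval_hr)
    print(f"proof test every {interval_hr:.0f} h: average PFD about {pfd:.2e}")
```

Under these assumptions, halving the proof test interval halves the average PFD.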

SILs are now being quoted for proprietary subsystems (and certified by test bodies). Quoted SILs should be associated with proof test intervals, diagnostic coverage and fault tolerance criteria. They are useful for evaluation of architectural constraints, but do not eliminate the requirement to confirm that the required safety integrity level for the safety function provided by the system has been achieved. Software includes high level user application programmes and parameter settings.

Alarm systems

Alarm systems alert operators to plant conditions, such as deviation from normal operating limits and to abnormal events, which require timely action or assessment.

Alarm systems are not normally safety related, but do have a role in enabling operators to reduce the demand on the safety related systems, thus improving overall plant safety.

However, where a risk reduction of better than 10⁻¹ failures on demand is claimed then the alarm system, including the operator, is a safety related system which requires a suitable safety integrity level (SIL 1 or SIL 2 as defined by BS IEC 61508).

EEMUA 191 ‘Alarm systems - a guide to design, management and procurement’ considers alarm settings, the human interface (alarm presentation), alarm processing and system management controls for both safety related and other alarm systems. It provides the following guidance in regard to safety related alarm systems:

Alarms which are not designated as safety should be carefully designed to ensure that they fulfil their role in reducing demands on safety related systems.

For all alarms, regardless of their safety designation, attention is required to ensure that under abnormal conditions such as severe disturbance, onset of hazard, or emergency situations, the alarm system remains effective given the limitations of human response. The extent to which the alarm system survives common cause failures, such as a power loss, should also be adequately defined.

Further guidance is available in EEMUA 191 ‘Alarm systems - a guide to design, management and procurement’, and CHID circular CC/Tech/Safety/9.

Alarm settings

The type of alarm and its setting should be established so as to enable the operator to make the necessary assessment and take the required timely action. Settings should be documented and controlled in accordance with the alarm system management controls.

Human interface (alarm presentation)

The human interface should be suitable. Alarms may be presented on an annunciator panel, individual indicators, a VDU screen, or a programmable display device.

Alarm lists should be carefully designed to ensure that high priority alarms are readily identified, that low priority alarms are not overlooked, and that the list remains readable even during times of high alarm activity or with repeat alarms.

Alarms should be prioritised in terms of which alarms require the most urgent operator attention.

Alarms should be presented within the operator's field of view, and use a consistent presentation style (colour, flash rate, naming convention).

Each alarm should provide sufficient operator information for the alarm condition, plant affected, action required, alarm priority, time of alarm and alarm status to be readily identified.

The visual display device may be augmented by audible warnings, which should be at a level considerably higher than the ambient noise at the signal frequency. Where there are multiple audible warnings, they should be designed so that they are readily distinguished from each other and from emergency alarm systems. They should be designed to avoid distraction of the operator in high operator workload situations. Where both constant frequency and variable frequency (including pulsed or intermittent) signals are used, then the latter should denote a higher level of danger or a more urgent need for intervention.

Alarm processing

The alarms should be processed in such a manner as to avoid operator overload at all times (alarm floods). The alarm processing should ensure that fleeting or repeating alarms do not result in operator overload even under the most severe conditions. A number of alarm processing techniques, including filtering, deadband, debounce timers, and shelving, are described in EEMUA 191 ‘Alarm systems - a guide to design, management and procurement’.

The presentation of alarms should not exceed that which the operator is capable of acting upon, or alternatively the alarms should be prioritised and presented in such a way that the operator may deal with the most important alarms without distraction by the others. Applicable alarm processing techniques include grouping and first-up alarms, eclipsing of lower grade alarms (e.g. suppression of the high alarm when the high-high alarm activates), suppression of out of service plant alarms, suppression of selected alarms during certain operating modes, automatic alarm load shedding and shelving.

Care should be taken in the use of shelving or suppression to ensure that controls exist to ensure that alarms are returned to an active state when they are relevant to plant operation.
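Two of the techniques named above, deadband and eclipsing, can be sketched as follows. The class, function and tag names are hypothetical, and the set point and deadband values are illustrative only. Real alarm systems implement these features in the DCS or alarm management software.

```python
class DeadbandAlarm:
    """High alarm that clears only after the value falls below
    (setpoint - deadband), so a noisy signal does not chatter."""

    def __init__(self, setpoint: float, deadband: float) -> None:
        self.setpoint = setpoint
        self.deadband = deadband
        self.active = False

    def update(self, value: float) -> bool:
        if value >= self.setpoint:
            self.active = True
        elif value < self.setpoint - self.deadband:
            self.active = False
        # Between the two thresholds the previous state is held.
        return self.active


def eclipse(high_active: bool, high_high_active: bool) -> list[str]:
    """Present only the high-high alarm when both are active."""
    if high_high_active:
        return ["HIGH-HIGH"]
    return ["HIGH"] if high_active else []


alarm = DeadbandAlarm(setpoint=80.0, deadband=2.0)
states = [alarm.update(v) for v in (79.0, 80.5, 79.5, 80.2, 77.0)]
print(states)
print(eclipse(high_active=True, high_high_active=True))
```

In this sketch the alarm raised at 80.5 stays active while the value hovers around the set point, and clears only once the value falls below 78.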

Alarm system management procedures

Management systems should be in place to ensure that the alarm system is operated, maintained and modified in a controlled manner. Alarm response procedures should be available, and alarm parameters should be documented.

The performance of the alarm system should be assessed and monitored to ensure that it is effective during normal and abnormal plant conditions. The monitoring should include evaluation of the alarm presentation rate, operator acceptance and response times, operator workload, standing alarm count and duration, repeat or nuisance alarms, and operator views of operability of the system. Monitoring may be achieved by regular and systematic auditing.

Matters which are not worthy of operator attention should not be alarmed.

Logging may be a suitable alternative for engineering or discrepancy events to prevent unnecessary standing alarms. A system for assessing the significance of such logged events to ensure timely intervention by maintenance personnel may be required.

Protection systems (Trips and Interlocks)

Protective tripping systems provide a defence against excursions beyond the safe operating limits by detecting excursions beyond set points related to the safe operating limits (i.e. the onset of a hazard) and taking timely action to maintain or restore the equipment under control to a safe state. Trips should not be self-resetting unless adequate justification has been made. Protective interlocks prevent those control actions which might initiate a hazard from being undertaken by an operator or process control system, and are by nature self-resetting.
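The difference between a latching trip and a self-resetting interlock can be illustrated with the sketch below. The class and function names are hypothetical, and a real protective system would implement this logic in hardware or in a safety logic solver rather than in general purpose software.

```python
class LatchingTrip:
    """High trip that stays tripped until it is deliberately reset."""

    def __init__(self, trip_setpoint: float) -> None:
        self.trip_setpoint = trip_setpoint
        self.tripped = False

    def update(self, value: float) -> bool:
        # The trip latches: it does not clear when the value recovers.
        if value >= self.trip_setpoint:
            self.tripped = True
        return self.tripped

    def reset(self, value: float) -> bool:
        """Manual reset, accepted only if the process is below the set point.
        Returns True if the trip is clear after the attempt."""
        if value < self.trip_setpoint:
            self.tripped = False
        return not self.tripped


def interlock_permits(condition_ok: bool) -> bool:
    """Self-resetting interlock: follows the permissive condition directly."""
    return condition_ok


trip = LatchingTrip(trip_setpoint=150.0)
print(trip.update(155.0))  # excursion: trip latches
print(trip.update(120.0))  # process has recovered, trip remains latched
print(trip.reset(120.0))   # deliberate reset clears the trip
```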

Protection systems should indicate that a demand to perform a safety function has been made and that the necessary actions have been performed.


Protective systems should be sufficiently independent of the control system or other protective systems (electrical/electronic or programmable). Where there is an interface between systems (e.g. for indication, monitoring or shared components) or shared utilities (e.g. power), environment (e.g. accommodation, wiring routes) or management systems (maintenance procedures, personnel), then the method of achieving independence should be defined, and common cause failures adequately considered.

Measures to defend against common mode failures due to environmental interactions may include physical separation or segregation of system elements (sensors, wiring, logic, actuators or utilities) of different protective systems.

Independence will also be required for protection against systematic and common mode faults. Measures may include use of diverse technology for different protective systems. Where more than one E/E/PES protective system is used to provide the required risk reduction for a safety function, then adequate independence should be achieved by diverse technology, construction, manufacturer or software as necessary to achieve the required safety integrity level.

Dependence on utilities

The actions required from the protective system depend upon the nature of the process. The actions may be passive in nature, such as simple isolation of plant or removal of power, or they may be active in that continued or positive action is required to maintain or restore a safe state, for example by injection of inhibitor into the process, or provision of emergency cooling.

Active protective measures have a high dependence upon utilities, and may be particularly vulnerable to common mode failures. The scope of the protective system therefore includes all utilities upon which it depends, and they should have an integrity consistent and contributory to that of the remainder of the system.

Measures taken to defend against common mode failure of utilities will be commensurate with the level of safety integrity required, but may include standby or uninterruptable/reservoir supplies for electricity, air, cooling water, or other utilities essential for performance of the safety function. Such measures should themselves be of sufficient integrity.

Survivability and external influences

The protective system should be adequately protected against environmental influences, the effects of the hazard against which it is protecting, and other hazards which may be present. Environmental influences include power system failure or characteristics, lightning (BS 6651), electromagnetic radiation (EMR) (BS 6667, IEC 61000), flammable atmospheres (BS 5345, BS EN 60079, BS 6467, BS 7535, BS EN 50281), corrosive or humid atmospheres, ingress of water or dust (BS EN 60529), temperature, rodent attack, chemical attack, vibration, physical impact, and other plant hazards.

Degradation of protection against environmental influences during maintenance and testing should have been considered and appropriate measures taken. For example, use of radios by maintenance personnel may be prohibited during testing of a protective system with the cabinet door open where the cabinet provides protection against EMR.

Protection against random hardware faults

The architecture of the protective system should be designed to protect against random hardware failure. It should be demonstrated that the required reliability has been achieved commensurate with the required integrity level. Defensive measures may include high reliability elements, automatic diagnostic features to reveal faults, and redundancy of elements (e.g. 2 out of 3 voting for sensors) to provide fault tolerance.
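The 2 out of 3 voting arrangement mentioned above can be sketched as follows. The function name is hypothetical, and each input represents one sensor channel demanding a trip.

```python
def vote_2oo3(a: bool, b: bool, c: bool) -> bool:
    """Trip when at least two of the three channels demand it."""
    return (int(a) + int(b) + int(c)) >= 2


# One channel failed in the "no trip" state does not block a genuine demand,
# and one channel failed in the "trip" state does not cause a spurious trip.
print(vote_2oo3(True, True, False))
print(vote_2oo3(True, False, False))
```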

Protection against common mode failures

Diversity of elements is not effective for protection against random hardware faults, but is useful in defence against common mode failures within a protective system.

Protection against systematic failures

Protection against systematic hardware and software failures may be achieved by appropriate safety lifecycles (see IEC 61508, Out of Control).


Sensors include their connection to the process, both of which should be adequately reliable. A measure of their reliability is used in confirming the integrity level of the protective system. This measure should take into account the proportion of failures of the sensor and its process connection which are failures to danger.

Dangerous failures can be minimised by a number of measures such as:

Guidance on process connection is provided in BS 6739 British Standard Code of practice for instrumentation in process control systems: Installation design and practice.

Proof testing procedures should clearly set out how sensors are reinstated and how such reinstatement is verified after proof testing.

Maintenance procedures should define how sensors/transmitters are calibrated with traceability back to national reference standards by use of calibrated test equipment.

Other matters which will need to have been considered are:

Use of ‘SMART’ instruments requires adequate diagnostic coverage and fault tolerance (see architectural constraints in IEC 61508 Part 2), and measures to protect against systematic failures (software design/integration, inadvertent re-ranging during maintenance). Measures may include use of equipment in non-smart mode (analogue signal output, no remote setting) and equipment of stable design for which there is an extensive record of reliability under similar circumstances.

Actuators and signal conversion

Actuators are the final control elements or systems and include contactors and the electrical apparatus under control, valves (control and isolation), including pilot valves, valve actuators and positioners, power supplies and utilities which are required for the actuator to perform its safety function, all of which should be adequately reliable. A measure of their reliability is used in confirming the integrity level of the protective system. This measure should take into account the proportion of failures of the actuator under the relevant process conditions which are failures to danger.

Actuators are frequently the most unreliable part of the tripping process.

Dangerous failures can be minimised by a number of measures such as:

Other matters which should have been considered are:

Logic systems

Commonly, the logic systems for protective systems are electronic, but programmable and other technology systems (magnetic or fluidic/pneumatic) have been used.

The architecture of the logic system will be determined by the hardware fault tolerance requirements, for example dual redundant channels. Where a high level of integrity for the system is required (SIL3 or SIL4) then diverse hardware between channels may be employed. This should not be confused with diversity of independent protective systems.

Logic systems are likely to incorporate provisions for fault alarms and overrides, for which there should be suitable management control arrangements. They may also provide monitoring of input and output signal lines for detection of wiring (open circuit, short circuit) and sensors/actuators (stuck-at, out of range). Such monitoring may initiate an alarm, a trip action or, in a voting arrangement, disable the faulty element.
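Signal line monitoring of this kind can be sketched for a 4-20 mA analogue input as follows. The function names are hypothetical. The fault thresholds of 3.6 mA and 21.0 mA and the five-sample stuck-at window are illustrative assumptions, and actual values would depend on the instrument and the logic solver.

```python
def line_status(current_ma: float,
                low_fault_ma: float = 3.6,
                high_fault_ma: float = 21.0) -> str:
    """Classify a 4-20 mA loop current as healthy or out of range."""
    if current_ma <= low_fault_ma:
        return "fault: low (possible open circuit)"
    if current_ma >= high_fault_ma:
        return "fault: high (possible short circuit)"
    return "ok"


def is_stuck(samples: list[float], min_samples: int = 5,
             tolerance_ma: float = 0.0) -> bool:
    """Flag a signal whose most recent samples show no variation."""
    if len(samples) < min_samples:
        return False
    recent = samples[-min_samples:]
    return max(recent) - min(recent) <= tolerance_ma


print(line_status(0.0))
print(line_status(12.0))
print(is_stuck([12.0, 12.0, 12.0, 12.0, 12.0]))
```

Depending on the design, a detected fault might raise an alarm, initiate a trip or remove the channel from a voting arrangement, as described above.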

Software based systems should be adequately protected against systematic failures, for example by appropriate hardware and software safety lifecycles, and suitable techniques and quality systems. Guidance is available in BS IEC 61508 Part 3, PES Parts 1 & 2, EEMUA 160, Out of Control, and IGasE SR15 - Programmable equipment in safety related applications.

Wiring and communications (signal transmission)

Transmitters, communications devices and wiring systems should be arranged to meet the requirements for survivability, protection against external influences and independence.

Independent systems or redundant channels should not share multicore cables with each other or power circuits, and may require diverse routes depending upon the safety integrity level to be achieved.

Measures to protect against failures include:

Use of fieldbus or other digital communication protocols in protective systems should be considered a novel approach requiring a thorough evaluation and demonstration of the safety integrity. EEMUA 189 'A guide to fieldbus applications in the process industry' provides limited guidance.


Utilities which are required for the protective system to perform its safety function may include power supplies such as electricity, air, inhibitor materials and their propellants, inert gas such as nitrogen, cooling water, steam, pilot flames and their gases all of which should be adequately reliable. Measures such as redundancy, and uninterruptable/reservoir supplies, and availability monitoring (e.g. loss of air alarm) may be required. Confirmation that the designed capacity of reserves is adequate should be demonstrated by test.

Utilities may also introduce external influences into the protective systems (e.g. from electrical supplies).

Measures to protect against external influences may include:

Proof testing

The probability of failure on demand, or the failure rate, of a protective system is critically dependent upon the frequency of proof testing and its ability to detect previously unrevealed failures of the system. The proof test interval should therefore be established accordingly and, as a rule of thumb for low demand systems, should be an order of magnitude less than both the mean time between failures of the system and the mean interval between demands.
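The rule of thumb can be expressed as a short calculation. The sketch below reads the rule as requiring a proof test interval no longer than one tenth of the shorter of the mean time between failures and the mean time between demands. That reading, the function name and the example figures are assumptions made for illustration.

```python
HOURS_PER_YEAR = 8760.0


def max_proof_test_interval_hr(mtbf_hr: float, demands_per_year: float) -> float:
    """Longest proof test interval consistent with the rule of thumb."""
    if mtbf_hr <= 0 or demands_per_year <= 0:
        raise ValueError("inputs must be positive")
    hours_between_demands = HOURS_PER_YEAR / demands_per_year
    return min(mtbf_hr, hours_between_demands) / 10.0


# Illustrative case: MTBF of 100,000 h and one demand every two years.
interval = max_proof_test_interval_hr(mtbf_hr=100_000.0, demands_per_year=0.5)
print(f"proof test at least every {interval:.0f} h")
```

The interval finally chosen would still need to be confirmed against the PFD required of the safety function.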

Proof test procedures should be available which specify the success/failure criteria and detail how the test will be performed safely, including any management arrangements, operating restrictions and competence of personnel.

The tests should be arranged to reveal all dangerous failures which have been unrevealed in normal operation including the following measures:


Procedures should be available which detail the operation of the protective system including:


Procedures should be available for maintenance activities including:

For systems where a high diagnostic coverage is claimed, for example high demand high integrity systems, the probability of failure (expressed as failure rate) is critically dependent upon the mean time to repair the faults revealed. For such systems, the repair performance should be monitored and reviewed against the design criteria.


A management system for control of modifications should be available to ensure that:

Remote diagnostic systems

Remote diagnostic systems have the potential to cause danger by initiating unexpected operations or by affecting safety functions by software/parameter modification or by diverting the control system processor from time critical functions.

The need for remote diagnosis should be justified, a risk assessment completed, and measures taken to ensure that safety is not affected by normal operation or malfunction of the diagnostic system, including the remote diagnostic terminal and software, communication link, and the control system diagnostic interface and software.

Consideration should be given to:

Whilst beyond the scope of HS(G)87 'Safety in the remote diagnosis of manufacturing plant and equipment', the publication provides a useful background to the subject.

Process control systems

Process control systems are primarily implemented for economic reasons. However, those which are not considered safety related should still be designed, installed, operated and maintained so that their failure does not place a demand rate on the protective system which was not anticipated in its design. Part 1 of BS IEC 61508 provides guidance. The dangerous failure modes of the control system should be determined and taken into account in the overall safety system specification. The control system should also be sufficiently independent of the safety systems.

The control system may provide steady state or change of state (start-up, shutdown, batch) control functions. The latter may be implemented by automatic sequences or procedurally under manual control. Control systems should be implemented to provide stable control of the process under all expected normal and upset circumstances, including start-up and shutdown.

The system should be designed to prevent or verify operator commands which might place a demand upon the protective system.

The dangerous failure rate of the control system should be supported by operational experience of the system in a similar application, reliability analysis or reliability data from industry databases. The failure rate claimed should not be less than 10⁻⁵ dangerous failures/hour.

Consideration should be given to failure behaviour so as to minimise the demands placed on the protective systems such as under the following circumstances:

Consideration should also be given to change control and software back-up systems. As the control system provides control, monitoring and logging functions which significantly aid the operator, consideration should be given to survival of the control system during hazardous events and emergency response.

It should be noted that redundant (non-diverse), cross monitored control processors are extremely vulnerable to common mode failure.

It should be demonstrated that the process control system does not exercise safety functions during sequences and changes of state under its control. For example, where the control system batch sequence controls the mixing of quantities of materials or reagents which, if incorrect quantities are admitted, may result in an unintended reaction, then measures of sufficient safety integrity, other than the control system, should be taken to ensure that the residual risk is as low as reasonably practicable.

For the purposes of risk evaluation, failure of the control system (at not less than 10⁻⁵ failures/hour or 10⁻¹ failures on demand) should be considered as part of the hazard initiation sequence rather than a risk reduction measure.
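Treating control system failure as an initiating event can be illustrated with a simple frequency calculation. The sketch assumes that the protective layers are independent of the control system and of each other, and that they operate in low demand mode. The function name and the PFD used in the example are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0


def hazard_frequency_per_year(initiating_rate_per_hr: float,
                              protection_pfds: list[float]) -> float:
    """Initiating event frequency multiplied by the PFD of each
    independent protective layer."""
    initiating_per_year = initiating_rate_per_hr * HOURS_PER_YEAR
    return initiating_per_year * math.prod(protection_pfds)


# Control system failing dangerously at 1e-5/hr, one protective system
# with an illustrative PFD of 1e-2.
demand_rate = hazard_frequency_per_year(1e-5, [])
hazard_rate = hazard_frequency_per_year(1e-5, [1e-2])
print(f"demands on protection: {demand_rate:.4f}/yr")
print(f"hazard frequency: {hazard_rate:.2e}/yr")
```

With no protective layers the result is simply the demand rate placed on the protective system, which shows why the control system figure belongs on the initiating side of the calculation.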

Exothermic reactions

Exothermic reactions are particularly demanding in terms of control and protection as they tend to be unstable with aggressive reaction kinetics, and may require risk reduction measures which are required continuously throughout the reaction stage and which rely on utilities such as cooling systems, agitation, inhibitor injection etc.

Thus, loss of any single utility may be a dangerous failure, and initiate a hazard (e.g. loss of agitator blades, and hence reduced cooling because of poorer heat transfer, giving rise to a runaway reaction).

The components of the utilities should be considered safety related and provide adequate protection against failure including common mode failures (e.g. loss of electricity) and systematic failures (e.g. failure to fill inhibitor stock vessel). Sufficient diagnostics should be provided to reveal such failures so that timely automatic or manual response can be initiated.

Diagnostics should be designed to reveal the failure as directly as possible, for example:

Their capacity and capability to deal with the most extreme reaction kinetics (e.g. worst case mixtures) and limiting conditions (e.g. maximum temperature/pressure achievable under worst case) should also be demonstrated.

Expert systems

Expert systems are normally employed as operator support tools. Use of an expert or other deductive or learning system for direct process control should be considered novel and adequate assessment of the risks provided (see OM 1996/117).