Control systems

With respect to control systems, and the COMAH safety report assessment manual, the following Level 2 Criteria are of relevance:

Criterion

Introduction

An instrumented control system is an electrical, electronic, or programmable electronic system (E/E/PES) which may perform some or all of the following functions:

  • Monitoring, recording and logging of plant status and process parameters;
  • Provision of operator information regarding the plant status and process parameters;
  • Provision of operator controls to effect changes to the plant status;
  • Automatic process control and batch/sequence control during start-up, normal operation, shutdown and disturbance (i.e. control within normal operating limits);
  • Detection of the onset of a hazard and automatic hazard termination (i.e. control within safe operating limits) or mitigation;
  • Prevention of automatic or manual control actions which might initiate a hazard.

These functions are normally provided by alarm, protection (trip, interlock and emergency shutdown) and process control systems.

These engineered systems are individually and collectively described as control systems, and may be independent, or share elements such as the human interface, plant interface, logic, utilities, environment and management systems.

The human interface may comprise a number of input and output components, such as controls, keyboard, mouse, indicators, annunciators, graphic terminals, mimics, audible alarms, and charts.

The plant interface comprises inputs (sensors), outputs (actuators), and communications (wiring, fibre optic, analogue/digital signals, pneumatics, fieldbus, signal conditioning, barriers, and trip amplifiers).

The logic elements may be distributed and linked by communications, or marshalled together. They may take the form of relays, discrete controllers or logic (electronic, programmable or pneumatic), distributed control systems (DCS), supervisory control and data acquisition (SCADA) systems, computers (including PCs), or programmable logic controllers (PLC). The logic elements may perform continuous control functions, or batch or change of state (e.g. start-up/shut-down) sequences. Logic functions may also be distributed to, and undertaken within, smart sensors or actuators.

Utilities are the power supplies and physical elements required for the systems, such as electricity and instrument air.

Environment is the physical accommodation and surroundings in which the control systems (including the operator) are required to work. It includes physical accommodation and routings, environmental conditions (humidity, temperature, flammable atmospheres), and external influences such as electromagnetic radiation and hazards (fire, explosion, chemical attack, etc.) which might affect the operation of the control system during normal or abnormal conditions.

Modern instrumented control systems are generally electrical, electronic or programmable electronic systems (E/E/PES), but some purely pneumatic systems may still be in operation.

Safety related systems

A control system or device is deemed to be safety related if it provides functions which significantly reduce the risk of a hazard, and in combination with other risk reduction measures, reduces the overall risk to a tolerable level, or if it is required to function to maintain or achieve a safe state for the equipment under control (EUC).

These functions are known as the safety functions of the system or device and are the ability to prevent initiation of a hazard or detect the onset of a hazard, and to take the necessary actions to terminate the hazardous event, achieve a safe state, or mitigate the consequences of a hazard.

All elements of the system which are required to perform the safety function, including utilities, are safety related, and should be considered part of the safety related system.

Safety related control systems may operate in low demand mode, where they are required to carry out their safety function only occasionally (not more than once per year), or in high demand mode (more than once per year) or continuous mode, where failure to perform the required safety function will result in an unsafe state or place a demand on another protective system. The likelihood of failure is expressed as a probability of failure on demand (PFD) for a low demand system, and as a dangerous failure rate per hour for a high demand or continuous mode system.

Safety related control systems operating in continuous or high demand mode, where the E/E/PES is the primary risk reduction measure, have been known as HIPS (high integrity protective systems). However, use of such systems does not remove the need for a hierarchical approach to risk reduction, starting with inherent safety, nor for careful consideration of the prevention of common mode failures. Relevant measures include diverse technology and functionality (such as relief valves), independent utilities and maintenance and test procedures, physical separation, and external risk reduction (such as bunds). Simple technological solutions should be favoured over complex ones. The lowest failure rate which can be claimed for high integrity systems operating in continuous or high demand mode is 10⁻⁹ dangerous failures per hour.

Control systems for equipment under control which are not safety related as defined above may still contribute to safety, and should be properly designed, operated and maintained. Where their failure can raise the demand rate on the safety related system, and hence increase the overall probability that the safety related system will be called upon and fail to perform its safety function, the failure rates and failure modes of the non-safety systems should have been considered in the design, and those systems should be independent of, and separate from, the safety related system.
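For a low demand safety function, the expected frequency of the hazardous event is approximately the demand rate multiplied by the PFD of the safety related system. The Python sketch below illustrates why a non-safety control system that fails more often raises the hazard frequency even though the trip itself is unchanged. The function name and all figures are hypothetical, and the low demand approximation is assumed to hold.

```python
def hazard_frequency_per_year(demand_rate_per_year: float, pfd: float) -> float:
    """Approximate hazardous event frequency for a low demand safety function.

    Assumes demands are infrequent relative to the proof test interval, so that
    frequency ~= demand rate x probability of failure on demand (PFD).
    """
    if demand_rate_per_year < 0.0:
        raise ValueError("demand rate must be non-negative")
    if not 0.0 <= pfd <= 1.0:
        raise ValueError("PFD must lie between 0 and 1")
    return demand_rate_per_year * pfd


# Hypothetical figures: the control system places 0.1 demands/year on a trip
# with PFD 1e-2; degradation of the control system raises this to 0.5/year.
baseline = hazard_frequency_per_year(0.1, 1e-2)
degraded = hazard_frequency_per_year(0.5, 1e-2)
print(f"baseline {baseline:.1e}/yr, degraded {degraded:.1e}/yr")
```

The fivefold rise in demand rate produces a fivefold rise in hazard frequency, which is why the failure behaviour of the control system belongs in the design basis of the safety related system.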

A control system operating in continuous or high demand mode, for which a failure rate of less than 10⁻⁵/hr is claimed in order to demonstrate a tolerable risk, provides safety functions, and is safety related.

In some circumstances, the safety function may require the operator to take action. In that case the operator is part of the safety related system and will contribute significantly to its probability of failure on demand (PFD). Typically, in a well designed system, a probability of 10⁻¹ is assumed for an operator failing to take correct action on demand. Where exceptional care has been taken in the design of human factors such as alarm management, instructions and training, and where such arrangements are monitored and reviewed, a probability of failure on demand of not better than 10⁻² may be achievable. Any supporting hardware or software, such as alarm systems, would also need the requisite integrity level.

System                              Claimed failure rate or probability of failure on demand
Non-safety related system           Not better than 10⁻⁵/hr
Operator action                     10⁻¹/demand (typical); 10⁻²/demand (best)
High integrity protective system    Not better than 10⁻⁹/hr
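The limits in the table above can be applied as a simple screening check on claimed figures. The following is a hypothetical illustration: the category names are assumptions and are not terms from any standard. A claim better than the non-safety limit indicates that the control system is in fact safety related.

```python
# Best (lowest) figures that may be claimed, taken from the table above.
CLAIM_LIMITS = {
    "non_safety_related": 1e-5,  # dangerous failures per hour
    "operator_action": 1e-2,     # per demand (best case; 1e-1 is typical)
    "high_integrity": 1e-9,      # dangerous failures per hour
}


def claim_is_credible(category: str, claimed: float) -> bool:
    """Return True if the claimed figure is no better than the tabulated limit."""
    if category not in CLAIM_LIMITS:
        raise KeyError(f"unknown category: {category}")
    return claimed >= CLAIM_LIMITS[category]


print(claim_is_credible("non_safety_related", 1e-4))  # acceptable claim
print(claim_is_credible("non_safety_related", 1e-6))  # system is safety related
print(claim_is_credible("operator_action", 5e-3))     # better than the best case
```

The check only screens against the limit. An operator claim between 10⁻¹ and 10⁻² would still need the human factors justification described above.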

System integrity

The integrity required of a safety related system depends upon the level of risk reduction claimed for the safety function to be performed.

Safety integrity is the probability that a safety related system will satisfactorily perform the required safety function under all stated conditions within a stated period of time when required to do so.

Safety integrity is therefore a function of performance and availability.

Performance is the ability of the system or device to perform the required safety function in a timely manner under all relevant conditions so as to achieve the required state.

Availability is the measure of readiness of the system to perform the required safety function on demand, and is usually expressed in terms of probability of failure on demand.

Performance and availability depend on:

  • Proper design or selection, installation, maintenance and testing of the plant interfaces, including sensors, actuators and logic, for the required duty and the full range of process and environmental conditions under which they will be required to operate, including, where necessary, any excursions beyond the safe operating limits of the plant;
  • Accuracy and repeatability of the instrumentation;
  • Speed of response of the system;
  • Adequate margins between normal and safe operating limits and the system settings;
  • Reliability;
  • Survivability from the effects of the hazardous event or other external influences such as power system failure or characteristics, lightning, electromagnetic radiation (EMR), flammable, corrosive or humid atmospheres, temperature, rodent attack, vibration, physical impact, and other plant hazards;
  • Independence (the ability of the system to act alone, without depending on, or being influenced by, other protective measures, control systems or common utilities).

The following measures are required to ensure adequate performance and availability of the safety related system:

  • Protection against random failures by hardware reliability, fault tolerance (e.g. by redundancy) and fault detection (diagnostic coverage, and proof testing);
  • Protection against systematic and common mode failures by a properly managed safety lifecycle, by independence from common utilities, common management systems and other protective systems, and by diversity. The lifecycle phases, which include hazard and risk evaluation, specification, design, validation, installation, commissioning, operation, maintenance and modification, are detailed in BS IEC 61508.

Integrity levels

History

Historically, little industry guidance has been available for qualifying or quantifying safety integrity levels to achieve a requisite risk reduction.

Guidance related to determination of suitable integrity of programmable electronic systems in terms of configuration, reliability (quantitative and qualitative), and quality has been available in the HSE document Programmable Electronic Systems in safety related applications (PES 1 and 2) since 1987.

Additional guidance has also been available in EEMUA 160 Safety related instrument systems for the process industries.

However, most major companies will have developed internal standards which relate safety related system integrity to required risk reduction. These standards are likely to address the design process, system configuration, and demonstration that the required risk reduction has been achieved by qualitative or quantitative analysis of the failure rate of the design. They will also have procedures to ensure that the integrity is maintained during commissioning, operation, maintenance, and modification.

The latest applicable standard is BS IEC 61508 'Functional safety of electrical/electronic/programmable electronic safety-related systems', which is in 7 parts. Parts 1, 3 and 4 are published as British Standards, Part 5 is issued as an international IEC standard, and Parts 2, 6 and 7 remain in draft form.

Underlying philosophy

Integrity levels for safety related systems may be determined from the hazard and risk analysis of the equipment under control. A number of different methodologies are available, but the process includes identification of hazards and the mechanisms which can initiate them, risk estimation (likelihood of occurrence), and risk evaluation (overall risk based on likelihood and consequences). The risk estimation provides a measure of the risk reduction required to reduce the risk to a tolerable level.

Hazard identification results in the identification of safety functions which are required to control the risk.

The safety functions may then be allocated to a number of different systems including E/E/PES, other technology and external measures.

For each system providing a safety function, a failure rate measure can be assigned, which in turn determines the integrity required of the system. Alternatively, a qualitative approach (based on the likelihood and consequence of the hazard, the frequency and level of exposure, and the possibility of avoiding the hazard) may be used to define the required integrity.
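Where a quantitative approach is used, the required risk reduction follows from comparing the unmitigated hazard frequency with the tolerable frequency, after crediting the other risk reduction measures. The following is a minimal sketch under two assumptions: the other measures fail independently of the E/E/PES, and all figures are hypothetical.

```python
from typing import Iterable, Tuple


def required_risk_reduction(
    unmitigated_per_year: float,
    tolerable_per_year: float,
    other_measures_pfd: Iterable[float] = (),
) -> Tuple[float, float]:
    """Return (risk reduction factor, target PFD) for the E/E/PES safety function.

    other_measures_pfd: PFDs of other risk reduction measures (e.g. a relief
    valve), assumed to fail independently of the E/E/PES.
    """
    residual = unmitigated_per_year
    for pfd in other_measures_pfd:
        residual *= pfd
    if residual <= tolerable_per_year:
        return 1.0, 1.0  # tolerable without the E/E/PES safety function
    rrf = residual / tolerable_per_year
    return rrf, 1.0 / rrf


# Hypothetical: initiating events 0.1/yr, tolerable 1e-5/yr, relief valve PFD 1e-2.
rrf, target_pfd = required_risk_reduction(0.1, 1e-5, [1e-2])
print(f"risk reduction factor {rrf:.0f}, target PFD {target_pfd:.0e}")
```

Crediting a measure that shares a common cause with the E/E/PES would understate the required integrity, which is why the independence assumption matters.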

Safety Integrity Levels

IEC 61508 defines four safety integrity levels (SILs), applying to both hardware and software, each corresponding to a required measure of risk reduction. Guidance is then provided on the system configuration, the level of subsystem fault tolerance and diagnostic coverage, and the safety lifecycle measures required to achieve the designated hardware SIL. It also covers the software methods and lifecycle measures required to achieve the designated software SIL, and qualitative methods for establishing the required SIL. Part 2 of the standard places architectural constraints on the hardware configuration by setting minimum fault tolerance and diagnostic coverage requirements for each element or subsystem. It should be noted that, at SIL 4, IEC 61508 limits the risk reduction which can be claimed for a safety related E/E/PES operating in low demand mode or continuous mode to no better than 10⁻⁵ failures on demand and 10⁻⁹ dangerous failures per hour respectively.
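IEC 61508-1 tabulates a target failure measure for each SIL. In low demand mode this is an average PFD, from 10⁻² to 10⁻¹ for SIL 1 down to 10⁻⁵ to 10⁻⁴ for SIL 4. In high demand or continuous mode it is a dangerous failure rate per hour, from 10⁻⁶ to 10⁻⁵ for SIL 1 down to 10⁻⁹ to 10⁻⁸ for SIL 4. The lookup below sketches how a target, such as the output of a risk reduction calculation, maps to its band. The function name is hypothetical, and the lookup does not address the architectural constraints or lifecycle requirements, which must be satisfied separately.

```python
from typing import Dict, Tuple

# IEC 61508-1 target failure measures: {SIL: (lower bound inclusive, upper bound exclusive)}
LOW_DEMAND_PFD_AVG: Dict[int, Tuple[float, float]] = {
    4: (1e-5, 1e-4), 3: (1e-4, 1e-3), 2: (1e-3, 1e-2), 1: (1e-2, 1e-1),
}
HIGH_DEMAND_PFH: Dict[int, Tuple[float, float]] = {
    4: (1e-9, 1e-8), 3: (1e-8, 1e-7), 2: (1e-7, 1e-6), 1: (1e-6, 1e-5),
}


def sil_band(target: float, mode: str) -> int:
    """Return the SIL whose band contains the target failure measure.

    mode is "low" (target is average PFD) or "high" (target is per hour).
    Returns 0 when the target is less demanding than the SIL 1 band.
    """
    if mode not in ("low", "high"):
        raise ValueError("mode must be 'low' or 'high'")
    table = LOW_DEMAND_PFD_AVG if mode == "low" else HIGH_DEMAND_PFH
    if target < table[4][0]:
        raise ValueError("target is beyond SIL 4: revisit the other risk reduction measures")
    if target >= table[1][1]:
        return 0
    for sil, (lower, upper) in table.items():
        if lower <= target < upper:
            return sil
    raise AssertionError("bands are contiguous, so this is unreachable")


print(sil_band(5e-3, "low"), sil_band(5e-8, "high"))
```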

The requirements are more demanding for subsystems which do not have well defined failure modes or behaviour (e.g. programmable systems). The standard requires that a reliability model of the system architecture be created, and that the predicted reliability be compared with the target safety integrity level to confirm that the required risk reduction has been achieved.

It is necessary to demonstrate that the required level of integrity has been achieved in the design, installation, operation and maintenance of the system.

It should be noted that the integrity of a safety related system is critically dependent upon the detection and correction of dangerous failures. Where there is a low level of diagnostic coverage, as is usually the case with lower integrity systems, the integrity is critically dependent upon the proof test interval. Where there is a high level of diagnostic coverage that reveals failures automatically on-line, for example in high demand, high integrity systems, the integrity is also heavily dependent upon the frequency of diagnostic checks and the mean time to repair the equipment, which includes the diagnostic test interval.
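The dependence on proof test interval and diagnostic coverage appears in a commonly used simplified estimate for a single channel (1oo1) element: PFDavg ≈ λDU·T/2 + λDD·MTTR. Here λDU = λD(1 − DC) is the rate of undetected dangerous failures, λDD = λD·DC is the rate of detected dangerous failures, T is the proof test interval, and MTTR is the mean time to restoration. The sketch assumes λ·T ≪ 1, perfect proof tests and no common cause contribution. IEC 61508-6 gives fuller equations. The failure data are hypothetical.

```python
def pfd_avg_1oo1(
    lambda_d_per_hr: float,
    diagnostic_coverage: float,
    proof_test_interval_hr: float,
    mttr_hr: float,
) -> float:
    """Simplified average PFD of a single channel element (low demand mode)."""
    if not 0.0 <= diagnostic_coverage <= 1.0:
        raise ValueError("diagnostic coverage must lie between 0 and 1")
    lambda_du = lambda_d_per_hr * (1.0 - diagnostic_coverage)  # revealed only by proof test
    lambda_dd = lambda_d_per_hr * diagnostic_coverage          # revealed by diagnostics
    return lambda_du * proof_test_interval_hr / 2.0 + lambda_dd * mttr_hr


# Hypothetical element: 2e-6 dangerous failures/hr, 60 % diagnostic coverage, 8 h MTTR.
for months in (12, 6):
    t_hr = months * 730.0  # approximate hours per month
    print(months, "month proof test:", f"{pfd_avg_1oo1(2e-6, 0.6, t_hr, 8.0):.2e}")
```

In this example undetected failures dominate, so halving the proof test interval roughly halves the PFD. With higher diagnostic coverage, the repair term becomes relatively more important.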

SILs are now being quoted for proprietary subsystems (and certified by test bodies). Quoted SILs should be associated with proof test intervals, diagnostic coverage and fault tolerance criteria. They are useful for evaluating architectural constraints, but do not remove the need to confirm that the required safety integrity level for the safety function provided by the system has been achieved. Software includes high level user application programmes and parameter settings.

Alarm systems

Alarm systems alert operators to plant conditions, such as deviation from normal operating limits and to abnormal events, which require timely action or assessment.

Alarm systems are not normally safety related, but do have a role in enabling operators to reduce the demand on the safety related systems, thus improving overall plant safety.

However, where a probability of failure on demand better than 10⁻¹ is claimed, the alarm system, including the operator, is a safety related system which requires a suitable safety integrity level (SIL 1 or SIL 2 as defined by BS IEC 61508).

EEMUA 191 'Alarm systems - a guide to design, management and procurement' considers alarm settings, the human interface (alarm presentation), alarm processing and system management controls for both safety related and other alarm systems. It provides the following guidance in regard to safety related alarm systems:

  • The alarm system should be designed in accordance with IEC 61508 to SIL 1 or 2, with the designated reliability;
  • The alarm system should be independent of the process control system and other alarms unless these have also been designated safety related;
  • The operator should have a clear written alarm response procedure for each alarm, which is simple, obvious and invariant, and in which he is trained;
  • The alarms should be presented in an obvious manner, be distinguishable from other alarms, have the highest priority, and remain on view at all times when they are active;
  • The claimed operator workload and performance should be stated and verified.

Alarms which are not designated as safety related should be carefully designed to ensure that they fulfil their role in reducing demands on safety related systems.

For all alarms, regardless of their safety designation, attention is required to ensure that under abnormal conditions such as severe disturbance, onset of a hazard or emergency situations, the alarm system remains effective given the limitations of human response. The extent to which the alarm system survives common cause failures, such as a power loss, should also be adequately defined.

Further guidance is available in EEMUA 191 'Alarm systems - a guide to design, management and procurement', and CHID circular CC/Tech/Safety/9.

Alarm settings

The type of alarm and its setting should be established so as to enable the operator to make the necessary assessment and take the required timely action. Settings should be documented and controlled in accordance with the alarm system management controls.

Human interface (alarm presentation)

The human interface should be suitable for presenting alarms to the operator. Alarms may be presented on annunciator panels, individual indicators, VDU screens or programmable display devices.

Alarm lists should be carefully designed to ensure that high priority alarms are readily identified, that low priority alarms are not overlooked, and that the list remains readable even during times of high alarm activity or with repeat alarms.

Alarms should be prioritised in terms of which alarms require the most urgent operator attention.

Alarms should be presented within the operator's field of view, and should use a consistent presentation style (colour, flash rate, naming convention).

Each alarm should provide sufficient operator information for the alarm condition, plant affected, action required, alarm priority, time of alarm and alarm status to be readily identified.

The visual display device may be augmented by audible warnings, which should be at a level considerably higher than the ambient noise at the signal frequency. Where there are multiple audible warnings, they should be readily distinguishable from each other and from emergency alarm systems, and should be designed to avoid distracting the operator in high workload situations. Where both constant frequency and variable frequency (including pulsed or intermittent) signals are used, the latter should denote a higher level of danger or a more urgent need for intervention.

Alarm processing

The alarms should be processed in such a manner as to avoid operator overload (alarm floods) at all times. The alarm processing should ensure that fleeting or repeating alarms do not result in operator overload, even under the most severe conditions. A number of alarm processing techniques, including filtering, deadband, debounce timers and shelving, are described in EEMUA 191 'Alarm systems - a guide to design, management and procurement'.
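Two of the techniques named above, a debounce (on-delay) timer and a deadband, can be sketched for a single high alarm as follows. The class and parameter values are hypothetical. In practice these are usually configuration settings in the control system rather than user code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HighAlarm:
    """High alarm with an on-delay (debounce) timer and a reset deadband."""

    setpoint: float
    deadband: float    # PV must fall this far below the setpoint to clear
    on_delay_s: float  # PV must stay at or above the setpoint this long to raise
    active: bool = False
    _above_since: Optional[float] = None

    def update(self, pv: float, t_s: float) -> bool:
        """Process one sample taken at time t_s; return the alarm state."""
        if not self.active:
            if pv >= self.setpoint:
                if self._above_since is None:
                    self._above_since = t_s
                if t_s - self._above_since >= self.on_delay_s:
                    self.active = True
            else:
                self._above_since = None  # fleeting excursion: restart the timer
        elif pv < self.setpoint - self.deadband:
            self.active = False
            self._above_since = None
        return self.active


alarm = HighAlarm(setpoint=100.0, deadband=2.0, on_delay_s=5.0)
samples = [(0, 101.0), (2, 99.0), (3, 101.0), (9, 101.0), (10, 99.0), (11, 97.5)]
print([alarm.update(pv, t) for t, pv in samples])
# [False, False, False, True, True, False]
```

The brief excursion at t = 0 never reaches the operator. The dip to 99.0 at t = 10 lies within the deadband, so it does not clear the alarm and cause it to chatter.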

The presentation of alarms should not exceed that which the operator is capable of acting upon. Alternatively, the alarms should be prioritised and presented in such a way that the operator may deal with the most important alarms without being distracted by the others. Applicable alarm processing techniques include grouping and first-up alarms, eclipsing of lower grade alarms (e.g. suppression of the high alarm when the high-high alarm activates), suppression of alarms from out of service plant, suppression of selected alarms during certain operating modes, automatic alarm load shedding, and shelving.
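Eclipsing can be sketched as selecting, for each tag, only the most severe active grade for display. The tag names and severity ranking below are hypothetical. Eclipsed grades are hidden from the display only and would normally still be logged.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical grade ranking: a higher number eclipses a lower one on the same tag.
SEVERITY = {"LO": 1, "HI": 1, "LOLO": 2, "HIHI": 2}


def eclipse(active_alarms: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each tag to the single most severe active grade for presentation."""
    shown: Dict[str, str] = {}
    for tag, grade in active_alarms:
        current = shown.get(tag)
        if current is None or SEVERITY[grade] > SEVERITY[current]:
            shown[tag] = grade
    return shown


print(eclipse([("TI101", "HI"), ("TI101", "HIHI"), ("PI202", "HI")]))
# {'TI101': 'HIHI', 'PI202': 'HI'}
```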

Where shelving or suppression is used, controls should exist to ensure that alarms are returned to an active state when they are relevant to plant operation.
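One such control is a maximum shelving time after which the alarm returns to the active state automatically. The class below is a hypothetical sketch of that idea, and the 8-hour cap is an illustrative assumption. A real system would also record who shelved the alarm and why.

```python
from typing import Dict


class ShelvingRegister:
    """Tracks shelved alarms and returns them to active when the shelf time expires."""

    def __init__(self, max_shelve_s: float) -> None:
        self.max_shelve_s = max_shelve_s
        self._expiry: Dict[str, float] = {}  # tag -> time at which shelving lapses

    def shelve(self, tag: str, now_s: float, requested_s: float) -> float:
        """Shelve an alarm, capping the duration; return the expiry time."""
        expiry = now_s + min(requested_s, self.max_shelve_s)
        self._expiry[tag] = expiry
        return expiry

    def unshelve(self, tag: str) -> None:
        self._expiry.pop(tag, None)

    def is_shelved(self, tag: str, now_s: float) -> bool:
        expiry = self._expiry.get(tag)
        if expiry is None:
            return False
        if now_s >= expiry:
            del self._expiry[tag]  # automatic return to the active state
            return False
        return True


register = ShelvingRegister(max_shelve_s=8 * 3600)  # hypothetical 8 h cap
expiry = register.shelve("LI300", now_s=0.0, requested_s=24 * 3600)
print(expiry, register.is_shelved("LI300", 3600.0), register.is_shelved("LI300", expiry))
# 28800.0 True False
```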

Alarm system management procedures

Management systems should be in place to ensure that the alarm system is operated, maintained and modified in a controlled manner. Alarm response procedures should be available, and alarm parameters should be documented.

The performance of the alarm system should be assessed and monitored to ensure that it is effective during normal and abnormal plant conditions. The monitoring should include evaluation of the alarm presentation rate, operator acceptance and response times, operator workload, standing alarm count and duration, repeat or nuisance alarms, and operator views of operability of the system. Monitoring may be achieved by regular and systematic auditing.
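Two of these measures, peak alarm presentation rate and standing alarm count, are straightforward to compute from an alarm journal. The sketch below assumes timestamps in seconds. The 10-minute window and 24-hour standing threshold are illustrative choices only and are not benchmarks taken from EEMUA 191.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List


def peak_alarm_count(alarm_times_s: Iterable[float], window_s: float = 600.0) -> int:
    """Largest number of alarm annunciations in any sliding window (default 10 min)."""
    window: Deque[float] = deque()
    peak = 0
    for t in sorted(alarm_times_s):
        window.append(t)
        while window[0] <= t - window_s:  # drop alarms older than the window
            window.popleft()
        peak = max(peak, len(window))
    return peak


def standing_alarms(active_since_s: Dict[str, float], now_s: float,
                    threshold_s: float) -> List[str]:
    """Tags that have been continuously active for at least threshold_s."""
    return sorted(tag for tag, since in active_since_s.items()
                  if now_s - since >= threshold_s)


print(peak_alarm_count([0, 100, 200, 650, 700]))  # 3
print(standing_alarms({"FI10": 0.0, "TI20": 80_000.0}, 90_000.0, 86_400.0))  # ['FI10']
```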

Matters which are not worthy of operator attention should not be alarmed.

Logging may be a suitable alternative for engineering or discrepancy events to prevent unnecessary standing alarms. A system for assessing the significance of such logged events to ensure timely intervention by maintenance personnel may be required.

Protection systems (Trips and Interlocks)

Protective tripping systems provide a defence against excursions beyond the safe operating limits. They detect excursions beyond set points related to those limits (i.e. the onset of a hazard) and take timely action to maintain or restore the equipment under control to a safe state. Trips should not be self-resetting unless adequate justification has been made. Protective interlocks prevent control actions which might initiate a hazard from being undertaken by an operator or process control system, and are by nature self-resetting.
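The difference between a latching trip and a self-resetting interlock can be sketched as follows. The class, setpoint and pump example are hypothetical. In practice these functions are implemented in the protective system logic, and the reset arrangements are subject to management controls.

```python
class LatchedTrip:
    """Trip that stays tripped until the cause has cleared and a deliberate reset is made."""

    def __init__(self, setpoint: float) -> None:
        self.setpoint = setpoint
        self.tripped = False

    def update(self, pv: float) -> bool:
        if pv >= self.setpoint:
            self.tripped = True  # latch: return of the PV to normal does not reset
        return self.tripped

    def reset(self, pv: float) -> bool:
        """Operator reset; refused while the trip condition is still present."""
        if pv < self.setpoint:
            self.tripped = False
        return not self.tripped


def pump_start_permitted(suction_valve_open: bool) -> bool:
    """Interlock: self-resetting, because the permissive simply follows plant state."""
    return suction_valve_open


trip = LatchedTrip(setpoint=10.0)
print(trip.update(12.0), trip.update(8.0), trip.reset(11.0), trip.reset(8.0))
# True True False True
```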

Protection systems should indicate that a demand to perform a safety function has been made and that the necessary actions have been performed.

Independence

Protective systems should be sufficiently independent of the control system or other protective systems (electrical/electronic or programmable). Where there is an interface between systems (e.g. for indication, monitoring or shared components) or shared utilities (e.g. power), environment (e.g. accommodation, wiring routes) or management systems (maintenance procedures, personnel), then the method of achieving independence should be defined, and common cause failures adequately considered.

Measures to defend against common mode failures due to environmental interactions may include physical separation or segregation of system elements (sensors, wiring, logic, actuators or utilities) of different protective systems.

Independence will also be required for protection against systematic and common mode faults. Measures may include use of diverse technology for different protective systems. Where more than one E/E/PES protective system is used to provide the required risk reduction for a safety function, adequate independence should be achieved by diverse technology, construction, manufacturer or software as necessary to achieve the required safety integrity level.

Dependence on utilities

The action required from the protective system depends upon the nature of the process. The actions may be passive, such as simple isolation of plant or removal of power, or active, where continued or positive action is required to maintain or restore a safe state, for example injection of inhibitor into the process or provision of emergency cooling.

Active protective measures are highly dependent upon utilities, and may be particularly vulnerable to common mode failures. The scope of the protective system therefore includes all utilities upon which it depends, and these should have an integrity consistent with, and contributing to, that of the remainder of the system.

Measures taken to defend against common mode failure of utilities will be commensurate with the level of safety integrity required, but may include standby or uninterruptible/reservoir supplies for electricity, air, cooling water, or other utilities essential for performance of the safety function. Such measures should themselves be of sufficient integrity.

Survivability and external influences

The protective system should be adequately protected against environmental influences, the effects of the hazard against which it is protecting, and other hazards which may be present. Environmental influences include power system failure or characteristics, lightning (BS 6651), electromagnetic radiation (EMR) (BS 6667, IEC 61000), flammable atmospheres (BS 5345, BS EN 60079, BS 6467, BS 7535, BS EN 50281), corrosive or humid atmospheres, ingress of water or dust (BS EN 60529), temperature, rodent attack, chemical attack, vibration, physical impact, and other plant hazards.

Degradation of protection against environmental influences during maintenance and testing should have been considered and appropriate measures taken. For example, use of radios by maintenance personnel may be prohibited during testing of a protective system with the cabinet door open, where the cabinet provides protection against EMR.

Protection against random hardware faults

The architecture of the protective system should be designed to protect against random hardware failure. It should be demonstrated that the required reliability has been achieved, commensurate with the required integrity level. Defensive measures may include high reliability elements, automatic diagnostic features to reveal faults, and redundancy of elements (e.g. 2 out of 3 voting for sensors) to provide fault tolerance.
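A 2 out of 3 (2oo3) vote trips when any two channels demand a trip. It therefore tolerates one channel failing to danger and also one spurious channel trip. The sketch below adds a hypothetical degradation policy: a channel with a detected fault is counted as voting to trip, which is a fail-safe choice. Other policies are possible and would need to be specified in the design.

```python
from typing import Sequence


def vote_2oo3(trip_demands: Sequence[bool],
              faulty: Sequence[bool] = (False, False, False)) -> bool:
    """Return True (trip) when at least two of three channels vote to trip.

    Assumed degradation policy: a channel with a detected fault counts as a vote
    to trip, so one faulty channel leaves a 1oo2 vote on the healthy channels.
    """
    if len(trip_demands) != 3 or len(faulty) != 3:
        raise ValueError("expected exactly three channels")
    votes = sum(1 for demand, bad in zip(trip_demands, faulty) if demand or bad)
    return votes >= 2


print(vote_2oo3([True, False, False]))   # False: single channel, e.g. spurious
print(vote_2oo3([True, True, False]))    # True: two channels agree
print(vote_2oo3([True, False, False], faulty=[False, False, True]))  # True
```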

Protection against common mode failures

Diversity of elements is not effective for protection against random hardware faults, but is useful in defence against common mode failures within a protective system.

Protection against systematic failures

Protection against systematic hardware and software failures may be achieved by appropriate safety lifecycles (see IEC 61508, Out of Control).

Sensing

Sensors include their connection to the process, and both should be adequately reliable. A measure of their reliability is used in confirming the integrity level of the protective system. This measure should take into account the proportion of failures of the sensor and its process connection which are failures to danger.

Dangerous failures can be minimised by a number of measures such as:

  • Use of measurement which is as direct as possible (e.g. pneumercators provide an inferred level measurement: they actually measure the back pressure against a liquid head, so they are sensitive to changes in density due to temperature variations within the process, and are dependent upon the balance gas flow);
  • Control of isolation or bleed valves to prevent uncoupling from the process between proof tests, or monitoring of these valves such that their operation causes a trip;
  • Use of good engineering practice and well proven techniques for process connections and sample lines to prevent blockage, hydraulic locking, sensing delays etc.;
  • Use of analogue devices (transmitters) rather than digital (switches);
  • Use of positively actuated switches operating in a positive mode together with idle current (de-energise to trip);
  • Appropriate measures to protect against the effects of the process on the process connection or sensor, such as vibration, corrosion, and erosion;
  • Monitoring of protective system process variable measurement (PV) and comparison against the equivalent control system PV either by the operator or the control system.
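The comparison in the last measure can be automated by the control system as a discrepancy check. In the following sketch, the function name and the 2 % of span tolerance are hypothetical. A real tolerance would allow for the accuracy of both instruments and for any differences in their process connections.

```python
def pv_discrepancy(protective_pv: float, control_pv: float, span: float,
                   tolerance_fraction: float = 0.02) -> bool:
    """True if the protective and control measurements disagree by more than the tolerance."""
    if span <= 0.0:
        raise ValueError("span must be positive")
    return abs(protective_pv - control_pv) > tolerance_fraction * span


print(pv_discrepancy(50.0, 50.5, span=100.0))  # False: within 2 % of span
print(pv_discrepancy(50.0, 55.0, span=100.0))  # True: raise a discrepancy alarm
```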

Guidance on process connection is provided in BS 6739 British Standard Code of practice for instrumentation in process control systems: Installation design and practice.

Proof testing procedures should clearly set out how sensors are reinstated and how such reinstatement is verified after proof testing.

Maintenance procedures should define how sensors/transmitters are calibrated with traceability back to national reference standards by use of calibrated test equipment.

Other matters which will need to have been considered are:

  • Cross sensitivities of analysers to other fluids which might be present in the process;
  • Reliability of sampling systems;
  • Protection against systematic failures on programmable sensors/analysers. The measures taken will depend on the level of variability and track record of the software. 'Smart' transmitters with limited variability software which are extensively proven in use may require no additional measures other than those related to control of operation, maintenance, and modification, whereas bespoke software for an on-line analyser may require a defence in depth against systematic failures (BS IEC 61508 Part 3);
  • Signal conditioning (e.g. filtering), which may affect the sensor response times;
  • Degradation of measurement signals (distance between sensor and transmitter may be important);
  • Accuracy, repeatability, hysteresis and common mode effects (e.g. effects of gauge pressure or temperature on differential pressure measurement);
  • Integrity of process connections and sensors for containment (sample or impulse lines, instrument pockets are often a weak link in process containment measures).

Use of 'SMART' instruments requires adequate diagnostic coverage and fault tolerance (see architectural constraints in IEC 61508 Part 2), and measures to protect against systematic failures (software design/integration, inadvertent re-ranging during maintenance). Measures may include use of equipment in non-smart mode (analogue signal output, no remote setting) and equipment of stable design for which there is an extensive record of reliability under similar circumstances.

Actuators and signal conversion

Actuators are the final control elements or systems. They include contactors and the electrical apparatus under control; valves (control and isolation), including pilot valves, valve actuators and positioners; and the power supplies and utilities required for the actuator to perform its safety function. All of these should be adequately reliable. A measure of their reliability is used in confirming the integrity level of the protective system. This measure should take into account the proportion of failures of the actuator under the relevant process conditions which are failures to danger.

Actuators are frequently the most unreliable part of the tripping process.

Dangerous failures can be minimised by a number of measures such as:

  • Use of 'fail-safe' principles so that the actuator takes up the tripped state on loss of signal or power (electricity, air etc.), e.g. a spring-return actuator which is held open and returns to the tripped position on loss of signal or power;
  • Provision of uninterruptible or reservoir supplies of sufficient capacity for essential power;
  • Failure detection and performance monitoring (end of travel switches, time to operate, brake performance, shaft speed, torque etc.) during operation;
  • Actuator exercising or partial stroke shutoff simulation during normal operation to reveal failures or degradation in performance. Note this is not proof testing but may reduce probability of failure by improved diagnostic coverage (IEC 61508);
  • Overrating of equipment.

Other matters which should have been considered are:

  • Valves should be properly selected for their duty, and it should not be assumed that a control valve can satisfactorily perform isolation functions;
  • Actuators may also include programmable control elements (e.g. SMART instruments), particularly within positioners, variable speed drives and motor control centres. Modern motor control centres may use programmable digital addressing. This introduces a significant risk of systematic failures and of failure modes which cannot be readily predicted, so such an arrangement should be treated with caution. It is normally reasonably practicable for the trip signal to act directly upon the final contactor;
  • Potential for failure due to hydraulic locking between valves (e.g. trace heated lines between redundant shutoff valves).

Logic systems

Commonly, the logic systems for protective systems are electronic, but programmable and other technology systems (magnetic or fluidic/pneumatic) have been used.

The architecture of the logic system will be determined by the hardware fault tolerance requirements, for example dual redundant channels. Where a high level of integrity for the system is required (SIL3 or SIL4) then diverse hardware between channels may be employed. This should not be confused with diversity of independent protective systems.

Logic systems are likely to incorporate provisions for fault alarms and overrides, for which there should be suitable management control arrangements. They may also provide monitoring of input and output signal lines for detection of wiring (open circuit, short circuit) and sensors/actuators (stuck-at, out of range). Such monitoring may initiate an alarm, a trip action or, in a voting arrangement, disable the faulty element.
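The behaviour described above, where a voting arrangement disables a faulty element, can be sketched as a hypothetical 2oo3 logic function in Python. The degradation policy (2oo3 with three healthy channels, 1oo2 with two, 1oo1 with one, trip with none) is an assumption chosen to bias towards the safe state; other policies, such as degrading to 2oo2 to protect availability, may be justified in a particular design.

```python
from typing import Sequence


def vote_trip(demands: Sequence[bool], faulty: Sequence[bool]) -> bool:
    """Return True if the logic solver should trip.

    demands[i] is True when channel i sees a trip condition; faulty[i] is True
    when diagnostics have flagged channel i (open circuit, out of range, etc.).

    Assumed degradation policy (illustrative only):
      3 healthy channels -> 2oo3
      2 healthy channels -> 1oo2 (bias towards the safe state)
      1 healthy channel  -> 1oo1
      0 healthy channels -> trip (fail-safe)
    """
    if len(demands) != 3 or len(faulty) != 3:
        raise ValueError("expected three channels")
    healthy_votes = [d for d, f in zip(demands, faulty) if not f]
    if not healthy_votes:
        return True
    required = 2 if len(healthy_votes) == 3 else 1
    return sum(healthy_votes) >= required
```

Any channel flagged as faulty should also raise a fault alarm, so that the degraded state is managed and repaired rather than left in place.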

Software based systems should be adequately protected against systematic failures, for example by appropriate hardware and software safety lifecycles, and suitable techniques and quality systems. Guidance is available in BS IEC 61508 Part 3, PES Parts 1 & 2, EEMUA 160, Out of Control, and IGasE SR15 - Programmable equipment in safety related applications.

Wiring and communications (signal transmission)

Transmitters, communications devices and wiring systems should be arranged to meet the requirements for survivability, protection against external influences and independence.

Independent systems or redundant channels should not share multicore cables with each other or power circuits, and may require diverse routes depending upon the safety integrity level to be achieved.

Measures to protect against failures include:

  • Use of fail-safe principles such as DC mode (e.g. a 4-20 mA loop) for analogue signal transmission, with diagnosis and alarm of out-of-range, abnormal or fault states (such as stuck-at) and defined control system responses for both the sensor and transmitter;
  • Cable selection (screening etc.);
  • Protection of cables against fire, chemical attack, physical damage etc.;
  • Physical separation or segregation of cables and cable routes;
  • Routing in benign environments;
  • Use of optical fibres to protect against electrical interference;
  • Careful attention to lightning protection (BS 6651) of data links between buildings.
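A minimal sketch of fail-safe diagnosis for a 4-20 mA loop is given below. Because the loop has a live zero at 4 mA, an open circuit (near 0 mA) can be distinguished from a genuine zero reading. The fault thresholds of 3.6 mA and 21 mA loosely follow the failure-signal levels of NAMUR NE 43 and are assumptions here, as are the stuck-at window and tolerance; the function names are hypothetical.

```python
from enum import Enum
from typing import Sequence


class SignalState(Enum):
    OK = "ok"
    UNDER_RANGE = "under range"        # e.g. open circuit or transmitter failure
    OVER_RANGE = "over range"          # e.g. short circuit or transmitter failure
    SUSPECT_STUCK = "suspect stuck-at"


# Assumed thresholds, loosely following NAMUR NE 43 failure-signal levels.
FAULT_LOW_MA = 3.6
FAULT_HIGH_MA = 21.0


def classify_loop(samples_ma: Sequence[float], stuck_window: int = 10,
                  stuck_tolerance_ma: float = 0.001) -> SignalState:
    """Classify the latest loop current, using recent history for stuck-at detection."""
    latest = samples_ma[-1]
    if latest <= FAULT_LOW_MA:
        return SignalState.UNDER_RANGE
    if latest >= FAULT_HIGH_MA:
        return SignalState.OVER_RANGE
    recent = samples_ma[-stuck_window:]
    if len(recent) == stuck_window and max(recent) - min(recent) <= stuck_tolerance_ma:
        return SignalState.SUSPECT_STUCK
    return SignalState.OK


def scale(ma: float, lo: float, hi: float) -> float:
    """Convert a healthy loop current to engineering units (live zero at 4 mA)."""
    return lo + (ma - 4.0) * (hi - lo) / 16.0
```

A constant reading is only a suspicion of stuck-at, since a genuinely steady process also gives a flat signal; the defined response to each state remains a design decision.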

Use of fieldbus or other digital communication protocols in protective systems should be considered a novel approach requiring a thorough evaluation and demonstration of the safety integrity. EEMUA 189 'A guide to fieldbus applications in the process industry' provides limited guidance.

Utilities

Utilities which are required for the protective system to perform its safety function may include power supplies such as electricity and air, inhibitor materials and their propellants, inert gas such as nitrogen, cooling water, steam, and pilot flames and their gases, all of which should be adequately reliable. Measures such as redundancy, uninterruptible/reservoir supplies and availability monitoring (e.g. loss of air alarm) may be required. The adequacy of the designed capacity of reserves should be demonstrated by test.

Utilities may also introduce external influences into the protective systems (e.g. from electrical supplies).

Measures to protect against external influences may include:

  • Under/Over voltage protection;
  • Overcurrent and short circuit protection;
  • Use of an uninterruptible power supply or voltage conditioning or filtering;
  • Careful attention to lightning protection (BS 6651) and equipotential bonding (BS 7671).

Proof testing

The probability of failure on demand, or the failure rate, of a protective system is critically dependent upon the frequency of proof testing and on the test's ability to detect previously unrevealed failures of the system. The proof test interval should therefore be established accordingly. As a rule of thumb for low demand systems, it should be an order of magnitude shorter than both the mean time between failures of the system and the mean time between demands.
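The dependence on proof test interval can be made concrete with the common simplified expression for the average probability of failure on demand of a single (1oo1) channel, PFDavg ≈ λDU × T / 2, where λDU is the dangerous undetected failure rate and T the proof test interval. The Python sketch below assumes λDU × T ≪ 1, a perfect proof test and negligible repair time; the function names and example figures are illustrative only.

```python
def pfd_avg_1oo1(lambda_du_per_h: float, proof_test_interval_h: float) -> float:
    """Simplified average PFD of a single channel: lambda_DU * T / 2.

    Assumes lambda_DU * T << 1, a perfect proof test and negligible repair time.
    """
    return lambda_du_per_h * proof_test_interval_h / 2.0


def interval_meets_rule_of_thumb(proof_test_interval_h: float, mtbf_h: float,
                                 mean_time_between_demands_h: float) -> bool:
    """True if the interval is an order of magnitude (or more) shorter than both
    the system MTBF and the mean time between demands."""
    return 10.0 * proof_test_interval_h <= min(mtbf_h, mean_time_between_demands_h)
```

For example, λDU = 2 × 10⁻⁶ per hour with annual testing (8760 h) gives a PFDavg of about 8.8 × 10⁻³, and annual testing just satisfies the rule of thumb for a demand expected once every ten years (87 600 h).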

Proof test procedures should be available which specify the success/failure criteria and detail how the test will be performed safely, including any management arrangements, operating restrictions and competence of personnel.

The tests should be arranged to reveal all dangerous failures which remain unrevealed in normal operation, including the following measures:

  • Tests performed at the conditions which would be expected at trip. (Where testing under trip conditions cannot be performed, for example for safety reasons, the measures taken to ensure that potential failures at trip conditions will be revealed should be clarified);
  • End to end tests at appropriate intervals, including proving sample/impulse lines. (Different elements of the protective system may require proof testing at different intervals).

Operation

Procedures should be available which detail the operation of the protective system including:

  • Override management (authorisation, security, recording, monitoring and review of overrides, reset requirements);
  • Operating instructions for trips;
  • Instructions for response to equipment faults including fault alarms. (There should be procedural arrangements in place to ensure timely repair so that mean time to repair criteria can be met).
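Override management can be supported by a register that is reviewed periodically. The sketch below is a hypothetical Python data structure; the fields, the rule that an override should not be self-authorised and the maximum-duration check are assumptions for illustration rather than requirements taken from this guidance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Override:
    """One entry in a hypothetical override register."""
    trip_tag: str
    reason: str
    requested_by: str
    authorised_by: Optional[str]
    applied_at: datetime
    max_duration: timedelta
    removed_at: Optional[datetime] = None

    def is_active(self) -> bool:
        return self.removed_at is None

    def expired(self, now: datetime) -> bool:
        return self.is_active() and now > self.applied_at + self.max_duration


def review(register: List[Override], now: datetime) -> List[str]:
    """Return findings on active overrides for the periodic override review."""
    findings = []
    for o in register:
        if not o.is_active():
            continue
        if o.authorised_by is None:
            findings.append(f"{o.trip_tag}: active override has no authorisation")
        elif o.authorised_by == o.requested_by:
            findings.append(f"{o.trip_tag}: override self-authorised by {o.requested_by}")
        if o.expired(now):
            findings.append(f"{o.trip_tag}: override exceeded permitted duration")
    return findings
```

Recording removal time as well as application time lets the same register show whether reset requirements were met.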

Maintenance

Procedures should be available for maintenance activities including:

  • Maintenance instructions;
  • Control of spares (segregation of faulty or non-conforming parts, identification to prevent interchange of similar parts etc.);
  • Competence of maintenance personnel;
  • Operating restrictions during maintenance;
  • Control of software back-ups and memory media ((E)EPROMs, floppy disks, files on hard disks on portable PCs etc.);
  • Post maintenance reinstatement and proof testing.

For systems where a high diagnostic coverage is claimed, for example high integrity, high demand systems, the probability of failure (expressed as a failure rate) is critically dependent upon the mean time to repair the revealed faults. For such systems, repair performance should be monitored and reviewed against the design criteria.
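Monitoring repair performance against the design criteria might look like the following Python sketch, which compares the mean of recorded repair times with the design MTTR. It also reports λDD × MTTR, a simplified estimate of the fraction of time the system operates with a revealed but unrepaired fault, assuming the system continues to run (e.g. in a degraded voting mode) during repair. The function name and inputs are hypothetical.

```python
from statistics import mean
from typing import Sequence


def repair_review(repair_times_h: Sequence[float], design_mttr_h: float,
                  lambda_dd_per_h: float) -> dict:
    """Compare achieved repair times with the MTTR assumed in the design.

    lambda_dd_per_h is the dangerous detected failure rate; lambda_DD * MTTR is a
    simplified estimate of time spent with a revealed, unrepaired fault.
    """
    achieved = mean(repair_times_h)
    return {
        "achieved_mttr_h": achieved,
        "design_mttr_h": design_mttr_h,
        "within_design": achieved <= design_mttr_h,
        "worst_repair_h": max(repair_times_h),
        "design_contribution": lambda_dd_per_h * design_mttr_h,
        "achieved_contribution": lambda_dd_per_h * achieved,
    }
```

Reporting the worst repair alongside the mean helps show whether a few long outages are hidden by an acceptable average.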

Modification

A management system for control of modifications should be available to ensure that:

  • Unauthorised modifications are prevented;
  • Authorised modifications are not ill conceived;
  • Safety verification confirms that the required safety function and integrity have been maintained;
  • Design and implementation are carried out by competent persons.

Remote diagnostic systems

Remote diagnostic systems have the potential to cause danger by initiating unexpected operations or by affecting safety functions by software/parameter modification or by diverting the control system processor from time critical functions.

The need for remote diagnosis should be justified, a risk assessment completed, and measures taken to ensure that safety is not affected by normal operation or malfunction of the diagnostic system, including the remote diagnostic terminal and software, communication link, and the control system diagnostic interface and software.

Consideration should be given to:

  • Security and control of access;
  • Communication between diagnostician and plant personnel;
  • Restricted modes of operation: passive (monitoring only), active (control/operator functions), interactive (software change possible);
  • Potential for operation outside restricted mode under fault conditions;
  • Protection of safety functions from unauthorised modification;
  • Change control;
  • Competence of personnel.
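The restricted modes listed above can be expressed as an ordered set of privileges with a deny-by-default check, as in this Python sketch. The operation names, their mapping to modes and the rule that trip settings are never modifiable remotely are hypothetical. A software check of this kind does not by itself address operation outside the restricted mode under fault conditions, which needs separate consideration.

```python
from enum import IntEnum


class RemoteMode(IntEnum):
    """Restricted modes of remote access, in increasing order of privilege."""
    PASSIVE = 1      # monitoring only
    ACTIVE = 2       # control/operator functions
    INTERACTIVE = 3  # software change possible


# Hypothetical mapping of remote operations to the minimum mode they need.
REQUIRED_MODE = {
    "read_values": RemoteMode.PASSIVE,
    "read_event_log": RemoteMode.PASSIVE,
    "change_setpoint": RemoteMode.ACTIVE,
    "download_software": RemoteMode.INTERACTIVE,
}

# Assumed policy: safety function parameters are never changeable remotely.
PROTECTED_OPERATIONS = {"modify_trip_setting", "bypass_trip"}


def permit(operation: str, granted: RemoteMode) -> bool:
    """Deny by default: unknown or protected operations are always refused."""
    if operation in PROTECTED_OPERATIONS:
        return False
    required = REQUIRED_MODE.get(operation)
    if required is None:
        return False
    return granted >= required
```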

Whilst beyond the scope of HS(G)87 'Safety in the remote diagnosis of manufacturing plant and equipment', the publication provides a useful background to the subject.

Process control systems

Process control systems are primarily implemented for economic reasons. However, those which are not considered safety related should still be designed, installed, operated and maintained so that their failure does not place a demand rate on the protective system which was not anticipated in its design. Part 1 of BS IEC 61508 provides guidance. The dangerous failure modes of the control system should be determined and taken into account in the overall safety system specification. The control system should also be sufficiently independent of the safety systems.

The control system may provide steady state or change of state (start-up, shutdown, batch) control functions. The latter may be implemented by automatic sequences or procedurally under manual control. Control systems should be implemented to provide stable control of the process under all expected normal and upset circumstances, including start-up and shutdown.

The system should be designed to prevent or verify operator commands which might place a demand upon the protective system.
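One way to prevent or verify operator commands is to check setpoint entries against the protective system trip setting, as in the Python sketch below. The three outcomes, the clearance margin and the use of the high alarm as the confirmation threshold are assumptions for illustration; appropriate limits depend on the process dynamics and the protective system design.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical limits for one controlled variable, in engineering units."""
    trip_high: float   # protective system trip setting
    alarm_high: float  # high alarm, below the trip
    margin: float      # minimum clearance kept between setpoint and trip


def check_setpoint(new_sp: float, limits: Limits) -> str:
    """Return 'reject', 'confirm' or 'accept' for an operator setpoint entry."""
    if new_sp >= limits.trip_high - limits.margin:
        return "reject"   # would run the process too close to the trip
    if new_sp >= limits.alarm_high:
        return "confirm"  # require a deliberate second confirmation
    return "accept"
```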

The dangerous failure rate of the control system should be supported by operational experience of the system in a similar application, by reliability analysis, or by reliability data from industry databases. The failure rate claimed should not be less than 10⁻⁵ dangerous failures/hour.

Consideration should be given to failure behaviour so as to minimise the demands placed on the protective systems such as under the following circumstances:

  • I/O power failure;
  • Main power failure;
  • I/O faults (open/short circuit, out of range);
  • Module/processor failure (I/O, controller, cell, supervisory);
  • Communications failure (at all levels of the architecture).

Consideration should also be given to change control and software back-up systems. As the control system provides control, monitoring and logging functions which significantly aid the operator, consideration should be given to survival of the control system during hazardous events and emergency response.

It should be noted that redundant (non-diverse), cross monitored control processors are extremely vulnerable to common mode failure.

It should be demonstrated that the process control system does not exercise safety functions during sequences and changes of state under its control. For example, where the control system batch sequence controls the mixing of quantities of materials or reagents which, if incorrect quantities are admitted, may result in an unintended reaction, then measures of sufficient safety integrity, other than the control system, should be taken to ensure that the residual risk is as low as reasonably practicable.

For the purposes of risk evaluation, failure of the control system (at not less than 10⁻⁵ failures/hour or 10⁻¹ failures on demand) should be considered as part of the hazard initiation sequence rather than as a risk reduction measure.
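The treatment of control system failure as an initiating event, together with the 10⁻⁵ per hour floor on the claimed rate, can be sketched in Python as below. The hazard frequency is the initiating frequency multiplied by the probability of failure on demand of each protective layer; this product is only valid if the layers are independent of each other and of the control system. Function names and figures are illustrative.

```python
from typing import Sequence

HOURS_PER_YEAR = 8760.0
MIN_CLAIMABLE_RATE_PER_H = 1e-5  # floor on the claimed control system dangerous failure rate


def initiating_frequency_per_year(estimated_rate_per_h: float) -> float:
    """Control system dangerous failure frequency, used as the hazard initiator.

    Any estimate better than 1e-5 per hour is floored at 1e-5 per hour.
    """
    return max(estimated_rate_per_h, MIN_CLAIMABLE_RATE_PER_H) * HOURS_PER_YEAR


def hazard_frequency_per_year(estimated_rate_per_h: float,
                              protective_pfds: Sequence[float]) -> float:
    """Initiating frequency times the PFD of each independent protective layer.

    The control system itself is NOT listed in protective_pfds: it appears only
    as the initiator, not as a risk reduction measure.
    """
    freq = initiating_frequency_per_year(estimated_rate_per_h)
    for pfd in protective_pfds:
        freq *= pfd
    return freq
```

For example, a control system estimated at 10⁻⁷ dangerous failures per hour is still taken as 10⁻⁵ per hour, about 0.088 per year; with a single protective system of PFD 10⁻² the hazard frequency is about 8.8 × 10⁻⁴ per year.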

Exothermic reactions

Exothermic reactions are particularly demanding in terms of control and protection, as they tend to be unstable with aggressive reaction kinetics. They may require risk reduction measures that operate continuously throughout the reaction stage and rely on utilities such as cooling systems, agitation, inhibitor injection etc.

Thus, loss of any single utility may be a dangerous failure, and initiate a hazard (e.g. loss of agitator blades, and hence reduced cooling because of poorer heat transfer, giving rise to a runaway reaction).

The components of the utilities should be considered safety related and provide adequate protection against failure including common mode failures (e.g. loss of electricity) and systematic failures (e.g. failure to fill inhibitor stock vessel). Sufficient diagnostics should be provided to reveal such failures so that timely automatic or manual response can be initiated.

Diagnostics should be designed to reveal the failure as directly as possible, for example:

  • Agitator torque rather than shaft speed (which will not reveal blade loss);
  • Cooling water flow rather than pump stopped.

The capacity and capability of the utilities to deal with the most extreme reaction kinetics (e.g. worst case mixtures) and limiting conditions (e.g. maximum temperature/pressure achievable under worst case) should also be demonstrated.
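The torque-versus-speed diagnostic can be illustrated with a short Python sketch. The threshold, a fixed fraction of normal running torque, is an assumption: in practice the expected torque varies with batch viscosity and fill level, so limits may need to follow the process state. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgitatorReading:
    shaft_speed_rpm: float
    torque_nm: float


def agitation_effective(r: AgitatorReading, min_speed_rpm: float,
                        min_torque_fraction: float, normal_torque_nm: float) -> bool:
    """True only if the shaft is turning AND the torque shows it is doing work.

    A speed-only check passes when the blades have been lost, because the
    shaft still turns; the torque check catches that case.
    """
    turning = r.shaft_speed_rpm >= min_speed_rpm
    loaded = r.torque_nm >= min_torque_fraction * normal_torque_nm
    return turning and loaded
```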

Expert systems

Expert systems are normally employed as operator support tools. Use of an expert or other deductive or learning system for direct process control should be considered novel, and an adequate assessment of the risks should be provided (see OM 1996/117).



2020-07-31