T/AST/003 - Issue 6
1.1 Safety Systems represent a central pillar of the 'Defence in Depth' safety philosophy that is insisted upon in UK nuclear plants. The main aim of this philosophy is to avoid situations where an initiating fault can lead directly to an accident with nothing able to prevent it. Although faults cannot be prevented, provisions (engineered systems and/or procedures) can be deliberately put in place to recognise and respond to faults to prevent and/or mitigate the accident that would otherwise ensue (i.e. they provide protection against those faults). Such provisions are known as Safety Systems (SSs).
1.2 The aim of this guide is to interpret and amplify the Safety Assessment Principles (SAPs) [Ref 1] in relation to Safety Systems, in order to advise and inform ONR inspectors in the exercise of their professional regulatory judgement concerning Safety System need and adequacy. As for all guidance, inspectors should use their judgement and discretion in the depth and scope to which they apply it.
2.1 This guide elaborates on relevant SAPs where they are not self-evident. References to Safety Systems are scattered throughout the SAPs, so Safety Systems are addressed indirectly in many locations as well as specifically in their own section of the SAPs (paras 336 to 362 inclusive, covering SAPs ESS1 to ESS27).
2.2 Much of the advice contained herein is also reflected in Ref 9 - BS IEC 61508 - Functional Safety of Electrical/Electronic/Programmable Electronic Safety Related Systems. The scope of that standard is wider than this guide, covering in detail all lifecycle aspects (safety management, planning, risk and hazard analysis, validation, commissioning, decommissioning etc) and includes both Safety Systems and Safety Related Systems, although they are both referred to as Safety Related Systems in the standard. Specific references to this standard are not made in the text as they would be too numerous.
2.3 The IAEA Safety Guide that is most relevant is Ref 12 - NS-G-1.3 - Instrumentation and Control Systems Important to Safety in Nuclear Power Plants. This is a detailed guide that is provided for system design purposes, including both safety systems and safety related systems, and incorporates comprehensive advice. Cross references to particular sections of the IAEA guide are provided throughout this guide.
2.4 Explicit linkages between relevant sections of this guide and related WENRA Reactor Reference Levels are tabulated in Appendix 4. Other WENRA Reference Levels are not related to the topics in this guide.
3.1 Licence conditions 14 and 15 (preparation and review of safety cases) apply particularly, and also of relevance are LCs 23 (limits and conditions in the interests of safety), 24 (operating instructions), 27 (safety mechanisms, devices and circuits) and 28 (examination, inspection, maintenance and testing).
1) 'Safety System' is an IAEA term comprising 'Protection Systems', 'Safety Actuation Systems', and 'Safety System Support Features', and together with 'Safety Related Systems' makes up 'Items Important to Safety' [Ref 12 sections 2.18 et seq. & Fig 1]. The SAPs' definition remains the same as in the 1992 edition of the SAPs, and is 'A system which acts in response to a fault to prevent or mitigate a radiological consequence'. The IAEA definition is similar, but restricted to reactors and aimed at safe shutdown, residual core heat removal, and limiting consequences of anticipated operational occurrences and design basis accidents. The SAPs' definition is more appropriate for use in UK nuclear facility regulation, since such facilities are not restricted to reactors. There are a number of points to consider which are either implied by or can be deduced from the SAPs' definition:


2) Encompassed within the term 'safety system' are
1) In identifying the need for SSs, the approach that will be described herein is to determine the potential harm from each individual fault sequence in terms of radiological consequence both on and off site. Reference to other assessment specialisms will be necessary to confirm or otherwise the accuracy of consequence claims in the safety case. For these purposes the effects of any SS or beneficial SRS should be ignored. The presence of substantial items such as passive shielding may however be taken into account providing their presence is guaranteed, their integrity is shown to be invulnerable to the fault sequence under consideration, and they achieve their safety function simply by being present. This process leads to the concept of the 'unprotected plant', which is the starting point for the analysis.

2) For each fault sequence the frequency of each initiating fault (IF) should be determined ignoring any SS or beneficial SRS. Again reference to other specialisms may be needed to verify the accuracy of safety case claims.
3) If a value less than 1E-7/yr is obtained for the demand frequency, then subject to satisfying the ALARP principle no special SS is required (but see the note below relating to large releases). This figure is set at 10% of the 'broadly acceptable' risk for offsite consequences of >1Sv (see SAP Target 8) on the basis that a single class of accident should not make a disproportionate contribution to the overall risk (i.e. of the order of one tenth of the frequency in each dose band) - see SAP para 618.
NOTE: SAP Target 9 gives a 'broadly acceptable' risk for large release accidents (>=100 fatalities) of 1E-7/yr. Hence for such accidents, again applying the 10% principle in SAP para 618, the limiting frequency for a single class of accident should be 1E-8/yr.
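The arithmetic behind the two screening figures above can be sketched as follows. This is a minimal illustration only: the 1E-6/yr figure is inferred from the statement that 1E-7/yr is 10% of the 'broadly acceptable' level, the 1E-7/yr figure is the Target 9 value quoted in the note, and the function and variable names are assumptions made for this example.

```python
# Fraction of a target that one class of accident may contribute
# (the 10% principle of SAP para 618, as quoted in the text).
SINGLE_CLASS_FRACTION = 0.1

def single_class_limit(broadly_acceptable_per_yr: float) -> float:
    """Limiting frequency (per year) for a single class of accident."""
    return SINGLE_CLASS_FRACTION * broadly_acceptable_per_yr

# 1E-6/yr is inferred from the text, not quoted from the SAPs.
limit_over_1sv = single_class_limit(1e-6)
# 1E-7/yr is the Target 9 figure quoted in the note.
limit_large_release = single_class_limit(1e-7)

print(f"{limit_over_1sv:.0e} /yr, {limit_large_release:.0e} /yr")
```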
4) If a value above 1E-5/yr is obtained for an internal demand frequency, or above 1E-4/yr if the demand is due to an external event (see SAP para 514), then, depending upon the potential consequences, the fault sequence might lie within the design basis. In this case, in addition to the other requirements set out below, the safety systems are expected to meet the single failure criterion (SAP EDR4 - following para 174) [Ref 12 4.15 et seq.], and should be analysed in accordance with the design basis analysis SAPs (paras 512 - 526). Design Basis Analysis (DBA) should be regarded as a means of focusing attention on potentially significant faults in order to allow demonstration of design robustness and fault tolerance. Failure to comply with the DBA principles does not necessarily imply unacceptability therefore. However in cases of non-compliance the licensee should provide a robust justification for the particular fault sequence, and show that all reasonably practicable steps have been taken to avoid unacceptable consequences. DBA and Probabilistic Safety Analysis (PSA) are both essential and complementary. DBA establishes, by conservative analysis, robustness of defence for significant faults that can reasonably be expected to occur during the lifetime of the plant; and PSA establishes, by best-estimate analysis, a comprehensive risk profile for the plant as a whole with respect to predicted behaviours of the plant, its systems, and operators.
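The frequency part of this test can be expressed as a short sketch. It covers only the frequency screen: the text says the sequence "might" lie within the design basis depending on the potential consequences, which remains a matter of judgement outside this function. The function name and arguments are hypothetical.

```python
def may_lie_within_design_basis(demand_freq_per_yr: float, external_event: bool) -> bool:
    """Frequency screen only; potential consequences must be judged separately."""
    # 1E-4/yr applies to demands from external events, 1E-5/yr to internal demands.
    threshold = 1e-4 if external_event else 1e-5
    return demand_freq_per_yr > threshold

# A demand of 5E-5/yr passes the screen as an internal fault
# but not as an external event.
internal_case = may_lie_within_design_basis(5e-5, external_event=False)
external_case = may_lie_within_design_basis(5e-5, external_event=True)
```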
5) If the unprotected risk from an individual fault sequence is unacceptable, whether or not it lies within the Design Basis, then SSs should be provided as indicated below to satisfy the ALARP principle and:
An additional need is for the sum of all fault sequence frequencies within each dose band to meet the summed frequency requirements of SAP Targets 5 and 8. Even if each fault sequence individually meets the above single accident conditions, the overall plant risk might still be too high, in which case additional SSs are required to reduce it.
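The point that individually acceptable sequences can still sum to an unacceptable total can be illustrated with a short sketch. All of the data are hypothetical. The protected frequency is taken as the initiating fault frequency multiplied by the SS failures per demand, which assumes the two are independent, and the band target of 1E-6/yr is illustrative rather than quoted from the SAPs.

```python
# Hypothetical fault sequences in one dose band:
# (initiating fault frequency per year, claimed SS failures per demand)
sequences = [(1e-2, 8e-6)] * 15

SINGLE_SEQUENCE_LIMIT = 1e-7  # per year, the single-class figure from item 3)
BAND_TARGET = 1e-6            # per year; illustrative summed target only

# Protected frequency of each sequence, assuming independence.
protected = [freq * fpd for freq, fpd in sequences]

each_ok = all(f <= SINGLE_SEQUENCE_LIMIT for f in protected)
summed = sum(protected)
sum_ok = summed <= BAND_TARGET

print(f"each sequence within limit: {each_ok}")
print(f"summed frequency {summed:.1e}/yr within band target: {sum_ok}")
```

In this example every sequence sits at 8E-8/yr, inside the single-sequence limit, yet the fifteen together exceed the illustrative band target, so further risk reduction would be needed.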
6) The target reliability (in terms of failures per demand - fpd) for the SSs for each fault sequence will emerge from application of the guidance in para 4.2 5) above. Depending on the value, and whether or not the fault lies within the Design Basis, the following general qualities should be sought and confirmed as adequate by an appropriate form of analysis. The form and extent of analysis depends heavily on the degree of pessimism that is allowed in the allocation of fpd. If generous and evident pessimism is applied then no more than knowledgeable inspection should be necessary, but if little or no pessimism is allowed then a recognised formal technique such as FMEA should be applied and independently checked. Although fixed points are specified below a sliding scale is intended, with variations allowable based on appropriate justification:

7) Where SSs provide mitigation of the consequences (e.g. evacuation, filtration etc.) rather than prevention, the fault sequence (or its bounding equivalent) should be considered in two (or more for more than one mitigating SS - see note below) parts. Firstly a high consequence sequence ignoring mitigation, where the mitigator is considered in the same way as the other SSs in the subsequent evaluation; and secondly as a low consequence sequence (with the same demand frequency) where mitigation is assumed successful and not considered further, but the other SSs are evaluated in relation to the lower consequence. The reason for these separate sequences is that they give rise to different consequences, and different consequence bands have their own criteria to meet.
NOTE: One mitigator requires two fault trees; two mitigators require four fault trees (the highest consequence with both subject to failure, two lower sets of consequences, one for each of the mitigators subject to failure separately, and the lowest consequence with neither subject to failure). In general for n mitigators there will be 2^n fault trees, although seldom are more than two present in practice. Representation of these multiple mitigator situations is generally clearer if event trees are used.
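The way the number of cases doubles with each added mitigator can be seen by enumerating the combinations. The sketch below is illustrative only; the mitigator names are taken from the examples in item 7) and the function name is an assumption.

```python
from itertools import product

def mitigation_cases(mitigators):
    """Return every fails/succeeds combination for the given mitigators."""
    return [
        dict(zip(mitigators, outcome))
        for outcome in product(("fails", "succeeds"), repeat=len(mitigators))
    ]

cases = mitigation_cases(["evacuation", "filtration"])
for case in cases:
    print(case)
print(len(cases), "cases")
```

Each combination corresponds to one consequence level and hence to one fault tree (or one branch of an event tree).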
1) This section discusses a number of specific aspects of SSs and their configuration with guidance about the credit that should or should not be permitted in each case with the associated reasoning.
2) Safety Schedule
3) Requirements Specification correctness
A common and easily overlooked source of error is the requirements specification for the system, especially for complex systems where the commissioning procedures cannot be relied upon fully to test all aspects of system behaviour. Particular attention should therefore be devoted to seeking a justification from the licensee for the correctness of the specification for such systems, to give confidence in their ability to deliver the required functionality.
4) Configuration of SSs

5) Features of individual SSs
6) Complexity
In general the level of dependence on SSs incorporating complex technology should be limited to the order of 1E-1 failures per demand (interpreted herein as 0.3), unless a sufficiently robust justification can demonstrate the appropriateness of a lower value. See also T/AST/046 [Ref 8] and Appendix 1.
7) Diagnostics (self-testing)
As systems increase in complexity, especially if they employ software, they generally incorporate a measure of diagnostic capability. The aim here is entirely desirable, i.e. revealing faults in the hardware or in system behaviour to allow action to be taken to prevent hazardous consequences. However there can be undesirable effects in that the level of complexity is increased. The aim should always be that the elements carrying out the diagnostic function, and the diagnostic function itself, should not be able to interfere adversely with the safety function. Provided this criterion is met then the presence of self-testing is beneficial. However if it is not met, for example in systems with embedded software where the processor that implements the safety function also implements the self-tests, then there is a significant danger that the self-test functions can interfere with the safety function. In such cases it is not appropriate to assume that the benefit to safety of self-testing outweighs the disbenefit to safety of increased complexity, and if such a device is to be used then its safety analysis needs to encompass the self-testing software.
In addition, the extent and coverage of diagnostics can be difficult to determine and this may lead to over-optimistic claims of the failures which can be revealed by them (e.g. in terms of the revealed failure rate or detection of anomalous system behaviour). To avoid over-reliance on diagnostics a sensitivity study should be carried out and a conservative claim on their effectiveness demonstrated. It must also be remembered that the diagnostic capability itself must be subject to testing and it may be difficult to demonstrate 100% effectiveness of this test where the diagnostic capability is implemented within the same equipment that it is claimed to be testing.
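A sensitivity study of the kind suggested can be sketched very simply. The model below treats diagnostic effectiveness as a single coverage fraction, a common modelling convention that is assumed here rather than prescribed by this guide. The dangerous failure rate and the coverage values are hypothetical.

```python
def undetected_dangerous_rate(dangerous_rate_per_yr: float, coverage: float) -> float:
    """Dangerous failures per year that the diagnostics do not reveal."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    return dangerous_rate_per_yr * (1.0 - coverage)

DANGEROUS_RATE = 1e-2  # per year; hypothetical value for illustration

# Step the claimed coverage down to see how strongly the result depends on it.
for coverage in (0.99, 0.90, 0.60, 0.0):
    rate = undetected_dangerous_rate(DANGEROUS_RATE, coverage)
    print(f"claimed coverage {coverage:.0%}: {rate:.1e} undetected dangerous failures/yr")
```

Under this model, relaxing the claim from 99% to 90% coverage raises the undetected dangerous failure rate by a factor of ten, which shows why an optimistic coverage claim deserves scrutiny.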
8) Configuration Management and change controls
Provision should be made for controlling changes throughout the life of the SS in a manner that preserves its integrity. It should be recognised that the change process is itself a significant potential degradation mechanism for the SS, and the integrity of the SS depends heavily on the integrity of this process in terms of the quality of the controls that are applied. Aspects such as configuration management (version control) and impact analysis should receive particularly close attention.
9) Independent assessment
Evidence of independent assessment should be provided for all SSs, the degree of rigour and independence related to the level of safety dependence upon the specific SS.
10) Calculation of summed risk
When all individual fault sequences have been quantified, the frequencies, including those of 1E-7 and below, in each off and on site consequence band (SAP Targets 6 and 8), should be summed and compared with SAP Targets 5 and 8. Significant disparities will require special consideration and possible correction.
11) Operational aspects
1) High reliability or low unreliability is linked to the Class of a SS. Generally achieving high reliability or low unreliability requires considerable attention to detail at all stages of a SS lifecycle. With this in mind the following tables show the link between the Class of the system and a range of probability of failure on demand (pfd) for demand based SSs or SRSs (for nuclear installations the majority of SSs are demand based), and a range of failure frequency (ff, the dangerous failure frequency) for high demand or continuous acting SSs or SRSs.
| System Class | Probability of failure on demand (pfd) |
|---|---|
| Class 1 | 1E-3 ≥ pfd ≥ 1E-5 |
| Class 2 | 1E-2 ≥ pfd > 1E-3 |
| Class 3 | 1E-1 ≥ pfd > 1E-2 |

| System Class | Failure frequency per year (ff) |
|---|---|
| Class 1 | 1E-3/yr ≥ ff ≥ 1E-5/yr |
| Class 2 | 1E-2/yr ≥ ff > 1E-3/yr |
| Class 3 | 1E-1/yr ≥ ff > 1E-2/yr |
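The band boundaries in the pfd table can be applied mechanically, as in the following sketch. The function name is an assumption made for this example, and values outside the tabulated range are rejected rather than assigned a class.

```python
def system_class_for_pfd(pfd: float) -> str:
    """Map a claimed pfd onto the class bands in the pfd table above."""
    if 1e-5 <= pfd <= 1e-3:
        return "Class 1"
    if 1e-3 < pfd <= 1e-2:
        return "Class 2"
    if 1e-2 < pfd <= 1e-1:
        return "Class 3"
    raise ValueError("pfd lies outside the tabulated bands")

print(system_class_for_pfd(1e-3))  # boundary value falls in Class 1
print(system_class_for_pfd(5e-3))  # Class 2
```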
2) It should be noted that some analyses require the continuous frequency to be in the form of a rate per hour. In this case a factor of 10000 is usually applied (a rounding of the 8760 hours in a year), so, for example, Class 1 would become 1E-7/hr ≥ ff ≥ 1E-9/hr. It should also be noted that for complex computer based systems the table in Appendix 3 of T/AST/046 applies. For all other safety system technologies, including modern complex programmable logic devices (CPLDs), the above tables apply.
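The per-year to per-hour conversion can be sketched as below. The rounded factor of 10000 is the one mentioned in the text; the 8760 figure is simply 24 x 365 and is included for comparison. Names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365   # 8760, ignoring leap years
ROUNDED_FACTOR = 10_000     # the rounded factor mentioned in the text

def per_year_to_per_hour(rate_per_yr: float, factor: float = ROUNDED_FACTOR) -> float:
    """Convert a failure frequency per year to a rate per hour."""
    return rate_per_yr / factor

class1_upper = per_year_to_per_hour(1e-3)                 # Class 1 upper bound per hour
class1_lower = per_year_to_per_hour(1e-5)                 # Class 1 lower bound per hour
exact_upper = per_year_to_per_hour(1e-3, HOURS_PER_YEAR)  # same bound using 8760 hours

print(f"{class1_upper:.1e}/hr to {class1_lower:.1e}/hr (rounded factor)")
print(f"{exact_upper:.2e}/hr (8760 hours)")
```

The rounded factor gives a slightly lower hourly rate than the exact figure (about 14% lower), which is small compared with the order-of-magnitude width of each class band.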
(See also T/AST/046 [Ref 8] and Ref 12 4.35 and 5.43 et seq.)
A1.1 It is worth beginning with a source of argument that is believed to lie at the heart of many disputes between licensee and regulator. This is the vital but often misunderstood distinction between dependability and dependence. The dependability of a system is the degree to which it could be relied upon, whereas dependence on a system is the degree to which it is relied upon. These sound the same, but only become the same in the unusual circumstance that we are sure of the true reliability of the system. Whenever our knowledge falls short of this, which it always does to a greater or lesser extent, then we must ensure that we place less dependence upon the system than it is capable of bearing. In other words its dependability must always exceed the level of dependence placed upon it. The difference is the safety margin. The important point is that the more uncertain the reliability of the particular system in question, the greater the safety margin that is required. Such uncertainties abound where advanced systems are used: the greater the complexity and sophistication, the greater the uncertainty in reliability.
A1.2 This causes problems where the regulator insists on regarding a particular system as unreliable, in order to establish an appropriate working safety margin, whereas the licensee thinks that it represents the regulator's estimation of the true reliability for that type of system. Note that the use of the word 'system' here relates to the delivery of a single function, for example a single control sequence in a distributed control system. Hence the licensee will argue, quite rightly, that if such systems are as bad as that then they could never be relied upon for anything, and will consider the regulator's view wholly unjustifiable. The regulator's view is subtly different however, and can be summarised as - even if this particular system is as unreliable as this, which is unlikely but not inconceivable, then adequate safety can still be demonstrated. The regulator might add the observation that even though all such systems will not behave so unreliably on average, if the one system that is protecting the fault sequence in question is the one that is so unreliable (the rogue, so to speak), and a higher reliability is assumed, then in the event of an accident it will be of little comfort to point out that ninety-nine other similarly protected accidents might have happened but did not do so!
A1.3 However there still remains an apparent anomaly, in that it will be pointed out that the regulator does not always take so pessimistic a view, for example where a simple hard-wired system is used, even though any individual system might also be a rogue and soon fail. The distinction here is that the 'time-to-failure' distribution of the hard-wired system will be known to a higher level of confidence, so the probabilities of individual times to failure are more accurately known. Furthermore proof tests can be more easily shown to be comprehensive for a simple system, so a failure that does occur can be expected not to persist beyond the next test interval. For complex, especially software-based, systems, where systematic faults are much more likely, the 'time-to-failure' distribution is completely unknown, and the periodic proof tests are unable to reveal other than random faults arising from non-software sources. Hence the level of uncertainty is higher, the safety margin must be larger, and the level of dependence placed on the system should therefore be correspondingly less.
A1.4 There is clearly a potential conflict of interest between advanced technology with its predisposition towards complexity on the one hand and traditional engineering principles which require simplicity on the other. The source of the conflict and its effects are understandable, but the essential problem still remains. This represents a very important concern in modern systems and it is therefore worth devoting some thought to means by which the required elements of the traditional principles can be retained while still being able to profit from the benefits of advanced technology.
A1.5 If there are such means, then they must be arrived at from a knowledge of what it is that the traditional principles seek to achieve, from a knowledge of what advanced technology can offer, and by deliberate avoidance of degrading the one by the other. It would seem that there should be a solution, since there is nothing intrinsic in the principles that precludes advanced systems per se, although the one difficulty that is most apparent is the objective of avoiding complexity. Let us therefore explore this aspect in more detail.
A1.6 Consider a temperature trip system. The task to be achieved is simple, but if it is to be implemented using a microprocessor then it might appear that unnecessary complexity will be embodied. Considering first hardware aspects in isolation, it is true that a microprocessor is a complex device, so would the requirement for simplicity (see SAP ESS21) preclude it from such an application? To answer this we must consider the particular danger that is perceived in the SAPs by complexity. It is that with a complex system the level of understanding of behaviour is likely to be limited, so that whether or not adequate safety has been achieved might be obscure. Does a microprocessor's hardware complexity give rise to this fear? Not necessarily. Although very few people have an understanding of how a microprocessor performs its function, except in broad conceptual terms, the same can be said of conduction of electricity along a cable. Therefore it can be argued that we need not have complete understanding of the microprocessor hardware in order to have confidence in its capability, any more than we need to understand how a metal conducts electricity to be confident that it does. What is required is confidence, and that can be gained from sufficient experience of reliable behaviour. If a microprocessor is to be used in a trip system, then the level of confidence that is justifiable relates directly to the available experience of performance of the microprocessor in question. Hence we would have more confidence in one that had built a sound track record than one that had only recently been introduced. Care needs to be taken however to ensure that the microprocessor to be used in the safety system is the same as others that have built the track record. 
Manufacturers often change the physical construction and configuration of particular integrated circuits whilst still delivering the same functionality, so this possibility needs to be taken into account and a design sought that has been stable for some considerable time.
A1.7 Microprocessors, as other systems, suffer from both random and systematic hardware faults, but well established proprietary devices may have an adequate track record in this regard for our purposes for a single channel. Remember that we still need to incorporate redundancy and diversity where high integrity is required, and such features defend against these imperfections in microprocessors as they do in non-microprocessor systems. Redundancy defends against random but not systematic faults, and diversity defends against both.
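The limit that shared (systematic or common-cause) faults place on redundancy can be illustrated with a simple beta-factor sketch. The beta-factor model is a widely used simplification that is assumed here and is not prescribed by this guide; the channel pfd and beta values are hypothetical.

```python
def pfd_one_out_of_two(channel_pfd: float, beta: float) -> float:
    """Approximate pfd of two redundant channels with a common-cause fraction beta."""
    independent = ((1.0 - beta) * channel_pfd) ** 2  # both channels fail independently
    common = beta * channel_pfd                      # one shared cause fails both channels
    return independent + common

CHANNEL_PFD = 1e-2  # hypothetical single-channel value

no_common = pfd_one_out_of_two(CHANNEL_PFD, beta=0.0)
with_common = pfd_one_out_of_two(CHANNEL_PFD, beta=0.1)

print(f"no common cause:  {no_common:.2e}")
print(f"10% common cause: {with_common:.2e}")
```

With no common cause the second channel improves the pfd a hundredfold, but with a 10% common-cause fraction the improvement is less than tenfold. In this model diversity corresponds to reducing beta, which is why it defends against systematic as well as random faults.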
A1.8 If we accept the above reasoning for the hardware, there is still however the software to be considered. Is this simple enough to be relied upon? Here a pertinent point is that unnecessary complexity in the implementation of a task should be avoided. In other words the inherent complexity of a task should not be increased by its implementation. For example a program to read an input port, compare the value with a set point in memory, and to set an output port according to the result can hardly be considered complex from an understanding point of view. Hence it is considered that there is no implicit reason for a microprocessor implementation of a temperature trip system to conflict with the SAPs' requirement for simplicity. We would add that if the task that is to be performed is itself inherently complex, then it is probably simpler to implement it using software than to attempt a purely hard-wired implementation, since methods are well established to develop software for complex applications whereas they are not so well established for non-software designs. We would add however that very few nuclear plant protection functions need have inherent complexity.
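The simple trip program described in this paragraph (read an input, compare with a set point, set an output) can be sketched as follows. This is an illustration of the functional simplicity being discussed, not a protection-grade design: the set point value and the `read_input` and `write_output` callables are hypothetical, and input validation and fail-safe handling of invalid readings are deliberately omitted.

```python
TRIP_SETPOINT_C = 350.0  # hypothetical set point held in memory

def trip_demand(temperature_c: float, setpoint_c: float = TRIP_SETPOINT_C) -> bool:
    """Return True when the trip output should be set."""
    return temperature_c >= setpoint_c

def scan_once(read_input, write_output) -> None:
    """One pass: read the input port, compare with the set point, set the output port."""
    write_output(trip_demand(read_input()))

# Usage with stand-in ports: a list records what would be written to the output.
outputs = []
scan_once(lambda: 360.0, outputs.append)  # above the set point
scan_once(lambda: 20.0, outputs.append)   # below the set point
print(outputs)
```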
A1.9 It is probably worth clarifying this point further since it might appear to contradict earlier remarks about uncertainties in advanced systems. All understanding is hierarchical. We understand a complex thing in terms of the interaction of simpler things that we accept as already understood. In fact the simpler things are often not simple at all, but if we have sufficient prior experience of their dependability in delivering understood functions then they have our confidence, so for our purposes we are justified in regarding them as basic building blocks from which to build an understanding of that which we do not yet have experience of. Understanding, for our purposes, represents in fact a reasoned extrapolation from direct experience. This enables us to generate confidence in the correct functioning of the new system, either before it has built a track record, or even where so high a reliability is required that it can never build a track record. Thus the nature of the simplicity that is sought in safety systems is that which allows a ready understanding of those aspects that are new, i.e. of the functional design of the system. For those aspects that are not new, for which experience is available, the degree of dependence should be related to that experience. Hence, although we would be more justified in relying on a wire to conduct electricity than on a microprocessor to carry out a specific set of instructions, arguments based on past experience and simplicity might be successfully used to support a safety case.
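The remark that a very high reliability requirement may never be supported by a track record can be made concrete with a standard statistical argument. The sketch below assumes independent, identical demands with no failures observed, and asks how many such demands are needed before a given pfd can be claimed at a stated confidence. This binomial treatment is an assumption introduced for illustration and is not a method set out in this guide.

```python
import math

def failure_free_demands_needed(target_pfd: float, confidence: float) -> int:
    """Failure-free demands needed so that (1 - pfd)^n <= 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target_pfd))

for pfd in (1e-1, 1e-3, 1e-5):
    n = failure_free_demands_needed(pfd, confidence=0.95)
    print(f"pfd {pfd:.0e}: {n} failure-free demands at 95% confidence")
```

Roughly three demands are needed for every unit of 1/pfd, so a claim of 1E-5 would need several hundred thousand failure-free demands, which few demand-based systems could ever accumulate in service.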
A1.10 The temperature trip system described above might be viewed as unrealistic for a software implementation, in that a licensee is likely to seek to incorporate several trip functions within a single microprocessor system (or within a multiplexed communicating microprocessor system). Such incorporation of multiple functions however soon begins to threaten the level of understanding that is required for the necessary confidence. Furthermore it threatens the needed separation of safety systems from each other, and allows the potential for single faults to invalidate several different safety functions at the same time. Also, if, as is likely, common software is used in redundant safety systems, then software faults represent a source of common-cause failure of the redundant implementations of the same safety function. Hence this sort of arrangement conflicts with the need for simplicity in functional design, as well as with several other important safety principles, and for these reasons we would anticipate that the licensee would have a much greater level of difficulty in justifying its safety.
A1.11 It is worth saying something about self-testing however, since this is an eminently desirable feature, but one that again risks increasing the level of complexity. This is an area where constructive thought at the design stage can reduce problems later. Here we have two separate functions, which may or may not be executed by the same processor. The important thing is to engineer these functions so that they remain independent in their actions. Specifically the safety function (e.g. trip), must not be compromised by the secondary function (self-test). The object is to prevent the safety function from being degraded by the self-test program, however complex, by ensuring that control of the safety function never becomes subordinated to the self-test program.
A1.12 In general however we would expect the level of dependence on SSs incorporating complex technology to be limited to the order of 1E-1 failures per demand (interpreted herein as 0.3), unless a sufficiently robust justification either along the above lines or by application of the 'special case' procedure (SAP ESS27) can demonstrate the appropriateness of a lower value.
[Ref 12 2.21/22 and 5.36 et seq.]
A2.1 Introduction and Definitions
1) Functions referred to as: Safety Interlock, Permissive, Inhibit, Veto, Bypass and Override are often encountered when examining protection and control systems. The functions they represent are implemented as mechanical, pneumatic, electrical and electronic systems. The six terms are used to convey a similar meaning; all imply, 'the prevention, sometimes conditionally, of a course of action continuing or of an automatic system performing its intended function if called upon to do so'. The differences in the functions and their implementation have significant implications for safety. Unfortunately there is considerable inconsistency in the use of the terms. For example a veto on one plant might be referred to as a bypass or an override on another. For the purpose of this discussion the six terms and functions they represent are defined below. These definitions are provided only as an aid to understanding; they are not intended as formal definitions.
2) The functions defined above will be considered in three groups:
3) Interlocks and permissives are permanent features of conditional logic used to provide protection in the event that preceding actions or systems have failed, e.g. a shield door gamma interlock. The systems delivering these functions are classed as safety systems.
4) Inhibits, vetoes and bypasses are used to modify the protection available for operational convenience as:
These functions affect safety as they can remove or degrade protection and their uncontrolled use must be avoided.
5) Overrides suspend functions and equipment in an 'uncontrolled' manner and should not be available as part of safety systems.
A2.2 Assessment criteria
1) In assessing the safety significance of these differing kinds of function all aspects of their use must be considered, as their application can affect safety in dissimilar ways. For example, a maintenance veto applied to remove a channel of plant protection from service may also remove the associated electrical supply. In this case the veto provides two somewhat conflicting functions of ensuring worker safety while degrading plant protection.
2) Override facilities are potentially very dangerous as control of their use is basically administrative and their safety is normally dependent on the operator correctly evaluating the situation. The failure of an operator to fully comprehend the current plant state could result in action that creates significant hazard for both the plant and personnel. Further, the change in plant operating regime may not be recognised by those at risk, who may take an action and unintentionally expose themselves to the 'new' hazard.
3) The following points should be considered during assessment of all systems:
4) Interlocks and permissives
5) Inhibit, veto and bypass
6) Overrides
A3.1 Introduction
1) For a reliable safety system, besides the avoidance of complexity, a fail-safe approach and the means of revealing faults from their times of occurrence should be applied. It is valuable for the assessor to have a clear understanding of what is meant by the phrases 'fail-safe approach' and 'means of revealing faults' before attempting to discern whether a safety system meets these principles.
A3.2 A fail safe approach
1) The term 'fail-safe' has been used in many engineering disciplines and industries to describe the way in which a system performs when it experiences failure. The term 'fail-safe' should not be equated with 'inherently safe', which implies that the system itself has qualities, or properties, that make it safe. An inherently safe system would be one that is not capable of generating a significantly hazardous event, i.e. it lacks the radioactive inventory or the necessary release energy.
2) The concept of 'fail-safe' as applied in assessment of safety systems encompasses the expectation that when a system fails it would be to a safe state. This means that the failure modes are such that safety would not be prejudiced in the presence of the failure. It also means that careful system design is necessary to engineer that a safe outcome arises from failures. In practice it is very difficult indeed to ensure that all failure modes have a safe effect. The best that can normally be done is to ensure that the predominant failure modes, as well as loss of supplies or services, have safe effects. Note that 'predominant', as used here, does not necessarily mean that most of the failure modes have a safe effect; it means that the most likely failures have a safe effect.
3) The 'fail-safe' property engineered into the design of a SS refers to the function that the system performs in relation to the particular hazard(s) which it is provided to protect against. Should a failure of the SS occur there should not be an increase in the plant risk in respect of that hazard. However, it should be recognised that the risk associated with other hazards, against which the SS has not been designed to protect, may be increased as a consequence of the failure.
4) For example, a seat belt in a car has the safety function of restraining a passenger in the event of a collision. The seat belt release mechanism has the potential to fail open, and the consequence in a collision would be a failure to restrain the passenger. To overcome this potential hazard the buckle is made so that its most likely failure modes cause the release mechanism to remain closed, so that for the hazard of a collision the belt fails safe. However, where escape from the car is necessary, e.g. fire following a collision, the seat belt's continuing to restrain the passenger is no longer safe, and therefore for this hazard the seat belt does not fail safe.
5) Thus, a clear understanding of the safety function of the system is necessary together with knowledge of its failure modes and how they affect the safety function. Often the licensee submits a Failure Modes and Effects Analysis (FMEA) to substantiate the claims made for the reliability of a system and its fail safe design.
6) The assessor should be aware that a rule of thumb used by some licensees in claiming that a system is fail safe is that the safe failure rate should be at least ninety percent of the total failure rate. Assessors need to recognise however that the proportion of safe failures is not in itself a sound measure of adequacy. It may be that a design is adequate where the proportion of safe to total failure rate is considerably less than ninety percent, providing the overall dangerous failure rate (or fpd) is adequately low. Conversely, a design where the safe failure rate is well over ninety percent of the total may not be adequately safe where the overall dangerous failure rate (or fpd) is still too high.
7) Regardless of the licensee's approach, fail safety should be considered by the assessor, especially for predominant failure modes and for loss of supplies and services, and a justification sought if such modes are not engineered to produce safe effects.
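The distinction drawn in item 6 between the proportion of safe failures and the absolute dangerous failure rate can be illustrated numerically. The following is a minimal sketch only: the `FailureRates` class, the two example designs, the failure rates and the `TARGET` value are all hypothetical figures chosen for illustration, and do not represent any licensee's method or any ONR criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRates:
    """Hypothetical failure rates for one design, in failures per year."""
    safe: float
    dangerous: float

    @property
    def total(self) -> float:
        return self.safe + self.dangerous

    @property
    def safe_fraction(self) -> float:
        # Proportion of the total failure rate that is safe.
        return self.safe / self.total


def meets_rule_of_thumb(rates: FailureRates, threshold: float = 0.9) -> bool:
    """The 'ninety percent safe' rule of thumb described in item 6."""
    return rates.safe_fraction >= threshold


def meets_dangerous_target(rates: FailureRates, target: float) -> bool:
    """Check the absolute dangerous failure rate against a target."""
    return rates.dangerous <= target


# Assumed target for illustration only: dangerous failures per year.
TARGET = 1e-2

# Design A: only 80% of failures are safe, but dangerous failures are rare.
design_a = FailureRates(safe=0.008, dangerous=0.002)
# Design B: 95% of failures are safe, but dangerous failures are frequent.
design_b = FailureRates(safe=1.9, dangerous=0.1)

for name, design in (("A", design_a), ("B", design_b)):
    print(
        name,
        "rule of thumb:", meets_rule_of_thumb(design),
        "dangerous-rate target:", meets_dangerous_target(design, TARGET),
    )
```

Under these assumed figures, design A fails the rule of thumb yet meets the dangerous-rate target, while design B passes the rule of thumb yet misses the target. This is the situation item 6 cautions about.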
A3.3 Means of revealing a fault
1) Given that failures can be either safe or dangerous, it is important in a reliable safety system that faults of any kind come to light as soon as possible after their occurrence. Thus faults are not only safe or dangerous; they are also either revealed or unrevealed. If a fault were to remain unrevealed for a period of time, the plant could be operating in a degraded state of which the operator is unaware. A second fault might arise before the first fault had been revealed, causing a trip if the first fault was safe or, of more concern, preventing a trip if the first fault was dangerous. In judging the adequacy of safety systems it is convenient to group the faults as shown below, so that where a design exhibits particular features the appropriate level of justification can be sought.
| Effect | Mode: Revealed | Mode: Unrevealed |
|---|---|---|
| Safe | Group I: Failures in this group can be considered not to present a threat to the safety function. The failure either has no effect on the safety function or it generates a safety actuation signal. An example of a failure here is associated with the use of a live zero for current loop sensors. Should an open circuit occur the current drops below the live zero level revealing the failure, and initiating the safety function. Fail-safe faults are generally self revealing. | Group II: Failures in this group do not prevent a safety function from being carried out, but will become evident only when a specific test or operation necessary to reveal its presence is completed. The availability of the system may begin to be affected. An example here is failure of a switch used to select a veto. The failure in itself does not directly affect the safety function, but will only be revealed when the switch is tested, or the veto needs to be applied and does not work. It is assumed here that a safe state will remain if the veto is not applied. |
| Dangerous | Group III: Failures in this group will partially or totally inhibit a safety function. A benefit is that operators are made aware of the presence of such a fault soon after its occurrence. A justification on whether the failure mode could be eliminated by redesign and the adequacy of the means of revealing the failure should be sought. A similar application of a current loop failure could be envisaged where no safe action is taken once failure has occurred. It would be reasonably practicable to redesign the equipment to make the failure mode fail safe. | Group IV: Failures in this group will partially or totally inhibit a safety function without providing any indication at the time that this has occurred. Such failures have to be revealed by deliberate measures to exercise the safety function periodically - i.e. proof testing. These failures are considered to be the greatest threat to the safety function. A single failure of this type in a design may be sufficient to exclude that design from being considered fail safe. An assessor may question whether the failure mode could be eliminated by redesign and why it is not possible to reveal its presence immediately. Such a failure would directly block the safety function while maintaining the appearance that the circuit is healthy. An example is self-oscillation in a pulse circuit where the presence of pulses is the healthy condition and the safety function is delivered by removing the pulses. The likelihood of such failures may render a design unacceptable. |
2) Thus the types of faults in a safety system have a strong impact on its adequacy. A safety system with only group I and II failure modes is likely to be acceptable (from a safety point of view, though a licensee may reject it from an availability point of view), whereas a safety system with too many group IV failure modes is likely not to be acceptable.
3) The method for revealing failure may well be the activation of the safety function; however this in itself may not be desirable. Usually, an alarm or signal is generated that is logged by other equipment monitoring the safety system. The integrity of the alarm system should be considered and it needs to be shown that an adequate level of isolation has been provided to prevent the alarm system from undermining the integrity of the safety system. Particular attention should be paid to alarms where there is a common element between alarms for redundant (or diverse) equipment, e.g. a common alarm annunciator in a control room for a multi-channel redundant system. The presence of such common elements (i.e. single point failures) can significantly worsen calculated random failure probabilities or frequencies for otherwise separate equipment.
4) Further points also need to be considered:-
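Returning to the fault grouping set out in item 1 of this section, the two-by-two classification (safe or dangerous, revealed or unrevealed) can be sketched as a small data structure. This is an illustrative sketch only: the `Group`, `FailureMode` and `classify` names are hypothetical, and the example failure modes simply paraphrase the examples given in the table; it is not an assessment tool.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Group(Enum):
    I = "safe, revealed"
    II = "safe, unrevealed"
    III = "dangerous, revealed"
    IV = "dangerous, unrevealed"


@dataclass(frozen=True)
class FailureMode:
    description: str
    dangerous: bool  # True if the failure inhibits the safety function
    revealed: bool   # True if the failure is evident when it occurs


def classify(mode: FailureMode) -> Group:
    """Map a failure mode onto Groups I to IV."""
    if mode.dangerous:
        return Group.III if mode.revealed else Group.IV
    return Group.I if mode.revealed else Group.II


# Example modes paraphrased from the table (illustrative only).
modes = [
    FailureMode("open circuit on live-zero current loop", False, True),
    FailureMode("veto selector switch fails, veto not applied", False, False),
    FailureMode("current loop fault revealed, no safe action taken", True, True),
    FailureMode("self-oscillation in pulse circuit", True, False),
]

counts = Counter(classify(m) for m in modes)
group_iv = [m.description for m in modes if classify(m) is Group.IV]

for group in Group:
    print(f"Group {group.name} ({group.value}): {counts[group]}")
print("Revealed only by proof testing (Group IV):", group_iv)
```

Listing the Group IV modes separately reflects the emphasis in the text that dangerous unrevealed failures are the greatest threat to the safety function and rely on proof testing to be found.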
Abbreviation: SS - Safety System
| WENRA Reactor Safety Reference Levels | T/AST/003: Safety Systems |
|---|---|
| Issue E - Design Basis Envelope for Existing Reactors | |
| 2.1 Defence-in-depth shall be applied in order to prevent, or if prevention fails, to mitigate harmful radioactive releases. The design shall therefore provide multiple physical barriers to the uncontrolled release of radioactive materials to the environment, and an adequate protection of the barriers. | 1. “Safety Systems represent a central pillar of the 'Defence in Depth' safety philosophy that is insisted upon in UK nuclear plants. The main aim of this philosophy is to avoid situations where an initiating fault can lead directly to an accident with nothing able to prevent it. Although faults cannot be prevented, provisions (engineered systems and/or procedures) can be deliberately put in place to recognise and respond to faults to prevent and/or mitigate the accident that would otherwise ensue (i.e. they provide protection against those faults). Such provisions are known as Safety Systems (SSs).” |
| 4.2 A list of PIEs [Postulated Initiating Events] shall be established to cover all events that could affect the safety of the plant. From this list, a set of design basis events shall be selected with deterministic or probabilistic methods or a combination of both, and used to set the boundary conditions according to which the structures, systems and components important to safety shall be designed, in order to demonstrate that the necessary safety functions are accomplished and the safety objectives met. | 4.3 - 2 - i: “In order to assess a plant to the SAP criteria a schedule should be provided that lists all postulated faults and hazards with unacceptable consequences. The schedule should include all initiating faults with their frequencies and consequences, the safety systems and beneficial safety-related systems involved for each initiating fault and the overall protection claim. This is the ‘safety schedule’ (also known as a fault and protection schedule) - see SAP para 346.” |
| 8.3 Only safety systems shall be credited to carry out a safety function. | 4.1 - 1 .... There are a number of points to consider which are either implied by or can be deduced from the SAPs' definition [of a Safety System]: i) It represents a purely functional definition, there is no implied standard of reliability or robustness. |
| 9.1 The fail-safe principle shall be considered in the design of systems and components important to safety. | 4.3 - 5 - iii - e: requires “evidence (if claimed) that the system is fail-safe with respect to failure of services and its own predominant failure modes (See Appendix 3 for further information).” |
| 9.2 A failure in a system intended for normal operation shall not affect a safety function. | 4.1 - 1 - vi: “SAP ESS19 & associated para 353 do not prohibit other functions being carried out in addition to the fault response function, although singleness-of-purpose is the very strong preference. When additional functions are present it is important to focus on the safety function, and the elements of the overall system that deliver it, and to assess the potential for any of the non-safety functions to interfere with it. The more integrated the non-safety functions, the more likely that their faults will impact on the safety function, and the lower the reliability of the system as a result. The same applies if two or more safety functions are integrated in a single system. Here the same concerns arise with respect to one safety function interfering with another.” 4.3 - 5 - iii - b: requires “evidence that the system is independent of and invulnerable to any fault (including any cause of any fault) that it is claimed to act against, and independent of and segregated/separated from all other systems.” |
| 9.4 The reliability of the systems shall be achieved by an appropriate choice of measures including the use of proven components, redundancy, diversity, physical and functional separation and isolation. | 4.2 - 6: [Expectations in terms of reliabilities for different SS configurations] 4.3 - 4: [Expectations relating to different configurations of SSs and their implications] 4.3 - 5: [Expected features of individual SSs] |
| 10.2 Instrumentation shall be adequate for measuring plant parameters and shall be environmentally qualified for the plant states concerned. | 4.3 - 5 - ii: requires - |
| 10.7 Redundancy and independence designed into the protection system shall be sufficient at least to ensure that: | 4.3 - 4 - iii: “Where credit is claimed for redundancy or diversity, appropriate levels of separation should be shown between each SS, between the services to each SS (unless the SS is shown to be fail-safe with respect to service failures), and adequate segregation between the SSs and other equipment. Additionally the system as a whole should either be shown to be invulnerable to single failures, or the components with single-failure potential should be shown to be reliable and robust enough for their failure contribution not to compromise system unreliability.” |
| 10.8 The design shall permit all aspects of functionality of the protection system, from the sensor to the input signal to the final actuator, to be tested in operation. Exceptions shall be justified. | 4.3 - 5 - ii - c: requires information on “the means provided to maintain, calibrate, test (under operational conditions where possible) and inspect each component (including sensors and actuators); the intervals proposed; and the method of reinstatement after maintenance /calibration /testing /inspection. [SSs should be designed and installed so as to facilitate maintenance and testing etc without excessive dose uptake to operators and without introducing new or increased risks.] Proof tests should be shown to be fully effective for all parts of the system involved in delivering the relevant safety function, including any automatic testing or diagnostic test equipment used as part of testing, either during service or during proof test. [Ref 12 4.79 et seq. and 4.97 et seq.] The use of bypasses or vetos during proof testing should be minimised and fully justified. If they are to be used, they should be implemented by properly engineered provisions. (See Appendix 2 for further information);" |
| 10.9 The design of the reactor protection system shall minimize the likelihood that operator action could defeat the effectiveness of the protection system in normal operation and anticipated operational occurrences. | 4.3 - 5 - ii - f: requires information to be available that either gives or references, for engineered systems, "details of bypasses, vetoes or intended overrides, if any, with evidence of necessity, demonstration of sound engineering, means and appropriateness of application and removal, and minimisation of human error potential. (See Appendix 2 for further information.)" |
| 10.10 Computer based systems used in a protection system, shall fulfil the following requirements: | Appendix 1 - A discussion of problems in dealing with complexity in safety systems (See also T/AST/046 8 and Ref 12 4.35 and 5.43 et seq.) |
| Issue K: Maintenance, in-service inspection and functional testing | |
| 2.3 Data on maintenance, testing, surveillance, and inspection of SSCs shall be recorded, stored and analysed. Such records shall be reviewed to look for evidence of incipient and recurring failures, to initiate corrective maintenance and review the preventive maintenance programme accordingly. | 4.3 - 11 - ii: "A through life monitoring system should be set up to record all failures and causes of failures affecting safety systems. Such records should be reviewed periodically to allow improvement where possible and update the predictive estimates of hazard frequency and system unavailability in accordance with achieved performance. [Ref 12 6 63 et seq.]" |
| 3.1 SSCs important to safety shall be designed to be tested, maintained, repaired and inspected or monitored periodically in terms of integrity and functional capability over the lifetime of the plant, without undue risk to workers and significant reduction in system availability. Where such provisions cannot be attained, proven alternative or indirect methods shall be specified and adequate safety precautions taken to compensate for potential undiscovered failures. | 4.3 - 5 - ii - c: requires information on “the means provided to maintain, calibrate, test (under operational conditions where possible) and inspect each component (including sensors and actuators); the intervals proposed; and the method of reinstatement after maintenance /calibration /testing /inspection. [SSs should be designed and installed so as to facilitate maintenance and testing etc without excessive dose uptake to operators and without introducing new or increased risks.] Proof tests should be shown to be fully effective for all parts of the system involved in delivering the relevant safety function, including any automatic testing or diagnostic test equipment used as part of testing, either during service or during proof test. [Ref 12 4.79 et seq. and 4.97 et seq.] The use of bypasses or vetos during proof testing should be minimised and fully justified. If they are to be used, they should be implemented by properly engineered provisions. (See Appendix 2 for further information);" |