[trustable-software] Requirements and architecture for Safety

John P. Thomas jthomas4 at mit.edu
Mon Nov 5 21:51:31 GMT 2018


> SOTIF is not a gap in 61508 – in fact it is not something that will ever be part of 61508, and rightly not part of 26262, since these standards don’t deal with “is something safe” but with measures to manage the effects of malfunctions in electronics which might lead to an unwanted harm.

I actually agree with you here, but not everyone does. It depends on your perspective. If you see 61508, 26262, DO-178, etc. as safety standards (that’s the argument I inferred from your first email), then SOTIF is a gap. If, instead, you see 61508 as a much narrower standard to “manage the effects of malfunctions in electronics” rather than a safety standard, then SOTIF is outside the scope of the standard but still very important and necessary for safety.

Note that the 2011 version of 26262 took the former view, unlike you and me. This was discussed by the 26262 committee and eventually resulted in, among other things, NOTE 2 below the definition of failure: “NOTE 2: There is a difference between ‘to perform a function as required’ (stronger definition, use-oriented) and ‘to perform a function as specified’, so a failure can result from an incorrect specification.” Shortly after 26262-2011 was released, other 26262 authors attended auto conferences in the US and Europe and publicly discussed this issue, declaring that the SOTIF-type problem is already covered by the scope of 26262 as written. They did not seem to share your perspective above.

In any case, the current committee has decided that SOTIF is not adequately covered by the current 61508-based process--hence the birth of 21448.

I’m not aware of any study formally comparing statistics between different industries based on the safety standards used. My last sentences were a reaction to the 61508-based industries you listed. I thought I was hearing an argument that we should all use 61508 because those particular industries all derived their standards from 61508. I disagree. The Nuclear industry severely limits the use of software for safety-critical systems, so I didn’t think their success with 61508-based processes means that it is appropriate for others. An argument can also be made that their current approach has not been that successful for safety-critical digital I&C and software. Automotive has been experiencing and is now addressing critical gaps in their 61508-based process, so that’s not a strong argument to use 61508 either. A quick google search reveals this study<https://www.theexpertinstitute.com/medical-device-injuries-fda-data-reveals-increasing-risk/> of medical devices causing 5000 deaths annually, but I’m sure you can find many other studies. I was recently approached by a regulator in the medical device industry who was concerned with the number of deaths caused annually compared to practically every other industry—he cited about 100 people killed every week by flaws in medical devices and many more affected by adverse events. That’s a higher rate than many other industries with safety-critical software—imagine if we had a Boeing 737 crash every couple weeks. The one industry that everyone seems to point to as the safest and most successful—commercial aviation with 0 deaths in 2017 and millions of passenger flight hours—doesn’t use a 61508-based process at all. Not that I agree we should all copy aviation either—we shouldn’t—I’m just playing devil’s advocate.
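
As a rough back-of-the-envelope check of that comparison (the weekly figure is the one the regulator cited above; the 737 seat count of roughly 150-190 is my assumption, not a figure from the study):

    # Back-of-the-envelope check of the "737 every couple weeks" comparison.
    deaths_per_week = 100                     # figure cited by the regulator
    deaths_per_year = deaths_per_week * 52
    print(deaths_per_year)                    # ~5200, consistent with the ~5000/year study figure

    seats_per_737 = 170                       # assumed typical capacity
    weeks_per_equivalent_crash = seats_per_737 / deaths_per_week
    print(weeks_per_equivalent_crash)         # ~1.7, i.e. one full 737 every couple of weeks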

I was disagreeing with what I thought was an argument to simply follow 61508 as a gold standard. 61508 is inadequate, but it sounds like you already agree.

John




From: trustable-software [mailto:trustable-software-bounces at lists.trustable.io] On Behalf Of Jonathan Moore
Sent: Monday, November 5, 2018 1:21 PM
To: Trustable software engineering discussion <trustable-software at lists.trustable.io>
Subject: Re: [trustable-software] Requirements and architecture for Safety

Agreed – the key element here is “performance standard”. I wasn’t trying to identify any type of gold standard – just that any claim which doesn’t deal with the elephant in the room (i.e. the state of the art) is at some level negligent, and a counter-argument, evidence or compliance is going to be needed eventually. Not least because the “expert” witness can come from anywhere and believe anything – and this is the point I think – we want credible expert witnesses to *have* to come from (maybe this is too strong – be aware of?) this community – and effectively rehearse / defend their point of view publicly.

SOTIF is not a gap in 61508 – in fact it is not something that will ever be part of 61508, and rightly not part of 26262, since these standards don’t deal with “is something safe” but with measures to manage the effects of malfunctions in electronics which might lead to an unwanted harm. No one says you must use electronics for safety, but if you do it’s vital that you consider the myriad ways they can fail over the life of the product. In automotive it was somehow assumed that compliance with 26262 meant you had a safe automobile – this bubble has been well and truly burst now – but collectively there is no consensus (yet) on what determines that the vehicle is sufficiently safe, and many hope 21448 will be a silver bullet.

Unfortunately 2E-8 (MEMS) is beyond the realm of the current production, operating and maintenance capability in automotive, and in general automotive does not want military or aviation levels of oversight, preventative maintenance, trained operators and decommissioning. (I don’t think you are suggesting air traffic control for autonomous vehicles – but there are people I know who think that is inevitable.) In the 2018 version of 26262 there has been a significant concession on the targets from which random hardware failure rates are derived, and it’s understood that combining ASIL-D systems doesn’t automatically mean a commensurate improvement in their individual reliabilities to keep the overall reliability at the same level. At exida we are expecting many games to be played here by suppliers, and also expecting manufacturers to be unhappy with the vagueness. Personally I’m sure that whatever Bosch/Siemens/Delphi/Denso/Renesas decide to do will determine the future vehicle architecture, and thus the overall failure rate that’s reasonable – and everyone else will follow suit.
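
To make the combination point concrete, here is a minimal sketch (the 1E-8/h target and the two-subsystem series model are illustrative assumptions for the example, not figures prescribed by the standard):

    # Minimal sketch: series combination of two subsystems' random hardware
    # failure rates (illustrative numbers only).
    target_per_hour = 1e-8           # assumed overall ASIL-D style target
    subsystem_rate = 1e-8            # each subsystem individually "meets" the target

    combined = 2 * subsystem_rate    # rates roughly add for independent series elements
    print(combined)                  # 2e-08 -> the combined item no longer meets 1e-8

    required_each = target_per_hour / 2
    print(required_each)             # 5e-09 -> each subsystem would need to be better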

What these standards provide is a framework, many ideas, recommendations and principles, and methods that are known (by at least 1 person) to help and maybe provide insight into the ways that systems behave, fail and defy predictions and optimism. Many more than, arguably, we need, want, or can act on – but nevertheless a rich vein to mine in the event someone wants to show that the design / product is not the product of reasonable engineering best practice. What we need to do is determine which of these are still valid / sensible / helpful - knowing what we do about the way software (and hardware) performs and fails, with the programmers of today and the tools they have access to and the mistakes they make – recognizing that the standards didn’t really consider the many millions of lines of code that individual managers are responsible for on e.g. a car.

Can you point me to the data for your final sentence please – I’m interested in researching this more. I do wonder about the intra-industry safety record that you are referring to, and it makes me more accepting of the medical industry approach, which has arguably had a bigger impact on high-time-in-service failure modes than any other over the last, say, 60 years (~45 in the 1950s to just over 70 now). Unfortunately I think you are right though that safety seems to improve in leaps and bounds and needs a catastrophe every now and then to focus people.

Jonathan

From: trustable-software <trustable-software-bounces at lists.trustable.io> On Behalf Of John P. Thomas
Sent: Monday, November 5, 2018 10:41 AM
To: trustable-software at lists.trustable.io
Subject: Re: [trustable-software] Requirements and architecture for Safety

I don't think 61508 should be considered the gold standard. Not all safety standards are derived from 61508, and not all cite 61508. Military/defense use MIL-STD-882 and commercial aircraft use ARP4761 (many millions of flight hours in 2017 and 0 fatal commercial accidents).
It's also worth noting that although 26262 (automotive) is derived from 61508, the auto standards committee in charge of 26262 recently recognized a significant gap in the 61508-style approach. They reached international consensus that a different safety standard (21448) with a different approach is needed to fill the gap.
It is not clear that 61508-derived methods are any better off than the alternatives. Many of the industries that stuck with 61508 either have a worse safety record or severely limit the amount of safety-critical software compared to others.
John
On Nov 5, 2018, at 11:58 AM, Jonathan Moore <jmoore at exida.com> wrote:

In general there are two objectives for safe systems:
 Reliability: Perform the intended function correctly
 Safety Engineering: Fail in a predictable manner

In general we assign reliability (statistics) to hardware - ie making sure your memory doesn't change from underneath you - or having sufficient features to test before, during and after missions whether hardware features still work.
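
As a minimal sketch of the kind of check meant here - a purely illustrative periodic checksum over a block of parameters that should never change in service (the data, names and choice of CRC are assumptions for the example, not from any standard):

    import zlib

    # Illustrative only: detect that "memory hasn't changed from underneath you"
    # by checksumming a block of safety-relevant parameters at start-up and
    # re-checking it periodically during the mission.
    def checksum(block: bytes) -> int:
        return zlib.crc32(block)

    calibration_block = bytes([0x12, 0x34, 0x56, 0x78])   # stand-in for real data
    reference = checksum(calibration_block)               # taken before the mission

    def periodic_self_test(current_block: bytes) -> bool:
        # A real system would enter a safe state on any mismatch.
        return checksum(current_block) == reference

    assert periodic_self_test(calibration_block)                     # unchanged memory passes
    assert not periodic_self_test(bytes([0x12, 0x34, 0x56, 0x79]))   # a flipped bit is caught
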
Safety engineering for software, at least, is mostly received wisdom starting with "Programming Proverbs" from 1975 - anyone remember that, and the work by Edward Yourdon in the 1980s? In short, it's believed there are good characteristics of a software development project (or lifecycle) that are helpful to improving safety (or more likely - reducing the OH! NO! moments - and the head scratching after something went wrong). Software doesn't just magically have all the features it needs and get developed without mistakes by chance - it actually needs a fair bit of looking at and thinking about, and we aren't so good at doing that when looking at lines of ASCII text, so we benefit from different representations and measurements of the system to help understand what is going on.

It's a subject of furious debate whether machine learning (or the myriad of related terms) is actually a statistical problem (non-deterministic) or a safety engineering problem (complex Von Neumann machines that we don't understand well enough) ... yet. Either way this domain of research will eventually boil down into a cook book (of safety engineering best practice) or a statistical model (and reliability best practice / target) or some combination of both. Unless we find another pivotal pioneer (think Turing, Shannon, Williams, Dijkstra, Moore, Hoare, Ritchie, Knuth, ...).

These underlying principles are all liberally explained and preserved in the grandfather of all functional safety standards, IEC 61508 - a so-called "Type B" standard for complex electronics (MCU, MPU and above in complexity) - from which *all* sector-specific standards are derived. It remains the definitive state of the art, in addition to any sector-specific standards derived from it, which may choose to focus on a particular area of 61508 - e.g. 62304 is aimed at the medical software lifecycle, 26262 at automotive scale and supply norms, 13849 at simple machinery, 62061 at simple programmable circuits, 13482 at robots near people, 61513 at nuclear, 61511 at the process industry, etc. etc.

Many people forget 61508 is a "performance-based" standard for *all* industries, i.e. not an implementation reference for a particular industry, or a simplified approach for specific devices and operating environments, or a cookbook for low-risk applications.

Sector-specific standards usually result from a get-together of experts in that sector to throw away the bits that aren't relevant (e.g. to Medical, Nuclear, Automotive, Rail, ...) and to replace general language with more sector-specific language. For example, both medical and automotive use FMEA, but there are differences in the names used for systematic weakness, fault, failure, mitigation, action, etc.; the sector-specific wording makes the standard more accessible to practitioners in those fields.

Certification exists (either voluntary or, in some sectors, mandatory) for a third party to "test" that the underlying argument for safety is sound and supported by the use of reliable methods and principles, and that those methods and principles are applied correctly, by experts (who can themselves be certified). No good hoping to achieve safety using untested methods implemented by incompetent (or well-meaning) novices.

A company will choose to do this in advance of product release as a rehearsal for the possible formal test of the argument for safety in a court. A court is a place where humanity chooses to settle disputes, and the attorneys there are experts at analyzing the strength of arguments for and against - in the case of product liability, the argument that there was insufficient effort or evidence that a manufacturer took reasonable steps to ensure their product or design was safe.

I'm sure Craig Williams wishes he had been given this opportunity. https://www.bbc.com/news/uk-england-45991236 and there are many other cases of engineers/managers/directors and CEOs going to jail for negligence, incompetence, ignorance and laziness in their engineering. It's not just fines and out of court settlements.

Don't hope to find any innovation in ISO or IEC though - these standards, while representing the state of the art, often lag the actual activities being undertaken in the marketplace by years. What needs to happen is that a group of people in different companies realize they are all working on a similar problem, get over the fear of loss of IP and secrecy, and agree to attempt to work together to protect their interests by establishing the (often minimal) best practices, procedures, activities, evidence, process, results and targets they collectively feel are in the interest of their sector. Of course if people have already done this - then a new entrant will need to join that existing group and attempt to enhance/change/improve/relax the state of the art in light of their research, measurement and results.

It's fair I think to summarize:
 Collaborative efforts like open source software are not really envisaged by existing standards
 Best practice in open source is not exactly written down - although it does exist
 In principle there are many advantages to this type of development (not credited in standards)
 And some pitfalls too

If safety is a system property and we don't know what that end system is then it's going to be hard to say our software is safe. On the other hand we can certainly enable end integrators considering trustable software as a component of their system for which they need to achieve some level of safety - and make it easy/easier to use/adopt/include in their body of evidence.

Fortunately since all safety standards refer to 61508 we can use that as our starting point to refine these claims in light of what we know of the features/pitfalls/weaknesses and bugs that are typically present in our particular component for which we desire trust.

Both are following the same underlying ISO 12100 approach to risk (quantify it) and 61508 (if you are using electronics, spend your effort on the most important things first) - i.e. don't waste your time finessing what might be needed in a nuclear plant at the expense of the obvious low-hanging fruit.
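
A minimal illustration of "quantify it and spend effort on the most important things first" (the hazards, scales and scores below are invented for the example, not taken from 12100 or 61508):

    # Illustrative hazard ranking: risk ~ severity x likelihood, so effort
    # goes to the biggest risks first rather than first-come first-served.
    hazards = [
        # (name, severity 1-10, likelihood 1-10) -- invented example values
        ("unintended acceleration", 9, 2),
        ("infotainment reboot",     2, 7),
        ("loss of braking assist",  8, 3),
    ]

    ranked = sorted(hazards, key=lambda h: h[1] * h[2], reverse=True)
    for name, severity, likelihood in ranked:
        print(f"{name}: risk score {severity * likelihood}")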

So we might add to your two lists:

- evidence that we have considered unexpected behavior
and
- ensured our tests include appropriate noise conditions (or disturbances) designed to trigger unexpected behavior

Arguably these can just be a subset of "desired", but in practice most software developers choose to focus only on the desired feature and forget the potential side effects, or don't have the experience to anticipate the unwanted triggers.
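
A minimal sketch of what such a noise-condition test might look like for a simple component (the function under test and the disturbance values are assumptions for illustration, not from any real system):

    import random

    # Hypothetical component under test: clamps a requested torque to a safe range.
    def clamp_torque(request_nm: float) -> float:
        return max(0.0, min(request_nm, 250.0))

    # Inject disturbances (spikes, large negative swings, gaussian noise) around
    # nominal requests and check the safety property still holds, rather than
    # only testing the desired nominal values.
    random.seed(0)
    for _ in range(10_000):
        nominal = random.uniform(0.0, 250.0)
        disturbance = random.choice([0.0, 1e6, -1e6, random.gauss(0.0, 50.0)])
        output = clamp_torque(nominal + disturbance)
        assert 0.0 <= output <= 250.0, f"unsafe output {output}"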

I find these two diagrams attached help me understand the magnitude of the problem - but also the opportunities:

The general thinking on security is that it follows the same approach as has been used for safety. I conclude from this that, in general, the same people that are being asked to be concerned about safety in an organization are picking up a new assignment and being asked to think about security, and they would like to apply the same underlying approaches/thinking/evidence. There are two main initiatives here:
 ISA, completed for industrial
 SAE, work in progress for automotive

Jonathan

-----Original Message-----
From: trustable-software <trustable-software-bounces at lists.trustable.io> On Behalf Of Paul Sherwood
Sent: Monday, November 5, 2018 7:39 AM
To: trustable-software at lists.trustable.io
Subject: [trustable-software] Requirements and architecture for Safety

Over recent months I've been attempting to understand the implications of safety and security, which have always seemed to me to be the most difficult of the seven factors we've identified as material for trustable software.

In this email I'll summarise my current understanding of Safety:

- safety is a system property - we should only consider safety in the context of a whole system
- the most widely used approaches to safety focus on reliability, and this leads to specific demands for reliable software
- safety standards make demands about the processes to be applied for critical software

If we are aiming for trustable software in a safety-critical system, we need:

- evidence that risks and hazards inherent in the system and its environment which could cause harm have been identified
- evidence to show how the system has been designed/architected to deal with the risks and hazards
- evidence to show that the software behaves as expected to support the safety design/architecture

I've made myself quite unpopular on the System Safety Mailing List [1] in pointing out that lots of successful software has been and will continue to be created without traditional 'requirements' or 'architecture'. So far I don't see that the lack of such documents for a piece of software is a red flag for consuming it in a safety critical system.

However I think I'm now clear that we can only make claims about safety of a system if we can provide evidence to show how we believe we have made the system safe.

So it seems to me that we can't close this gap for trustable software without evidence that we have:

a) captured our system-level safety requirements, based on prevention of losses due to identified risks and hazards
b) described our safety architecture to address the safety requirements
c) specified the required properties/behaviours of components within the architecture (e.g. the software that we hope to be trustable)
d) tested that the properties/behaviours are delivered

All of this evidence should be expected to evolve, so as usual I'd expect the documentation to be kept under version control, with history and provenance, and actively maintained in CI/CD.
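
As one possible sketch of how evidence like a) to d) could be kept live under version control and checked in CI (the hazard/requirement/test names and the data shapes are invented, not a prescribed format):

    # Illustrative traceability check, intended to run in CI: every identified
    # hazard must trace to at least one safety requirement, and every
    # requirement to at least one piece of test evidence.
    hazards_to_requirements = {
        "H1 unintended motion": ["SR1"],
        "H2 loss of braking":   ["SR2", "SR3"],
    }
    requirements_to_tests = {
        "SR1": ["test_motion_inhibit"],
        "SR2": ["test_brake_fallback"],
        "SR3": [],   # a gap that CI should flag
    }

    failures = []
    for hazard, reqs in hazards_to_requirements.items():
        if not reqs:
            failures.append(f"{hazard}: no safety requirement")
        for req in reqs:
            if not requirements_to_tests.get(req):
                failures.append(f"{req} (from {hazard}): no test evidence")

    if failures:
        raise SystemExit("traceability gaps:\n" + "\n".join(failures))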

I've not thought so much about the Security topic yet, but my current instinct is that similar reasoning must apply, ie we need system security requirements, a system security architecture, and evidence that chosen components deliver desired behaviours to support the architecture.

br
Paul

[1] http://systemsafetylist.org

