[trustable-software] Don't pretend that Reliability means Safety (was Re: Additional requirement for trustability)

Wed Sep 12 16:41:31 BST 2018

We are in violent agreement. Reading this again, I spot the ambiguities in my statements. I should perhaps have prefixed my first post with "this is an attempt to explain how the herd of people who don't think, think".

Further thoughts inline.

> -----Original Message-----
> From: Paul Sherwood <paul.sherwood at codethink.co.uk>
> Sent: Wednesday, September 12, 2018 1:07 AM
> To: Trustable software engineering discussion <trustable-
> software at lists.trustable.io>
> Cc: Jonathan Moore <jmoore at exida.com>
> Subject: Don't pretend that Reliability means Safety (was Re: [trustable-
> software] Additional requirement for trustability)
> 
> Hi Jonathan,
> thanks for this. Please see my comments inline.
> 
> On 2018-09-11 15:23, Jonathan Moore wrote:
> > In automotive at least the department is generally called
> > Homologation. These people are charged with a couple of key
> > responsibilities:
> >
> > 1. Assembling the pieces of paper for a product action into the
> > archive that will get released in discovery and may help the argument
> > that the OEM adopted State of the Art, wasn't
> > incompetent/lazy/ignorant etc.
> 
> Yup, understood. "Let's huddle together and make sure we're no worse than the
> herd."
> 
> > 2. When a new product action is announced, identifying the pieces of
> > paper that engineering needs to create so they can estimate the effort,
> > plan and deliver them prior to launch 18-24 months later.
> 
> This has appeared to 'work' for a very long time, so it's kind of understandable.
> However, based on what I already know about autonomous, I think all bets are
> off for grandfathering in proven-in-use 'safe' bits
> :)
Agreed. The point was that they don't just rinse and repeat, collecting the same pieces of paper. They have to stay current with legislation, law, warranty, settlements, ... and make sure that new car line planning includes instructing the engineers on the best, and hopefully minimal, set of evidence to collect to meet legal and corporate liability obligations (the corporate bar always carries a margin over the legal minimum). This depends heavily on the launch date (e.g. before or after a piece of legislation is enacted), the vehicle content and the territories of sale.
> 
> > For a world car this can be quite complex, and I think it is in part
> > what's driving the effort, in the US at least, to prevent individual
> > states from creating their own unique legislation regulating automated
> > driving vehicles. I'm not 100% sure of the value of this, because
> > future algorithms are likely going to have to deal with handedness,
> > road sign differences and local conventions anyway, but that's the
> > approach as I understand it at the moment.
> 
> Personally I don't have high confidence that individual states/countries will
> create uniformly appropriate and effective legislation. Do you?
> 
I don't, and I don't believe any established automotive company does either. There are dreamers, though.
> <snip>
> 
> > Keep in mind most (all) automotive OEMs and their reputable Tier Ones
> > are members of the SAE. This 'club' is intended as the repository of
> > all things best practice and state of the art, forms the backbone of
> > most liability defenses, and is an attempt at a sort of herd immunity.
> 
> Amazingly, I hadn't even heard of this organisation :/
Two to begin with: SAE J3016 and J3018.
https://www.sae.org/standards/content/j3016_201401/
https://www.sae.org/standards/content/j3018_201503/
> 
> > The argument is something like: the last model of the vehicle was
> > demonstrably safe (didn't have this particular fault), here are the
> > changes we made (they were small and incremental improvements), here
> > are the statistics, data, studies and conclusions of the experts in how
> > to engineer x safely. Here's our evidence that we followed those
> > recommendations, and evidence that what we changed made improvements.
> > SAE have a very close relationship with ISO and IEC, and OEMs like the
> > voluntary adoption of their standards because of the collective
> > immunity that adopting best practice provides.
> 
> Even if it were true that vehicle x+1 only has a manageable small set of changes
> vs vehicle x (which it often isn't these days afaict), a small change in any
> component can clearly have unexpected implications for other elements of the
> system, so I think from an engineering perspective the basis of the argument is
> false.
"False" might be too much of an end stop. Think of it more as a spectrum, with the more false end, as you say, being the harder, more complex and more costly one. Somewhere between a change that makes no difference to sales and a change that is too significant to release, there is a sweet spot. There, the organization reports that it can bring sufficient resources (engineering, testing, and thinking about unexpected implications) to bear on designing and implementing changes. Those changes mostly improve a vehicle's performance or, in the worst case, don't make the current behavior any worse. I agree it's a cascading effect of increasing complexity, but the V-model approach is designed to enable this. The general idea is that we characterize the new components and compare them with their prior versions. We integrate them based on the changes we expect them to cause at the higher level, and we carefully consider whether the changes might cause additional unwanted interactions. Finally, we subject this to the normal find-and-fix corrective action procedure. We monitor performance very carefully in the initial fleet users and then at the 3, 6, 9, 12, 24 and 36 month warranty periods. We release improved parts as statistically significant failure modes emerge across the population, hopefully before incurring the wrath of a plaintiff.
In the case of automated driving this isn't so easy, as you indicate. A change such as moving from a driver-based vehicle to a driverless one requires significant architecture changes, new components that have never been on vehicles before, and significant new interfaces, and then the changes approach your use case. I don't think we should give up. It is happening, and people are experimenting with a multitude of different approaches, architectures and components. What we need to find is a design that can coexist with the existing architectures, thereby minimizing the disruption to those components and systems. It also must not introduce too many unknown, unvalidated or unsafe failure modes. It remains to be seen what this architecture is. Bosch feel they have a solution. Mercedes do too. Others are still experimenting.
> 
> > In Europe it's not so clear to me that IEC 61508 and ISO 26262 are in
> > fact not legal requirements. The three key Type Approval regulations
> > (which apply because the Machinery Directive excludes on-road vehicles)
> > have language like "include but not limited to" when it comes to
> > identifying risk and safety based actions. Alternatively, they
> > specifically mandate, by name, E/E systems necessary for safety, and
> > current best practice/state of the art brings those systems into the
> > scope of 61508 and 26262. Incidentally, there has been a proposal for
> > regulation in progress since May. I've not met a European OEM
> > homologation engineer who believes these 'guidelines' are anything
> > other than required, i.e. they are unwilling to let the company take
> > the risk of finding out in court. The regulations in force in Europe
> > are 661/2009, 78/2009 and 79/2009, and they are freely available from
> > EUR-Lex. If the call is successful, it looks like the plan is to
> > repeal these in favour of a new regulatory document.
> 
> Interesting. Any early signals about what approach that will take?
> 
I think they are going after texting and playing with your phone ☹
> A couple of people have mentioned SOTIF to me recently, and IIUC that's
> basically the standards folks gently switching horses without admitting they
> were wrong before...
SOTIF concepts have been around for a long while. It might be as you suggest, but I think the real driver is liability. 26262 came about to limit liability. There was a glaring hole in automotive: no one wanted to adopt IEC 61508 (the grandfather), the complexity was out of control, and the evidence was increasing that vehicles were not safe. 26262 plugs much/some/a bit of that gap, but it doesn't envisage this new and future architecture and liability front. So engineers at OEMs charged with managing risk started asking: how do we know that the electronics we get from our suppliers are actually safe enough when they are connected together? The question in 26262 was different: how do we show that our products are sufficiently free from defects, bugs, random failures escalating to unsafe driving conditions, single points of failure, common cause failures, latent faults, ...? 26262 is silent on the first question and arguably not appropriate even for the architectures we have in cars today.
> 
> > Ultimately I think the legal compliance or not discussion is a dead
> > end. Toyota taught us the courts don’t even care to see a specific bug
> > if the overwhelming evidence is that the product wasn't the result of
> > reasonable engineering best practice.
> 
> Good point! :-)
> 
> And let me state here, unequivocally, so there's a bit of 'prior art'
> from this community irrespective of what the standards folks and other
> clubs/herds want to claim in court later:
> 
> ***
> 
> As researched and thoroughly documented by MIT and others, safety is a
> system-level property, not correlated with reliability of components.
> 
> I've recreated the assumptions table from Nancy Leveson's book at [1] and I
> quote:
> 
> "High reliability is neither necessary nor sufficient for safety."
> 
> "Highly reliable software is not necessarily safe. Increasing software reliability
> will have little impact on software safety."
> 
> Anyone in 2018 failing to consider safety at the system level, including all of the
> potential interactions between components, and the socio-technical
> interactions from outside that affect how we construct systems, is obviously
> **not** applying reasonable engineering best practice, since the research
> including implementation methods is freely available and the methods are
> commercially justifiable.
> 
Indeed, and safety has never been about creating products that are free from risk, or that are too expensive. It's always been about finding appropriate, reasonable solutions, although sometimes I find colleagues who forget that.
> ***
> 
> > The fundamental question is one that has been debated and argued for
> > many decades now. Do you believe it is possible to predict the failure
> > of electronics using statistical techniques or not?
> 
> Wrong question. Reliability != Safety.
> 
This was more of an FYI. Conversely, it's not possible to claim safety without some understanding of the reliability. Either the parts are reliable enough or they aren't, and if they aren't, we can either improve them or reduce the sensitivity of the design to their failure modes. I'm indifferent, but I have found that reducing the sensitivity is often more logical, cheaper and easier.
> > Whether you don't (and there are standards, e.g. 13849, that don't)
> > or you do (and there are standards, e.g. 61508, that are firmly rooted
> > in this), the fact remains that we have to describe / defend / justify
> > to our peers, families, friends and children what we are doing and why
> > it is safe.
> 
> True. Let's focus on that.
> 
> > If we get it wrong,
> > there will be a chorus of 'told you so' and experts lining up to
> > support the plaintiff. I don't know any automotive Homologation
> > Engineers whose organization will allow them to take a course of
> > action that puts them out of step with their peers and colleagues,
> > either within or outside their company. I.e. if we take a new
> > position on a model year 2020 vehicle and it's safer, why did the
> > company not adopt this new approach on all its other carlines?
> 
> Irrespective of how well or badly people may think that the previous approaches
> have worked, all bets are off for autonomous, and anyone who suggests
> otherwise must be either lying, lazy, stupid or incompetent imo.
> 
We still have to work with them, but I feel like this sometimes too 😊
> > I have some notes on this equation if folk are interested in the
> > derivation. What we are wrestling with is the RHS equation and I think
> > the conclusion that this is basically unknowable / too complex to
> > calculate from first principles (e.g. as 61508/26262) requires
> 
> Sorry, I must be missing something. Is this an equation for reliability, in which
> case it's irrelevant as stated above? Or something else...
> 
> > don’t have 1E9 hours of proven in use and in automotive at least the
> 
> "Proven in use" is not engineering IMO, it's just folklore.
That's the point: it is a valid approach, but not appropriate for automotive. In other industries the bar is very high, and the highly variable, uncontrollable noise space of automotive makes it impossible to meet. Compare a valve in a chemical plant. It doesn't move, doesn't get too hot or too cold, doesn't get covered in sand and coke, and can survive for 1000s of hours longer than its useful life. It isn't mass produced, so it has lower piece-to-piece variability. It also has limited interaction with the things around it, because they either don't move or are far apart. I don't remember ever looking at a car and thinking "this is a very controlled environment where it will be easy to anticipate all the stresses that my component/system will be subjected to". Clearly you can take a part that is proven in use and put it somewhere it will fail immediately. So just having 1E9 hours of experience isn't enough; you need some semblance of systematic capability to guide your new application.
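To put a number on the 1E9 hours, here is a minimal sketch of how many failure-free operating hours are needed before one can claim a failure rate below a target at a given confidence. It assumes a constant failure rate (exponential model) and zero observed failures. For zero failures in T hours, the one-sided upper bound on the rate at confidence C is -ln(1 - C) / T, so T = -ln(1 - C) / target. The function name `failure_free_hours` is hypothetical. The constant-rate assumption is exactly the kind of idealisation that the automotive noise space undermines.

```python
import math


def failure_free_hours(target_rate_per_hour: float, confidence: float) -> float:
    """Hypothetical helper: failure-free hours needed to claim, at the given
    confidence, that a constant failure rate is below target_rate_per_hour.

    Assumes an exponential (constant-rate) failure model and zero observed
    failures, for which the one-sided upper bound is -ln(1 - C) / T.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if target_rate_per_hour <= 0.0:
        raise ValueError("target rate must be positive")
    # Solve exp(-rate * T) = 1 - C for T.
    return -math.log(1.0 - confidence) / target_rate_per_hour


if __name__ == "__main__":
    # Example: a 1e-9 per hour target claimed at 95% confidence.
    hours = failure_free_hours(1e-9, 0.95)
    print(f"{hours:.2e} failure-free hours")
```

At 95% confidence this comes out at roughly 3E9 failure-free hours. So even under this idealised model, 1E9 hours on its own supports a weaker claim than it appears to. That is before asking whether those hours were accumulated under conditions resembling the new application.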
> 
> > use case is so general purpose and since we don’t recover units from
> > the field at end of life (crushing vs. e.g. lifting a plane from the
> > ocean floor) and …
> >
> > An alternative approach might be to select, from several design
> > alternatives, the one that is least likely to fail, and then combine
> > that with STAMP/reliability prediction on the lower-complexity
> > subsystems/subcomponents/elements.
> 
> Possibly. Let's attempt to properly describe the problem/intents before we jump
> to potential solutions, though :)
Indeed. I hope this comes across with the right tone. I'm not trying to attack anyone or be objectionable. I think I can help illuminate some of the problems and roadblocks that people will create and, more importantly I hope, the motivation behind creating them. If we can deal with that, we can expect the needed change. I can count on one hand the number of posts I have made to a mailing list, and I think that makes me a ... noob?
> 
> br
> Paul
> 
> [1] https://gitlab.com/trustable/overview/wikis/safety/stpa-notes