[trustable-software] Requirements and architecture for Safety

Jonathan Moore jmoore at exida.com
Mon Nov 5 16:58:42 GMT 2018


In general there are two objectives for safe systems:
	Reliability: Perform the intended function correctly
	Safety Engineering: Fail in a predictable manner

In general we assign reliability (statistics) to hardware - i.e. making sure your memory doesn't change from underneath you, or having sufficient features to test before, during and after missions whether hardware features still work.
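To make the "memory changing underneath you" point a little more concrete, here is a minimal sketch of the idea (in Python for readability; on a real target this kind of check lives in C over RAM or flash, and zlib.crc32 just stands in for whatever checksum or ECC the hardware actually offers):

# Sketch only: detect that critical data has changed "from underneath you".
# zlib.crc32 stands in for whatever checksum or ECC the real hardware offers.
import zlib

class ProtectedBlock:
    """Critical data stored alongside its checksum, verified before use."""

    def __init__(self, payload: bytes):
        self.payload = payload
        self.crc = zlib.crc32(payload)

    def read(self) -> bytes:
        # Check integrity on every access; a mismatch means the data was
        # corrupted (bit flip, rogue write, ...) and must not be trusted.
        if zlib.crc32(self.payload) != self.crc:
            raise RuntimeError("integrity check failed - enter safe state")
        return self.payload

# The same check can be run as a periodic self-test before, during and
# after a mission.
config = ProtectedBlock(b"\x01\x02\x03\x04")
assert config.read() == b"\x01\x02\x03\x04"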
Safety engineering for software, at least, is mostly received wisdom, starting with "Programming Proverbs" from 1975 - anyone remember that, and the work by Edward Yourdon in the 1980s? In short, it's believed there are good characteristics of a software development project (or lifecycle) that help improve safety - or, more likely, reduce the OH! NO! moments and the head scratching after something went wrong. Software doesn't just magically have all the features it needs and get developed without mistakes by chance - it actually needs a fair bit of looking at and thinking about, and we aren't so good at doing that when looking at lines of ASCII text; we benefit from different representations and measurements of the system to help understand what is going on.

It's a subject of furious debate whether machine learning (or the myriad of related terms) is actually a statistical problem (non-deterministic) or a safety engineering problem (a complex von Neumann machine that we don't understand well enough) ... yet. Either way, this domain of research will eventually boil down into a cookbook (of safety engineering best practice), or a statistical model (and reliability best practice / target), or some combination of both - unless we find another pivotal pioneer (think Turing, Shannon, Williams, Dijkstra, Moore, Hoare, Ritchie, Knuth, ...).

These underlying principles are all liberally explained and preserved in the grandfather of all functional safety standards, IEC 61508 - a so-called "Type B" standard for complex electronics (MCU, MPU and above in complexity). *All* sector specific standards are derived from it, and it remains the definitive state of the art in addition to whichever sector specific standards have been derived from it, each of which may choose to focus on a particular area of 61508 - e.g. 62304 is aimed at the medical software lifecycle, 26262 at automotive scale and supply norms, 13849 at simple machinery, 62061 at simple programmable circuits, 13482 at robots near people, 61513 at nuclear, 61511 at the process industry, etc.

Many people forget that 61508 is a "performance based" standard for *all* industries, i.e. not an implementation reference for a particular industry, nor a simplified approach for specific devices and operating environments, nor a cookbook for low risk applications.

Sector specific standards usually result from a get-together of experts in that sector who throw away the bits that aren't relevant (e.g. to medical, nuclear, automotive, rail, ...) and replace the general language with more sector specific language. For example, both medical and automotive use FMEA, but there are differences in the names used for systematic weakness, fault, failure, mitigation, action, etc., and adopting the sector's own terms makes the standard more accessible to practitioners in those fields.

Certification exists (either voluntary or, in some sectors, mandatory) so that a third party can "test" that the underlying argument for safety is sound and supported by the use of reliable methods and principles, and that those methods and principles are applied correctly, by experts (who can also be certified). There's no good hoping to achieve safety using untested methods implemented by incompetent (or merely well meaning) novices.

A company will choose to do this in advance of product release as a rehearsal for the possible formal test of the argument for safety in a court. A court is a place where humanity chooses to settle disputes, and the attorneys there are experts at analyzing the strength of arguments for and against - in the case of product liability, the argument that there was insufficient effort or evidence that a manufacturer took reasonable steps to ensure their product or design was safe.

I'm sure Craig Williams wishes he had been given this opportunity. https://www.bbc.com/news/uk-england-45991236 and there are many other cases of engineers/managers/directors and CEOs going to jail for negligence, incompetence, ignorance and laziness in their engineering. It's not just fines and out of court settlements.

Don't hope to find any innovation in ISO or IEC though - these standards, while representing the state of the art, often lag the actual activities being undertaken in the marketplace by years. What needs to happen is that a group of people in different companies realize they are all working on a similar problem, get over the fear of loss of IP and secrecy, and agree to attempt to work together to protect their interests by establishing the (often minimal) best practices, procedures, activities, evidence, processes, results and targets they collectively feel are in the interest of their sector. Of course, if people have already done this, then a new entrant will need to join that existing group and attempt to enhance/change/improve/relax the state of the art in light of their research, measurement and results.

It's fair I think to summarize:
	Collaborative efforts like open source software are not really envisaged by existing standards
	Best practice in open source is not exactly written down - although it does exist
	In principle there are many advantages to this type of development (not credited in standards)
	And some pitfalls too

If safety is a system property and we don't know what that end system is, then it's going to be hard to say our software is safe. On the other hand, we can certainly enable end integrators who are considering trustable software as a component of a system for which they need to achieve some level of safety - and make it easy (or easier) to use, adopt and include in their body of evidence.

Fortunately, since all safety standards refer to 61508, we can use that as our starting point to refine these claims in light of what we know of the features, pitfalls, weaknesses and bugs that are typically present in the particular component for which we desire trust.

Both follow the same underlying approach: ISO 12100 for risk (quantify it) and 61508 if you are using electronics (spend your effort on the most important things first) - i.e. don't waste your time finessing what might be needed in a nuclear plant at the expense of the obvious low-hanging fruit.
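To make "quantify it, then do the most important things first" concrete, here is a minimal sketch of the kind of scoring involved; the scales and hazards are invented for illustration and not taken from either standard:

# Invented scales and hazards, purely to illustrate "score the risk, then
# work the list from the top"; real projects use the scheme agreed for the
# sector (risk graphs, SIL tables, ...).
from dataclasses import dataclass

@dataclass
class Hazard:
    name: str
    severity: int      # 1 (minor) .. 4 (catastrophic)
    exposure: int      # 1 (rare) .. 4 (continuous)
    avoidability: int  # 1 (easily avoided) .. 4 (unavoidable)

    @property
    def risk(self) -> int:
        return self.severity * self.exposure * self.avoidability

hazards = [
    Hazard("unexpected restart during operation", 3, 3, 2),
    Hazard("loss of telemetry logging", 1, 4, 1),
    Hazard("actuator command corrupted in transit", 4, 2, 3),
]

# Spend the effort on the highest-risk items first.
for h in sorted(hazards, key=lambda h: h.risk, reverse=True):
    print(f"{h.risk:3d}  {h.name}")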

So we might add to your two lists:

- evidence that we have considered unexpected behavior
and
- ensured our tests include appropriate noise conditions (or disturbances) designed to trigger unexpected behavior

Arguably these can just be a subset of "desired", but in practice most software developers choose to focus only on the desired feature and forget the potential side effects, or don't have the experience to anticipate the unwanted triggers.
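As a sketch of what such a noise/disturbance test might look like (limit_speed and its bounds are invented for the example, not a real component):

# A made-up component (limit_speed) and a test that feeds it deliberately
# nasty inputs, checking it fails predictably instead of doing something
# surprising.
import math
import random

MAX_SPEED = 120.0

def limit_speed(requested: float) -> float:
    """Clamp a requested speed; treat anything nonsensical as 'stop'."""
    if math.isnan(requested) or math.isinf(requested):
        return 0.0
    return min(max(requested, 0.0), MAX_SPEED)

def test_with_disturbances(iterations: int = 10_000) -> None:
    nasty = [float("nan"), float("inf"), float("-inf"), -1e30, 1e30]
    for _ in range(iterations):
        value = random.choice(nasty + [random.uniform(-500.0, 500.0)])
        result = limit_speed(value)
        # The safety property: output stays in range, whatever the input.
        assert 0.0 <= result <= MAX_SPEED

test_with_disturbances()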

I find the two diagrams attached help me understand the magnitude of the problem - but also the opportunities:

The general thinking on security is that it follows the same approach as has been used in safety. I conclude from this that, in general, the same people who are being asked to be concerned about safety in an organization are picking up a new assignment and being asked to think about security, and they would like to apply the same underlying approaches/thinking/evidence. There are two main initiatives here:
	ISA, completed for industrial
	SAE, work in progress for automotive

Jonathan

-----Original Message-----
From: trustable-software <trustable-software-bounces at lists.trustable.io> On Behalf Of Paul Sherwood
Sent: Monday, November 5, 2018 7:39 AM
To: trustable-software at lists.trustable.io
Subject: [trustable-software] Requirements and architecture for Safety

Over recent months I've been attempting to understand the implications of safety and security, which have always seemed to me to be the most difficult of the seven factors we've identified as material for trustable software.

In this email I'll summarise my current understanding for Safety

- safety is a system property - we should only consider safety in the context of a whole system
- the most widely used approaches to safety focus on reliability, and this leads to specific demands for reliable software
- safety standards make demands about the processes to be applied for critical software

If we are aiming for trustable software in a safety-critical system, we need

- evidence that risks and hazards inherent in the system and its environment which could cause harm have been identified
- evidence to show how the system has been designed/architected to deal with the risks and hazards
- evidence to show that the software behaves as expected to support the safety design/architecture

I've made myself quite unpopular on the System Safety Mailing List [1] in pointing out that lots of successful software has been and will continue to be created without traditional 'requirements' or 'architecture'. So far I don't see that the lack of such documents for a piece of software is a red flag for consuming it in a safety critical system.

However I think I'm now clear that we can only make claims about safety of a system if we can provide evidence to show how we believe we have made the system safe.

So it seems to me that we can't close this gap for trustable software, without evidence that we have

a) captured our system-level safety requirements, based on prevention of losses due to identified risks and hazards
b) described our safety architecture to address the safety requirements
c) specified the required properties/behaviours of components within the architecture (e.g. the software that we hope to be trustable)
d) tested that the properties/behaviours are delivered

All of this evidence should be expected to evolve, so as usual I'd expect the documentation to be kept under version control, with history and provenance, and actively maintained in CI/CD.
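One way to keep that evidence actively maintained in CI might be a small traceability check that fails the pipeline whenever a safety requirement has no referencing test. A sketch, assuming a made-up SR-NNN identifier convention and file layout:

# Hypothetical convention: requirements carry IDs like SR-001 in
# requirements/*.md, and every ID must appear in at least one test under
# tests/. The check fails the pipeline when a requirement has no test.
import re
import sys
from pathlib import Path

REQ_ID = re.compile(r"\bSR-\d{3}\b")

def collect_ids(paths):
    found = set()
    for path in paths:
        found.update(REQ_ID.findall(path.read_text(errors="ignore")))
    return found

requirements = collect_ids(Path("requirements").glob("*.md"))
tested = collect_ids(Path("tests").glob("**/*.py"))

missing = sorted(requirements - tested)
if missing:
    print("Safety requirements with no referencing test:", ", ".join(missing))
    sys.exit(1)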

I've not thought so much about the Security topic yet, but my current instinct is that similar reasoning must apply, i.e. we need system security requirements, a system security architecture, and evidence that chosen components deliver desired behaviours to support the architecture.

br
Paul

[1] http://systemsafetylist.org



-------------- next part --------------
A non-text attachment was scrubbed...
Name: Safety Protection.png
Type: image/png
Size: 124581 bytes
Desc: Safety Protection.png
URL: <https://lists.trustable.io/pipermail/trustable-software/attachments/20181105/1bdacdb1/attachment-0001.png>

