[C-safe-secure-studygroup] MISRA Compliance vs False Positives

Martin Sebor msebor at gmail.com
Fri Jul 13 23:31:45 BST 2018


On 07/12/2018 10:22 AM, Clive Pygott wrote:
> I'll have a stab at answering Martin's question
>
> I'd say it was the case that the safety community prefers false
> positives over false negatives. Imagine you are designing a flight
> control system for a wide-bodied jet, you've got 400 lives at risk. If
> there is a _possibility_ that a program may do something seriously bad
> (say trigger undefined behaviour), it may be a false positive (in the
> sense that in a particular execution of the program, the bad behaviour
> may not happen), but going back to last night's discussion, what
> probability that 'it might not happen' would you be happy to live with?
> - and more importantly be happy to defend in court if it did happen?

Okay, thanks.  I do understand the premise.  I just don't agree
that the mindset behind it unavoidably leads to the desired result.
But that's a separate topic.

> If you look at IEC 61508 and the like, the permitted failure rates
> for life-critical systems are minute.  For the most critical system
> (i.e. considering both hardware and software) an acceptable dangerous
> failure rate is between 10^-9 and 10^-8 failures/hour for a continuous
> control system, or 10^-5 and 10^-4 failures per demand for on-demand
> systems. To justify these sorts of numbers you have to take a
> precautionary view - you cannot afford a false negative and have to live
> with the false positives.

Sure.

>
> Regarding Martin's comments about false positives, they don't swamp the
> development process, because the developers are trained to write
> conservative code and eliminate anything that may be reported as a
> problem.

This is where I disagree.

Engineers adjust to overly pesky tools (and to excessive
requirements) the same way they adjust to anything else they
perceive as an excessive restriction: they come up with
ingenious ways to work around it.  Sometimes the workaround
is safe, but other times it means introducing a bug.

I have seen this first-hand.  When engineers were asked to
replace calls to "unsafe" string functions like strcpy with
their "safe" alternatives like strncpy or strcpy_s from Annex
K, and a static analysis checker was put in place to enforce
it, we found in code review gems like this:

   strcpy_s (dst, strlen (src) + 1, src);

when what they were expected to write was:

   strcpy_s (dst, sizeof dst, src);

But dst was a pointer and they didn't know how big the array
it pointed to was, so to keep the tool happy they passed
strlen (src) + 1 as the size.  A bound derived from the source
provides no protection for the destination at all.  The
analyzer never caught it because it didn't occur to its
authors that someone might do this.
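
To make the distinction concrete, here is a minimal sketch, in
plain C (Annex K is optional and often unavailable), of an
interface where the bound has to describe the destination.  The
helper copy_str and the buffer names are hypothetical, not taken
from that code review:

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical helper: dst_size must be the capacity of dst.
      Passing strlen (src) + 1 here would compile and keep a naive
      checker quiet, but it provides no protection against
      overflowing dst. */
   int copy_str (char *dst, size_t dst_size, const char *src)
   {
     size_t len = strlen (src);
     if (len >= dst_size)
       return -1;                 /* doesn't fit: report, don't overflow */
     memcpy (dst, src, len + 1);  /* includes the terminating nul */
     return 0;
   }

   /* The caller that declares the array knows its size and can
      forward it instead of reconstructing a bound from src. */
   void example (const char *src)
   {
     char buf[64];
     if (copy_str (buf, sizeof buf, src) != 0)
       {
         /* handle the "too long" case explicitly */
       }
   }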

I firmly believe that the rules we put in place need to help
people write better code by finding bugs, not restrict the use
of the language so severely as to make it difficult to get
work done.

> A corollary of this is that safety critical code is almost
> always written for the particular application. The first response when a
> MISRA analyser or similar reports a problem is to modify the code until
> the problem goes away (I'll admit that there is a risk that this can
> lead to ridiculously obfuscated code, that just confuses the analyser
> until it stops complaining, but hopefully this gets caught in further
> code review). On the whole the effect is that the KISS principle applies
> and you get conservative code.

I know from personal experience that hacking code until a tool
shuts up without understanding the nature of the problem --
whether it's in the tool or in the code -- is a recipe for
unmaintainable software.  It leads to brittle code that people
are afraid to touch.

Here's what works in my experience.  As the first response,
determine whether the diagnostic is a true or a false positive.

(1) If it's a true positive, fix the code.

(2) If it's a false positive caused by a bug in the tool, report
    the bug and suppress the diagnostic using the tool's
    suppression mechanism until the bug is fixed.  Document the
    suppression and track progress on the fix with the vendor.
    Remove the suppression once the bug is fixed.

(3) If the false positive is due to a bad rule, complain to the
    people behind its design.  Consider joining the standards
    body and working with them to change the rule to something
    more sensible.

(4) If the false positive is because the tool just doesn't see
    the whole picture, again suppress and document it (a sketch
    of what this can look like follows below).
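
For (2) and (4), a documented, narrowly scoped suppression can
look something like the sketch below.  It uses GCC's diagnostic
pragmas purely as one example of such a mechanism -- MISRA
checkers typically have their own comment- or report-based
deviation syntax -- and the warning name, the code, and the
ticket reference are all illustrative:

   #include <stdio.h>

   int get_flag (void);

   void demo (void)
   {
     int flag = get_flag ();
     int x;

     if (flag)
       x = 1;

     if (flag)
       {
         /* Both the store and this read are guarded by the same
            flag, so x is always initialized here, but suppose the
            tool cannot prove that.  Suppress the one diagnostic,
            say why, and reference the report filed with the
            vendor (the ticket id is made up). */
   #pragma GCC diagnostic push
   #pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
         printf ("%d\n", x);   /* false positive: vendor ticket TOOL-1234 */
   #pragma GCC diagnostic pop
       }
   }

The important parts are the narrow scope (push/pop around a single
statement), the comment explaining why the diagnostic is wrong, and
the reference that lets the suppression be removed once the tool is
fixed.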

Of these four scenarios, the ones I care about are (3) and (4).
I want us to avoid (3), and I want to give users a convenient
mechanism for (4): an annotation such as a #pragma or an
attribute.
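
As one concrete example of (4), GCC's "access" function attribute
(available since GCC 10) lets an interface say which argument is
the buffer and which is its size, so calls can be checked without
the tool having to see the function body.  The function fill_name
below is hypothetical; the attribute and its syntax are not:

   #include <stddef.h>
   #include <stdio.h>

   /* The attribute states that the function writes (only) through
      its first argument and that the second argument gives that
      buffer's size -- exactly the "whole picture" the tool
      otherwise lacks at the call site. */
   __attribute__ ((access (write_only, 1, 2)))
   void fill_name (char *buf, size_t size)
   {
     snprintf (buf, size, "example");
   }

   int main (void)
   {
     char name[8];
     fill_name (name, sizeof name);   /* bound comes from the destination */
     return 0;
   }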

> The only time safety critical code is likely to use previously developed
> code is when it was itself developed with safety critical use in mind
> (like some runtime executives). The important thing being that these
> come with the documentation, including test and analysis results, that
> justify their use - which is why they are so expensive.
>
> Really, the only time a deviation should be required is when you are
> _deliberately_ breaking a rule and have a good reason to do so - like
> needing to cast an integer to a pointer to get to a hardware peripheral
>
>
> Hope this helps
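
The kind of deliberate deviation mentioned above -- casting an
integer to a pointer to reach a hardware peripheral -- usually
boils down to something like the following sketch; the address,
register name, and bit meaning are made up for illustration:

   #include <stdint.h>

   /* Hypothetical memory-mapped status register at a fixed address.
      The integer-to-pointer cast is the documented deviation: it is
      done once, in one place, and the object is volatile so the
      accesses are not optimized away. */
   #define UART0_STATUS (*(volatile uint32_t *) 0x40001000u)

   int uart_ready (void)
   {
     return (UART0_STATUS & 0x1u) != 0;   /* bit 0: transmitter ready */
   }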

It does, thanks.  I'm not going to try to convince you to
change how MISRA should be used.  I also agree that there
is value in issuing diagnostics even when the tool cannot
prove there is a violation.  It's just not appropriate for
every project or domain.  I think we need to give users
the choice to decide how pesky they need their tool to be,
and how to easily deal with (or better yet avoid) the false
positives that are unavoidable.  At its simplest, the former
is just an on-off switch.  The latter, as I said, is some
sort of an annotation.

Martin


