[C-safe-secure-studygroup] On MISRA C:2012 Rule 14.3 – "Controlling expressions shall not be invariant"

Thu May 31 11:02:40 BST 2018

On 30/05/18 22:54, Martin Sebor wrote:
> On 05/30/2018 05:58 AM, Fulvio Baccaglini wrote:
>>
>> On 30/05/18 03:50, Martin Sebor wrote:
>>> On 05/29/2018 06:55 AM, Fulvio Baccaglini wrote:
>>>> Hi,
>>>>
>>>> This is my current personal understanding & initial thoughts, to be
>>>> subject to review.
>>>>
>>>> MISRA C:2012 Rule 14.3 – "Controlling expressions shall not be
>>>> invariant"
>>>>
>>>> Required - Undecidable - System
>>>>
>>>> Exceptions:
>>>> (1) infinite loops (expression always evaluates to true)
>>>> (2) single iteration do while loop (expression always evaluates to
>>>> false)
>>>>
>>>> MISRA provides two reasons for this rule:
>>>> (A) symptom of a programming error
>>>> (B) compiler removing defensive code
>>>>
>>>> An example of case (B) could be software trying to ensure that some
>>>> critical object has not been corrupted, before using it.
>>>>
>>>> Corruption could occur through hardware failure but also through
>>>> undefined behaviour arising in some other logically unrelated part of
>>>> the program. Due to undecidability, it is not guaranteed that a tool
>>>> will detect and report that undefined behaviour.
>>>
>>> In general it's not possible for a higher layer abstraction to
>>> anticipate and recover from failures of a lower layer, or even
>>> the same layer.  This includes compilers trying to emit code to
>>> detect, let alone recover from, hardware failures, or programmers
>>> trying to write code that recovers from compiler bugs or other
>>> kinds of undefined behavior.  Likewise, short of employing full
>>> redundancy, it's simply not possible to write code that recovers
>>> from its own bugs.
>>>
>>> Removing code that's unreachable due to program invariants is
>>> one of the basic transformations every optimizing compiler does
>>> and all programmers rely on to build efficient software.  A rule
>>> designed to defeat that optimization would be entirely counter-
>>> productive.  The only way to come close would be to make all
>>> variables atomic and all accesses volatile.
>>>
>>>>
>>>> In C99 controlling expressions occur in: if, switch, while, do while,
>>>> for, ?:
>>>>
>>>> The possibilities are:
>>>>
>>>> * if (always_true ())  ==> the check is dead code
>>>> * if (always_false ()) ==> the check is dead code and the body is
>>>> unreachable code
>>>> * switch (always_the_same ()) ==> the check is dead code, some case
>>>> clauses are unreachable code
>>>> * while (always_true ()) ==> infinite loop or loop with breaks in the
>>>> body
>>>> * while (always_false ()) ==> the check is dead code and the body is
>>>> unreachable code
>>>> * do while (always_true ()) ==> infinite loop or loop with breaks in
>>>> the body
>>>> * do while (always_false ()) ==> the check is dead code, but enforces
>>>> semicolon after macro invocation
>>>> * for (...; always_true (), ...) ==> infinite loop or loop with breaks
>>>> in the body
>>>> * for (...; always_false (), ...) ==> the check is dead code and the
>>>> body is unreachable code
>>>> * always_true () ? x : y ==> the check is dead code and the 3rd
>>>> operand is unreachable code
>>>> * always_false () ? x : y ==> the check is dead code and the 2nd
>>>> operand is unreachable code
>>>>
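A minimal sketch of the "do while (always_false ())" case from the list above. The macro and function names are hypothetical, chosen only for illustration. The invariant "while (0)" makes the macro expand to exactly one statement that needs a trailing semicolon, so it can sit in an unbraced if/else:

```c
/* Hypothetical helper macro: the invariant "while (0)" turns the body
   into a single statement that requires a trailing semicolon. */
#define SWAP_INT(a, b)      \
  do {                      \
    int swap_tmp_ = (a);    \
    (a) = (b);              \
    (b) = swap_tmp_;        \
  } while (0)

/* Orders two ints in place.  Returns 1 if they were swapped, else 0.
   With a plain { ... } macro body, the semicolon before "else" would
   be a syntax error. */
int sort2 (int *x, int *y)
{
  if (*x > *y)
    SWAP_INT (*x, *y);
  else
    return 0;   /* already ordered */
  return 1;     /* swapped */
}
```

This is the usage that exception (2) is meant to permit.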
>>>> In summary, invariant expressions are associated with:
>>>>
>>>> * Dead code
>>>> * Unreachable code
>>>> * Infinite loops
>>>> * Loops with breaks in the body
>>>> * Enforcing semicolon after macro invocation
>>>>
>>>> Dead code and unreachable code relate to (A) symptom of a programming
>>>> error - why has the developer written software that is unnecessary?
>>>> Why is the software trying to deal with a situation that cannot arise?
>>>
>>> There are situations where an invariant controlling expression
>>> is a mistake.  There are also common implementation and design
>>> techniques where using even integer constant expressions in
>>> these contexts is intended.  In all contexts, short of disabling
>>> optimization, it's outside programmers' control where invariants
>>> are determined and their value propagated across statement,
>>> function, or even translation unit boundaries by the compiler
>>> and used to eliminate unreachable code.
>>>
>>> A simple example is code that verifies, either statically or
>>> dynamically, global preconditions about the host environment:
~~~~ Example 1 ~~~~
>>>
>>>   if (sizeof (int) != 4
>>>       || sizeof (long) != 4
>>>       || sizeof (void*) != 4)
>>>   {
>>>     fprintf (stderr, "only ILP32 host environment supported");
>>>     exit (1);
>>>   }
>>>
>>> Another example: software that can be configured at compile
>>> time to target different architectures often has tests of
>>> the form:
~~~~ Example 2 ~~~~
>>>
>>>   if (ARCH_SUPPORTS_FEATURE_FOO_FOR (x)) {
>>>     // use feature foo with x
>>>   }
>>>
>>> For some architectures, the ARCH_SUPPORTS_FEATURE_FOO_FOR() macro
>>> might expand to 0/false.  On others, it might expand to 1/true.
>>> On others still, to some non-trivial query whose result depends
>>> on the value of its argument.
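A sketch of how such a configuration macro might be set up. TARGET_ARCH, the ARCH_* values, process() and the "multiple of 4" query are all assumptions made up for illustration. Only ARCH_SUPPORTS_FEATURE_FOO_FOR comes from the example above:

```c
/* Hypothetical build-time configuration values. */
#define ARCH_MINIMAL  0
#define ARCH_FULL     1
#define ARCH_PARTIAL  2

#ifndef TARGET_ARCH
#define TARGET_ARCH ARCH_PARTIAL
#endif

#if TARGET_ARCH == ARCH_MINIMAL
#  define ARCH_SUPPORTS_FEATURE_FOO_FOR(x) 0               /* invariant false */
#elif TARGET_ARCH == ARCH_FULL
#  define ARCH_SUPPORTS_FEATURE_FOO_FOR(x) 1               /* invariant true */
#else
#  define ARCH_SUPPORTS_FEATURE_FOO_FOR(x) ((x) % 4 == 0)  /* depends on x */
#endif

/* Returns 1 when the feature path is taken for x, 0 otherwise. */
int process (int x)
{
  if (ARCH_SUPPORTS_FEATURE_FOO_FOR (x))
    return 1;   /* use feature foo with x */
  return 0;     /* fallback path */
}
```

The same source line is invariant in two of the three configurations and not in the third, so whether it triggers the rule depends on the build.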
>>>
>>> A more modern example of a similar approach is an object-oriented
>>> design of an interface whose base implementation is roughly
>>> equivalent to the above:
>>>
~~~~ Example 3 ~~~~
>>>   if (arch->supports_feature_foo_for (x)) {
>>>     // ...
>>>   }
>>>
>>> where the "base implementation" of supports_feature_foo_for()
>>> for some bare-bones minimal architecture returns false, and
>>> implementations for more advanced architectures have the option
>>> to "override" it to return true.
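In C this kind of design is usually written with a function pointer. The sketch below is one possible shape. struct arch, the two implementations and use_foo() are hypothetical names, and the "x >= 0" query is an arbitrary stand-in:

```c
/* Hypothetical architecture descriptor with an overridable query. */
struct arch
{
  int (*supports_feature_foo_for) (int x);
};

/* Base implementation for a bare-bones architecture: always false. */
int base_supports_foo (int x)
{
  (void) x;
  return 0;
}

/* "Override" for a more capable architecture (arbitrary condition). */
int advanced_supports_foo (int x)
{
  return x >= 0;
}

const struct arch minimal_arch  = { base_supports_foo };
const struct arch advanced_arch = { advanced_supports_foo };

/* Returns 1 when the feature path is taken, 0 otherwise. */
int use_foo (const struct arch *arch, int x)
{
  if (arch->supports_feature_foo_for (x))
    return 1;
  return 0;
}
```

In a system that only ever links in minimal_arch, a whole-program analysis could conclude that the condition in use_foo() is invariant.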
>>>
>>> An entirely different example that doesn't rely on constant
>>> expressions is the following:
>>>
~~~~ Example 4 ~~~~
>>>   FILE* open_temp_file (const char *dir, unsigned id)
>>>   {
>>>     char pathname[256];
>>>     int n = snprintf (pathname, sizeof pathname,
>>>                       "%s/%u.tmp", dir, id);
>>>     if (n < 0 || (size_t) n >= sizeof pathname)
>>>       return 0;
>>>     return fopen (pathname, "w");
>>>   }
>>>
>>> Since the rule is system-wide, if the longest string the function
>>> is called with anywhere in the program isn't long enough for
>>> snprintf to truncate the output, the rule effectively forces the
>>> programmer to avoid testing the snprintf result (and possibly even
>>> replace snprintf with sprintf).
>>>
>>> Similar to other rules we have discussed, 14.3 is overly
>>> simplistic, mistakes correlation for causation, and results
>>> in invalidating common and safe coding practices.  Worse, it
>>> can also force programmers to replace good code with unsafe
>>> alternatives.
>>
>> Looking at DO-178B, there are certain distinctions being made with
>> regard to unreachable code. In my understanding (and twiddling with the
>> definitions to avoid confusion with MISRA), unreachable code can be
>> roughly classified as:
>>
>> 1) deactivated - may or must be present in the project but must not be
>> executed under the given configuration
>> 2) required in source - traceable to a system or software requirement,
>> can be optimised away by the compiler
>> 3) required in executable - traceable to a system or software
>> requirement, must not be optimised away by the compiler
>> 4) erroneous - unreachable due to a design or programming error
>>
>> I think that your first three examples would fall under 1 and the last
>> under 2.
>
> That sounds about right.
>
>> However since this categorisation is based on intentions, a tool cannot
>> do much beyond highlighting the unreachable code. It would then be up to
>> the user to decide which action to take. In a MISRA context I believe
>> the first two categories would lead to a deviation and the last two to a
>> correction to the source code.
>
> That would be rather ironic given that removing the test in
> the last example would result in strictly less safe code (and
> go against best practices).

Example 4 would be category 2 so it would be worthy of a deviation.
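To make the category 2 reading concrete, here is a self-contained sketch of the truncation check. make_temp_path() is a hypothetical helper, not code from the thread. The check is kept in the source because it traces to a robustness requirement, even if no caller in a given system can ever trigger it:

```c
#include <stdio.h>
#include <stddef.h>

/* Formats "<dir>/<id>.tmp" into buf.  Returns 0 on success and -1 if
   snprintf reports an error or the result would not fit in buf. */
int make_temp_path (char *buf, size_t size, const char *dir, unsigned id)
{
  int n = snprintf (buf, size, "%s/%u.tmp", dir, id);

  /* snprintf returns the length it wanted to write, so any value
     >= size means the output was truncated. */
  if (n < 0 || (size_t) n >= size)
    return -1;
  return 0;
}
```

If every call site passes a short directory name, the condition is invariant at system scope, yet removing it would leave the function unsafe for any future caller.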

> The biggest problem I have with the rule is that it implies that
> any code that can lead to constant propagation and subsequently
> code elimination is non-compliant.  That's potentially huge
> amounts of code either having to change for the worse or
> requiring a deviation justifying what could in many cases be
> perfectly reasonable code.  For instance, a defensively coded
> definition of abs() such as the following is non-compliant if it
> appears in a code base that never calls it with an INT_MIN operand:
>
~~~~ Example 5 ~~~~
>   int abs (int i)
>   {
>     if (i == INT_MIN)
>       {
>         error ("abs undefined with INT_MIN operand");
>         exit (1);
>       }
>
>     return i < 0 ? -i : i;
>   }
>
> Ditto for any other similarly written function.  That's clearly
> the opposite of what we should be aiming for.  We want rules to
> encourage writing safe programs and tools to be able to help
> find bugs in them and to turn the correct ones into optimal
> object code.
>
> I would be more than surprised if it was the intent behind
> the rule to make functions like the abs() above non-compliant
> so I assume the rule is just poorly specified.

Example 5 would also be category 2.

Presumably abs is extern so the invariant arises from the system.

Here is a reworked example where the invariant arises from within the
function, i.e. no matter what the rest of the system does:

~~~~ Example 6 ~~~~

  int abs (int j)
  {
    int i = j / 2;

    if (i == INT_MIN)
      {
        error ("abs undefined with INT_MIN operand");
        exit (1);
      }

    return i < 0 ? -i : i;
  }

This comes across more as category 4 and not worthy of a deviation.
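The reason the check in Example 6 is invariant can be sketched as follows. half_can_be_int_min() is a hypothetical name. Integer division truncates toward zero (C99 onwards), so j / 2 always lies in [INT_MIN / 2, INT_MAX / 2] and cannot equal INT_MIN:

```c
#include <limits.h>

/* Returns 1 if halving j yields INT_MIN, else 0.  Because j / 2 is
   always within [INT_MIN / 2, INT_MAX / 2], the comparison below is
   false for every possible argument. */
int half_can_be_int_min (int j)
{
  int i = j / 2;
  return i == INT_MIN;   /* invariant: always 0 */
}
```

No knowledge of the callers is needed to see this, which is what separates it from Example 5.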

So one possibility could be to reduce the scope of the rule from system
to translation unit or even further to function.

But then some category 4 cases at system scope would be missed, e.g.

~~~~ Example 7 ~~~~

    int f (int prime_number)
    {
        int result;
        if (prime_number > 10 && prime_number % 2 == 0)
            result = g1 (prime_number);
        else
            result = g2 (prime_number);
        return result;
    }
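A small sketch of why this is invariant only at system scope. All three functions are hypothetical helpers, assuming the system contract is that f is only ever called with prime numbers. For arbitrary int arguments the condition can be true, but for primes it never is, because 2 is the only even prime:

```c
/* Trial-division primality test; adequate for small illustrative inputs. */
int is_prime (int n)
{
  if (n < 2)
    return 0;
  for (int d = 2; d <= n / d; ++d)
    if (n % d == 0)
      return 0;
  return 1;
}

/* The controlling expression from Example 7. */
int takes_g1_branch (int prime_number)
{
  return prime_number > 10 && prime_number % 2 == 0;
}

/* Counts primes below limit for which the g1 branch would be taken. */
int count_g1_branch_primes (int limit)
{
  int count = 0;
  for (int p = 2; p < limit; ++p)
    if (is_prime (p) && takes_g1_branch (p))
      ++count;
  return count;
}
```

A function-scope analysis sees only takes_g1_branch(), which is not invariant on its own, so the unreachable call to g1 would go unreported.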

>>> As in some of the other rules we talked about, a rule like 14.3
>>> would be of value if, instead of trying to enforce an arbitrary
>>> coding style, it focused on the problem it tries to prevent.
>>
>> I think that to get a tool to be silent on 1 & 2 and report 3 & 4 it
>> would be necessary for the user to tell the tool what the intention is,
>> and that would have to be done through tool configuration or source code
>> annotation. Going that way could be a possibility, but some work on the
>> user side would still be required, one way or another.
>
> The source of the problem in 14.3 is the use of the term
> /invariant/.  It's probably there (as opposed to /constant
> expression/) because the authors wanted the rule to apply more
> broadly.  But they didn't consider the implications of using
> the term very carefully, and ended up with a rule that, if
> implemented according to the spec, would drown users in noise.
> (I'm curious how many MISRA analyzers would actually diagnose
> the function above under the conditions I mentioned.)

If the scope of the rule were reduced to only cover constant expressions,
then I think that quite a few category 4 cases would be missed, like the
one at:
https://www.misra.org.uk/forum/viewtopic.php?t=1502

>
> Martin