[C-safe-secure-studygroup] On MISRA C:2012 Rule 14.3 – "Controlling expressions shall not be invariant"

Wed May 30 12:58:48 BST 2018

On 30/05/18 03:50, Martin Sebor wrote:
> On 05/29/2018 06:55 AM, Fulvio Baccaglini wrote:
>> Hi,
>>
>> This is my current personal understanding & initial thoughts, to be
>> subject to review.
>>
>> MISRA C:2012 Rule 14.3 – "Controlling expressions shall not be
>> invariant"
>>
>> Required - Undecidable - System
>>
>> Exceptions:
>> (1) infinite loops (expression always evaluates to true)
>> (2) single iteration do while loop (expression always evaluates to
>> false)
>>
>> MISRA provides two reasons for this rule:
>> (A) symptom of a programming error
>> (B) compiler removing defensive code
>>
>> An example of case (B) could be software trying to ensure that some
>> critical object has not been corrupted, before using it.
>>
>> Corruption could occur through hardware failure but also through
>> undefined behaviour arising in some other logically unrelated part of
>> the program. Due to undecidability, it is not guaranteed that a tool may
>> detect and report that undefined behaviour.
>
> In general it's not possible for a higher layer abstraction to
> anticipate and recover from failures of a lower layer, or even
> the same layer.  This includes compilers trying to emit code to
> detect, let alone recover from, hardware failures, or programmers
> trying to write code that recovers from compiler bugs or other
> kinds of undefined behavior.  Likewise, short of employing full
> redundancy, it's simply not possible to write code that recovers
> from its own bugs.
>
> Removing code that's unreachable due to program invariants is
> one of the basic transformations every optimizing compiler does
> and all programmers rely on to build efficient software.  A rule
> designed to defeat that optimization would be entirely counter-
> productive.  The only way to come close would be to make all
> variables atomic and all accesses volatile.
>
>>
>> In C99 controlling expressions occur in: if, switch, while, do while,
>> for, ?:
>>
>> The possibilities are:
>>
>> * if (always_true ())  ==> the check is dead code
>> * if (always_false ()) ==> the check is dead code and the body is
>> unreachable code
>> * switch (always_the_same ()) ==> the check is dead code, some case
>> clauses are unreachable code
>> * while (always_true ()) ==> infinite loop or loop with breaks in the
>> body
>> * while (always_false ()) ==> the check is dead code and the body is
>> unreachable code
>> * do while (always_true ()) ==> infinite loop or loop with breaks in the
>> body
>> * do while (always_false ()) ==> the check is dead code, but enforces
>> semicolon after macro invocation
>> * for (...; always_true (), ...) ==> infinite loop or loop with breaks
>> in the body
>> * for (...; always_false (), ...) ==> the check is dead code and the
>> body is unreachable code
>> * always_true () ? x : y ==> the check is dead code and the 3rd operand
>> is unreachable code
>> * always_false () ? x : y ==> the check is dead code and the 2nd operand
>> is unreachable code
>>
>> In summary, invariant expressions are associated with:
>>
>> * Dead code
>> * Unreachable code
>> * Infinite loops
>> * Loops with breaks in the body
>> * Enforcing semicolon after macro invocation
>>
>> Dead code and unreachable code relate to (A) symptom of a programming
>> error - why has the developer written software that is unnecessary? why
>> is the software trying to deal with a situation that cannot arise?
>
> There are situations where an invariant controlling expression
> is a mistake.  There are also common implementation and design
> techniques where using even integer constant expressions in
> these contexts is intended.  In all contexts, short of disabling
> optimization, it's outside programmers' control where invariants
> are determined and their value propagated across statement,
> function, or even translation unit boundaries by the compiler
> and used to eliminate unreachable code.
>
> A simple example is code that verifies, either statically or
> dynamically, global preconditions about the host environment:
>
>   if (sizeof (int) != 4
>       || sizeof (long) != 4
>       || sizeof (void*) != 4)
>   {
>     fprintf (stderr, "only ILP32 host environment supported");
>     exit (1);
>   }
>
> Another example is software that can be configured at compile
> time to target different architectures often has tests of
> the form:
>
>   if (ARCH_SUPPORTS_FEATURE_FOO_FOR (x)) {
>     // use feature foo with x
>   }
>
> For some architectures, ARCH_SUPPORTS_FEATURE_FOO_FOR() macro
> might expand to 0/false.  On others, it might expand to 1/true.
> For others still, to some non-trivial query whose result depends
> on the value of its argument.
>
> A more modern example of a similar approach is an object-oriented
> design of an interface whose base implementation is roughly
> equivalent to the above:
>
>   if (arch->supports_feature_foo_for (x)) {
>     // ...
>   }
>
> where the "base implementation" of supports_feature_foo_for()
> for some bare-bones minimal architecture returns false and that
> implementations for more advanced architectures have the option
> to "override" to return true.
>
> An entirely different example that doesn't rely on constant
> expressions is the following:
>
>   FILE* open_temp_file (const char *dir, int n)
>   {
>     char pathname[256];
>     int n = snprintf (pathname, sizeof pathname,
>                       "%s/%u.tmp", dir, n);
>     if (n > sizeof buf)
>       return 0;
>     return fopen (pathname, "w");
>   }
>
> Since the rule is system-wide, if the longest string in a program
> the function is called with isn't long enough for snprintf to
> truncate the output the rule effectively forces the programmer
> to avoid testing the snprintf result (and possibly even replace
> snprintf with sprintf).
>
> Similar to other rules we have discussed, 14.3 is overly
> simplistic, mistakes correlation for causation, and results
> in invalidating common and safe coding practices.  Worse, it
> can also force programmers to replace good code with unsafe
> alternatives.

Looking at DO-178B, there are certain distinctions being made with
regards to unreachable code, in my understanding (and twiddling with the
definitions to avoid confusion with MISRA), unreachable code can be
roughly classified as:

1) deactivated - may or must be present in the project but must not be
executed under the given configuration
2) required in source - traceable to a system or software requirement,
can be optimised away by the compiler
3) required in executable - traceable to a system or software
requirement, must not be optimised away by the compiler
4) erroneous - unreachable due to a design or programming error

I think that your first three examples would fall under 1 and the last
under 2.

However since this categorisation is based on intentions, a tool cannot
do much beyond highlighting the unreachable code. It would then be up to
the user to decide which action to take. In a MISRA context I believe
the first two categories would lead to a deviation and the last two to a
correction to the source code.

> As in some of the other rules we talked about, a rule like 14.3
> would be of value if, instead of trying to enforce an arbitrary
> coding style, it focused on the problem it tries to prevent.

I think that to get a tool to be silent on 1 & 2 and report 3 & 4 it
would be necessary for the user to tell the tool what the intention is,
and that would have to be done through tool configuration or source code
annotation. Going that way could be a possibility, but some work on the
user side would still be required, one way or another.

>
> Martin
>
>> A different degree of severity of this symptom may be considered,
>> depending on whether the invariant arises within a function,
>> irrespective of what the rest of the system does, or within a
>> translation unit, or is dependent on the whole system analysis. Rule
>> 14.3 is based the latest approach.
>>
>> Other rules also deal with dead and unreachable code, which can be
>> present even if invariant expressions are not involved:
>>
>> Rule 2.1 - "A project shall not contain unreachable code"
>> Rule 2.2 - "There shall be no dead code"
>>
>> Within the context of Rule 14.3, dead code would correspond to the
>> unnecessary evaluation of controlling expressions (the definition of
>> dead code as "any operation that is executed but whose removal would not
>> affect program behaviour" may be somewhat open to interpretation).
>>
>> The evaluation is unnecessary for the purpose of determining which path
>> of execution the program is going to take, however if the invariant
>> controlling expression has persistent side effects, then it is not dead
>> code.
>>
>> Even with persistent side effects, the underlying issue remains, that
>> the use of an invariant expression as a controlling expression is
>> unnecessary and therefore suspicious.
>>
>> An additional rule of the type "controlling expressions shall not have
>> persistent side effects" may help separate the two concerns, but may
>> come across as being too strict.
>>
>> Complying with Rule 16.6 "Every switch statement shall have at least two
>> switch-clauses" implies that invariant controlling expression in switch
>> statements are always associated with unreachable code (or dead code if
>> all clauses are empty).
>>
>> An invariant controlling expression in a switch statement is a specific
>> instance of a more general issue of a controlling expression only
>> directing the execution towards some, but not all the case/default
>> paths.
>>
>> Infinite loops and loops with breaks in the body come across
>> respectively as necessary and sensible exceptions.
>>
>> It may be useful to restrict these two exceptions to one syntactic form
>> only:
>>
>> ~~~~~~~~>
>> while (true)
>> {
>>    ...
>> }
>> <~~~~~~~~
>>
>> So that developers would explicitly signal through this form their
>> intention to rely on the invariant as an exception to the rule, while
>> other symptoms of programming errors (e.g. similar to
>> https://www.misra.org.uk/forum/viewtopic.php?t=1502) may still be
>> reported.
>>
>> Likewise the way of enforcing a semicolon after a macro invocation could
>> be restricted, for instance by only allowing "do { ... } while (false)"
>> as the leftmost & rightmost tokens in a macro replacement list.
>>
>> With respect to C99, in C11 the _Generic construct is introduced.
>>
>> In this example:
>>
>> ~~~~>
>> extern void f1 (int x);
>> extern void f2 (char * x);
>>
>> #define F(X) _Generic ((X), int: f1, char *: f2) (X)
>>
>> int main (void)
>> {
>>   // F (42);
>>   F ("A");
>>   return 0;
>> }
>> <~~~~
>>
>> It could be argued that the macro parameter X is also kind of
>> "invariant" in the context of the program, leading to function f1 being
>> unreachable.
>>
>> The _Generic example however involves the preprocessor, and if the
>> preprocessor is also to be covered by this type of analysis, then it
>> could be argued that the same would need to apply to other directives as
>> well:
>>
>> ~~~~>
>> #if A == 0
>>   #define B 1
>>   #if A == 1     // "invariant"
>>     #define B 0  // "unreachable"
>>   #endif
>> #endif
>> <~~~~
>>
>> TS 17961 related rules:
>>
>> * None (?)
>>
>> CERT C 2016 related rules:
>>
>> * MSC06-C - "Beware of compiler optimizations"
>> * MSC12-C - "Detect and remove code that has no effect or is never
>> executed"
>>
>> Fulvio
>>
>>
>> ------------------------------------------------------------------------
>> This email has been scanned for email related threats and delivered
>> safely by Mimecast.
>> For more information please visit http://www.mimecast.com
>> ------------------------------------------------------------------------
>>
>>
>> _______________________________________________
>> C-safe-secure-studygroup mailing list
>> C-safe-secure-studygroup at lists.trustable.io
>> https://lists.trustable.io/cgi-bin/mailman/listinfo/c-safe-secure-studygroup
>>
>>
>
>
> _______________________________________________
> C-safe-secure-studygroup mailing list
> C-safe-secure-studygroup at lists.trustable.io
> https://lists.trustable.io/cgi-bin/mailman/listinfo/c-safe-secure-studygroup
>
---------------------------------------------------------------------------------------
 This email has been scanned for email related threats and delivered safely by Mimecast.
 For more information please visit http://www.mimecast.com
---------------------------------------------------------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.trustable.io/pipermail/c-safe-secure-studygroup/attachments/20180530/7a2229d3/attachment-0001.html>