[C-safe-secure-studygroup] reframing rule 11.6 to focus on undefined behavior

Martin Sebor msebor at gmail.com
Sat Oct 6 18:26:24 BST 2018


On this week's call we again discussed the proposal to adopt
language into the TS to let analyzer users request to have
only provable but not merely suspected instances of undefined
behavior diagnosed.  (This was posted on July 10, 2018 in:
analyzer requirements for false positives).  I agreed to post
as an example rule 11.6 recast with this in mind.

First, here's the meat of the existing Rule 11.6:

   11.6 - A cast shall not be performed between pointer to void
           and an arithmetic type

   Rationale
   Conversion of an integer into a pointer to void may result
   in a pointer that is not correctly aligned, resulting in
   undefined behaviour.

   Conversion of a pointer to void into an integer may produce
   a value that cannot be represented in the chosen integer type
   resulting in undefined behaviour.

   Conversion between any non-integer arithmetic type and pointer
   to void is undefined.

Note the use of the word /may/ in the first two paragraphs: /may
result/ and /may produce/.  I.e., violating the rule does not
necessarily result in undefined behavior.

The obvious problem with 11.6, as with many other similarly
formulated rules, is that it disregards all the valid and
safe use cases in favor of simplicity of specification.  That
inevitably leads to false positives (if we use the common-sense
definition of the term).

For example:

   1) Conversions to intptr_t that don't convert the result back
      to a pointer type.  We discussed a number of use cases for
      this in widespread practice, such as pointer hashing (see
      the sketch after this list).

   2) Conversions to intptr_t and back that preserve the pointer
      value of the converted value.  For example:

      extern int *p;
      uintptr_t i = (uintptr_t)p;   // safe by definition
      p = (int*)i;                  // ditto

   3) Conversions to _Bool, which are well-defined (also shown
      in the sketch below).
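
As a concrete illustration of 1) and 3), here's a minimal sketch.
(The function names and the hash constant are mine, purely for
illustration; the point is only that neither function ever
converts the integer back to a pointer.)

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hash a pointer by converting it to uintptr_t.  The conversion
      is implementation-defined and the result is never converted
      back to a pointer, so no pointer value needs to survive.  */
   size_t hash_ptr (const void *p)
   {
     uintptr_t ip = (uintptr_t)p;
     return (size_t)(ip * 2654435761u);   /* multiplicative hash */
   }

   /* Conversion of a pointer to _Bool is well-defined: the result
      is false for a null pointer and true otherwise.  */
   bool is_set (const void *p)
   {
     return p;   /* implicit conversion to _Bool */
   }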

Rather than banishing the safe/well-defined uses above, a more
effective rule would focus on detecting the problems that either
provably result in undefined behavior, or that cannot be proven
not to result in it.  Distinguishing between these two categories
is the intent of the proposal.

Besides the above, I note that by using the term "cast" the rule
excludes implicit conversions from the set of diagnosable cases.
For example:

   extern void *p;
   int i = p;         // compliant (no cast here)
   ++i;
   p = i;             // undefined yet compliant
   i = *(int*)p;      // ditto

This is almost certainly unintended and simply the result of
sloppy wording, but it needs to be corrected if the rule is
to impose meaningful binding requirements on analyzers.

With that as the background, I would reformulate 11.6 like so:

   A pointer to void shall not be converted to an arithmetic type
   or vice versa if the conversion is undefined, or if using the
   result is undefined.

(The term /converted/ has a precise meaning in C and includes
both implicit and explicit conversions.)

Examples (each function should be viewed as the whole program
visible to the analyzer):

   void f0 (void *p)
   {
     /* Compliant - implementation-defined */
     p = (void *) 0x1234u;
   }

   void f1 (void *p)
   {
     /* Non-compliant - undefined */
     p = (void *) 1024.0f;
   }

   void f2 (void *p, uint32_t u)
   {
     /* Compliant - implementation-defined.
        Non-compliant only if uint32_t cannot represent all values
        of p, i.e., in practice, only on LP64 systems but not on
        common ILP32 hardware.   */
     u = (uint32_t) p;
   }

   void f3 (void *p, uint32_t u)
   {
     p = &u;

     /* Compliant - well-defined */
     uintptr_t ip = (uintptr_t) p;

     /* Compliant - well-defined */
     p = (void*) ip;

     /* Compliant - well-defined */
     char *q = (void*) ip;
     char c = *q;

     /* Compliant - implementation-defined.
        Non-compliant only on systems where converting the address
        of a uintptr_t object to double* is undefined (popular
        hardware makes such conversions well-defined).
        (It would be non-compliant if *r were actually accessed.)  */
     double *r = (void *)ip;
   }

   void f4 (void *p, uint32_t u)
   {
     uintptr_t ip = (uintptr_t)p;

     /* The conversion alone is fine but the subsequent undefined
        use of *r makes it non-compliant.  */
     double *r = (void *)ip;
     double x = *r;
   }

   void f5 (void *p, int *q)
   {
     /* Compliant.  */
     uintptr_t ip = (uintptr_t)p;
     *q = (void*)ip;
   }

   int f6 (intptr_t ip)
   {
     /* Undecidable: may or may not be undefined depending on
        the value of ip and the target architecture.  */
     int *p = (void*)ip;

     /* Also undecidable: may be undefined on strictly aligned
        architectures like SPARC but well-defined on common
        targets like POWER or x86.  */
     return *p;
   }

Example f6 is interesting for two reasons:

1) Because whether or not it's safe depends on the target system.
I believe a quality tool should let users configure it with
the parameters of the target system(s) and issue diagnostics only
for instances where the violation truly is undefined (as opposed
to implementation-defined).  One way to encode such target
parameters in the code itself is sketched after these two points.

2) When targeting a strictly aligned system like SPARC, where the
tool cannot tell whether the integer is suitably aligned, my proposal
is to require analyzers to indicate in the text of the diagnostic
that the result may or may not be undefined.  I would expect
quality analyzers to make it possible to control on a per-rule
basis whether to diagnose all cases or just those that provably
lead to undefined behavior.
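
To make both points concrete, here is a minimal, hypothetical
sketch.  The static assertion and the function f6_checked are my
own illustrations, not part of the proposal: the assertion encodes
a target parameter from 1) where both the compiler and an analyzer
can check it, and the alignment test turns the suspected
misalignment in f6 from 2) into something provably absent.

   #include <stdint.h>

   /* A target assumption stated in the code: compilation fails on
      targets where pointers are wider than uint32_t, so wherever
      this builds, the conversion in f2 above cannot drop bits on
      common flat-address hardware.  */
   _Static_assert (sizeof (void *) <= sizeof (uint32_t),
                   "uint32_t can represent every pointer value");

   int f6_checked (intptr_t ip)
   {
     /* The explicit test addresses the alignment concern of f6:
        the dereference below is reached only when ip is suitably
        aligned for int, even on strictly aligned targets.  */
     if ((uintptr_t)ip % _Alignof (int) != 0)
       return 0;

     int *p = (void *)ip;
     return *p;
   }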

Martin


