[C-safe-secure-studygroup] what constitutes a rule violation?

Mon Jul 16 10:31:25 BST 2018

Hi Martin

It may not look like it, but I think we are moving to common ground. One
difference between this group and MISRA is that whilst MISRA is written
with awareness that its rules need to be machine checkable, it's not
explicitly the definition of an analysis tool - which is what this group
says its striving for. How you handle code for which you don't have the
source may be an area where this group can 'add value'

My personal preference is for annotations in the source code for design
decisions - like [[fallthrough]] for a switch clause without break;  -
otherwise I prefer data in a file outside the code for say the behaviour
for library functions e.g. our tool allows you to say whether a
pointer/reference parameter of a library function is: readonly (can be
inferred if its const), write only, or read-write (implies that the source
should be initialised before use). This doesn't really work for the most
critical systems, because there you need the evidence that the functions
behave as you say - but for most systems it should be acceptable

    Clive

On Thu, Jul 12, 2018 at 5:55 PM, Martin Sebor <msebor at gmail.com> wrote:

> On 07/12/2018 03:34 AM, Clive Pygott wrote:
>
>> From my point of view, I'd expect all three example to report a
>> diagnostic of 'use of uninitialised variable'
>>
>
> Thanks for taking the time to answer the question is such
> detail!
>
>
>> The first example is a no-brainer, i cannot get initialised (except by
>> some exploitation of undefined behaviour - like f writing to some
>> address derived from the return address, that happens to be the location
>> of i).  Or was the code intended to be  f(&i); ?  In which case the same
>> argument as the next two examples apply.
>>
>
> Sorry, the code should have read "f(&i)".  It's meant to represent
> the C idiom of initializing local variables, such as by strtol():
>
>   char *end;
>   long x = strtol (s, &e, 10);
>   if (*end)
>     return -1;
>   return x;
>
>
>> In the second example, if the address of i were assigned to  int *p  (no
>> extern), then it certainly deserves the diagnostic, as i is still
>> uninitialised.
>>
>
> If the function is defined like so:
>
>   int f (int *x)
>   {
>      if (x)
>          *x = 0;
>      *p = 0;
>      return 0;
>   }
>
> then passing the address of the i local to main has the effect
> of initializing it and a diagnostic would be considered a false
> positive.
>
> Does MISRA really expect a diagnostic here if the analyzer
> doesn't know what the definition of f() looks like?  Are
> engineers coding to MISRA really expected and willing to accept
> such diagnostics and put effort into going through the deviation
> process for each instance of what is a pervasive idiom?
>
> I don't think TS 17961 does require a diagnostic here but I'm
> not sure that it disallows one either.  It simply doesn't say
> (because it applies only to analyses with access to the source
> of whole programs) which I think is a serious oversight.
>
> In my world, no compiler issues a diagnostic for such code even
> though they easily could.  They don't because a call to any
> function whose definition isn't known is assumed to overwrite
> the value of all variables that can be overwritten (including
> locals whose address has escaped).  Doing otherwise -- issuing
> a warning -- would lead to countless false positives and would
> simply not be accepted by engineers.  Diagnostics like that
> would be turned off or ignored.
>
> With the extern,  if  p  is in the same library as f
>> (source not visible) and you know from the documentation  that  f
>> always assigns a value to the object pointed to by  p, then you in
>> theory still need the diagnostic but you have an excellent
>> justification for a deviation - in MISRA terminology.
>>
>
> I find it hard to accept that the MISRA process requires
> the heavy and expensive overhead of the deviation process for
> every instance of what could very well be a false positive, and
> what in case of the idiomatic f(&i) most likely is one.  (Though
> it might help explain the crazy prices of European cars ;)
>
> For me, requiring a diagnostic in this case is out of the question.
> I would, however, find it perfectly reasonable to issue one if f()
> were declared to take a const int* instead, and if p were also
> declared as const int*.  Declaring them const doesn't prevent
> the function from casting the constness away, but doing so would
> be a pretty blatant violation of const correctness.
>
> I say in theory,
>> as I can imagine a way a tool could be told about properties of f by
>> means of metadata, manually generated to describe the claimed behaviour
>> of the library routines, so it might know that  *p  always gets assigned
>> on a call of f. If the source of the translation unit containing p and
>> f  is available, I'd say it was a quality of implementation issue
>> whether the diagnostic is raised or not.
>>
>
> A function that doesn't change its arguments passed to it by
> reference should declare those arguments const (as in const
> int*).  I believe a tool should assume that otherwise,
> the function writes into the argument and should not issue
> a diagnostic (and the standard should not only not require
> one -- it could allow one as long as the analyzer made it
> clear that it's one of those "uncertain" kinds of diagnostics).
>
> On the subject of metadata: For functions that both read and
> write such arguments (i.e., for in-out arguments) I think we
> should recommend/specify an annotation such as an attribute
> to let users avoid false negatives.  For instance:
>
>   // function both reads and writes *x so passing
>   // addresses of uninitialized locals to it should
>   // be diagnosed
>   int f ([[read_write]] int *x)
>   {
>     return ++*x;
>   }
>
>
>> Similarly, for the third example, its not undefined behaviour if  f
>> always returns a value in the range -1..2 - so like in the second
>> example, I'd expect the violation to be raised unless there's metadata
>> to describe the behaviour of f or the code for f is visible - with the
>> possibility of a justification for a deviation.
>>
>
> Requiring a diagnostic here is also unacceptable in my work.
> Functions commonly return a subset of values of their type.
> For instance, most of the C stdio functions return either EOF
> or a positive value.  Diagnosing every use that doesn't handle
> values they cannot return (i.e., negative values other than
> EOF) would be pointless and drown the user in false positives,
> as would diagnosing user-defined functions that have the same
> property.
>
> Here again I think we could help by specifying an annotation
> that makes known to the analyzer the subset of possible values
> returned by a function.  E.g.,
>
>   [[range (EOF), range (0, INT_MAX)]] int f (int*);
>
> The union of the ranges specified by the attributes denotes
> the set of values the function can return.  With that
> annotation, it might be be reasonable to issue a diagnostic
> if a caller didn't handle all possible values (another
> annotation might be needed to correlate the return value
> to the value of an argument of the function to avoid other
> false positives.)
>
> In any event, the standard needs clearly specify what conforming
> analyzers are supposed to do in these basic cases.  Not having
> access to the source of the entire program is the common case.
>
> I think we can only make progress if we settle these basics.
>
> Martin
>
>
>>     Clive
>>
>> On Wed, Jul 11, 2018 at 8:22 PM, Martin Sebor <msebor at gmail.com
>> <mailto:msebor at gmail.com>> wrote:
>>
>>     I get the impression that we're spending large amounts of time
>>     with little progress by rehashing the same issue because we're
>>     not on the same page about a fundamental question: what
>>     constitutes a rule violation.
>>
>>     Take the following examples.  Assume what you see is the complete
>>     program that's visible to the analyzer; what's not defined in it
>>     is in some library that the analyzer doesn't see or have knowledge
>>     of -- like in the vast majority of software.
>>
>>       extern int f (int*);
>>
>>       int main (void)
>>       {
>>         int i;   // uninitialized
>>         f (&);
>>         return i;
>>       }
>>
>>     Is it anyone's expectation that the read of i in main violate
>>     MISRA Rule 9.1 or TS 17961 rule 5.35 (reading uninitialized
>>     memory), and so requires a diagnostic?
>>
>>     What about if I change main to
>>
>>       int main (void)
>>       {
>>         int i;
>>         extern int *p = &i;
>>         f (0);
>>         return i;
>>       }
>>
>>     Does this require a diagnostic?  Should it?
>>
>>     Or:
>>
>>       int main (void)
>>       {
>>         int i;
>>         switch (f (0)) {
>>         case 0:
>>         case 1:
>>         case 2: i = 0; break;
>>         case -1: i = 1; break;
>>         }
>>         return i;
>>       }
>>
>>     Does this require a diagnostic?
>>
>>     What do the analyzers that implement this rule do?  Does
>>     it change from rule to rule?  Should it be allowed to?
>>
>>     I don't believe we can make forward progress unless we are
>>     clear on what the answers are or what we want them to be.
>>     I don't think the spec will have value to users unless it
>>     makes it clear for each rule what constitutes a violation
>>     and whether it's considered a certain violation or just
>>     a possible one.
>>
>>     Martin
>>
>>     _______________________________________________
>>     C-safe-secure-studygroup mailing list
>>     C-safe-secure-studygroup at lists.trustable.io
>>     <mailto:C-safe-secure-studygroup at lists.trustable.io>
>>     https://lists.trustable.io/cgi-bin/mailman/listinfo/c-safe-
>> secure-studygroup
>>     <https://lists.trustable.io/cgi-bin/mailman/listinfo/c-safe-
>> secure-studygroup>
>>
>>
>>
>>
>> _______________________________________________
>> C-safe-secure-studygroup mailing list
>> C-safe-secure-studygroup at lists.trustable.io
>> https://lists.trustable.io/cgi-bin/mailman/listinfo/c-safe-
>> secure-studygroup
>>
>>
>
> _______________________________________________
> C-safe-secure-studygroup mailing list
> C-safe-secure-studygroup at lists.trustable.io
> https://lists.trustable.io/cgi-bin/mailman/listinfo/c-safe-
> secure-studygroup
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.trustable.io/pipermail/c-safe-secure-studygroup/attachments/20180716/a1b43482/attachment.html>