[cfe-dev] Handling of loops in the Clang Static Analyzer

Tue Feb 28 08:05:08 PST 2017


On 2/28/17 9:44 AM, Venugopal Raghavan via cfe-dev wrote:
> Hi,
>
> Many thanks for all your replies. I am trying to see how best I can 
> change my checker/or the way I run my checker taking into account your 
> suggestions.
I'd add to this that ideally, the current approach to loops shouldn't 
lead to false positives, because all paths explored this way are still 
valid, assuming everything else works fine. When something else goes 
wrong as well, such as loop condition being improperly modeled (eg. we 
loop from 0 to collection.size() and don't realize that the size is the 
same in two subsequent loops), then it gets worse. But it'd get worse 
without loops for the same reason.

If you're trying to solve an all-path problem and rely on full coverage, 
then you'd have more luck with regular data flow analysis (see 
DeadStoresChecker), or if you still want to use symbolic execution for 
that, you'd need to detect lost coverage in the checkEndAnalysis 
callback (eg. if ExprEngine has work remaining) and suppress all 
warnings obtained during this analysis if the coverage was lost (i don't 
think i have examples, this trick wasn't explored much, but i suspect it 
could work).
> However, I am afraid, I have one more question. Consider the following 
> test case:
>
> test(int q, int z) {
>    int b;
>    int x;
>    int p;
>
>    if (z > 0) {        // S1
>      p = q + 1;
>      x = p + 1;
>    }
>
>    if (x <= (q + 1)) {   // S2
>      b = 0;                 // S3
>    }
>    else {
>      b = 1;
>    }
>    printf("%d\n", b);
> }
>
> If (z > 0) is true in statement S1 above, x gets the value q + 2 
> (propagating through p). In the exploded graph, I do see this being 
> the case. Hence, shouldn't the constaint solver recognize the fact 
> that the condition (x <= (q + 1)) in statement S2 is always false?
Well, technically not, because ((q + 2) <= (q + 1)) is true when q == 
INT_MAX - 1. However, i also suspect that the constraint solver may fail 
to recognize both your and my point: our constraint solver is too simple 
(for the sake of being fast) to simplify even that much (see also the 
recent work on adding Z3 to us - the reviews around 
https://reviews.llvm.org/D28952 ).
> However, in the exploded graph, I do not see this condition and the 
> assignment of 0 to b in S3 getting pruned away when I follow the path 
> in which (z > 0) is true. Can't the constraint solver handle this 
> case? Or have I goofed up in my reasoning with the example (hopefully, 
> my reasoning about the mutual exclusivity of the statements holds true 
> irrespective of the signedness etc.)
>
> Thanks.
>
> Regards,
> Venu.
>
>
> On Mon, Feb 27, 2017 at 4:48 PM, Sean Eveson <eveson.sean at gmail.com 
> <mailto:eveson.sean at gmail.com>> wrote:
>
>     Hi Venugopal,
>
>     > Sean Eveson (cc’d) did some initial work on loop widening to
>     mitigate this problem.
>
>     I started to work on this, but have unfortunately not had time to
>     take the next steps. There is a mode which does 'loop widening'
>     which is off by default as the state after the loop is
>     completely cleared which can lead to false positives. The idea is
>     to improve on the widening, by changing it to only clear state
>     that might be affected by the loop.
>
>     To enable the loop widening pass: "-analyzer-config widen-loops=true".
>
>     >Maybe -analyzer-max-loop is interesting. It tells Clang analyzer
>     how many times you want to go through loops. The bigger value the
>     better analysis, but slower analysis.
>
>     An issue you will run in to when changing analyzer-max-loop is
>     that it also affects variable-bound loops. If you set the max loop
>     higher to work around your problem you will see some slow down for
>     the concrete-bound loops, but you will see a lot more for the
>     variable-bound ones.
>
>     Example:
>     // Where the value of j is unknown.
>     for (i = 0; i < j; ++i) {
>       // ...
>     }
>     foo(i);
>
>     As far as I remember, with a max loop of 1000 and no other limits
>     on the analyzer, `foo` will be called for all the values between
>     0..~1000. This is because the analyzer will branch each time it
>     reaches the loop condition.
>
>     Regards,
>     Sean Eveson
>
>     Sean Eveson
>     SN Systems - Sony Computer Entertainment Group
>
>     On Fri, Feb 24, 2017 at 7:54 PM, Devin Coughlin
>     <dcoughlin at apple.com <mailto:dcoughlin at apple.com>> wrote:
>
>         Hi Venugopal,
>
>         > On Feb 23, 2017, at 7:07 PM, Venugopal Raghavan via cfe-dev
>         <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>         >
>         > Hi,
>         >
>         > I am re-sending the question I asked under a different
>         thread so that the subject is more relevant to the topic.
>         >
>         > I did not quite realize it earlier but it seems that the
>         static analyzer unrolls a loop up to a certain number of times
>         and then stops exploring paths beyond that. In the checker I
>         have written, I get the message "Block count exceeded" and
>         then state exploration stops. As a result, my checker give
>         false positives and does not achieve what it sets out to do.
>
>         The analyzer makes no promises about exhaustivity and in many
>         cases will drop flows and stop path exploration. If your
>         analysis depends on full path exploration to prevent false
>         positives (as compared to false negatives) then it is going to
>         be an uphill battle to eliminate false positives.
>
>         As you noted, for concrete-bound loops stopping path
>         exploration can be particularly pernicious because after
>         unrolling the loop N times the analyze simply stops. Any code
>         dominated by the loop exit will simply not be explored,
>         leading to false negatives.
>
>         Sean Eveson (cc’d) did some initial work on loop widening to
>         mitigate this problem. The basic idea there was rather than
>         simply stopping, the analyzer would “forget” any specific
>         information about a particular iteration through the loop and
>         proceed analyzing after the loop without any assumptions about
>         how many times the loop was unrolled. This would lose some
>         precision but gain coverage for code after the loop. This
>         feature not complete and is off by default, but you can see
>         the beginnings of it at <https://reviews.llvm.org/D12358
>         <https://reviews.llvm.org/D12358>>
>
>         Devin
>
>
>
>
>
> _______________________________________________ cfe-dev mailing list 
> cfe-dev at lists.llvm.org 
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev