[cfe-dev] [analyzer] Summary IPA thoughts

Thu Mar 31 07:52:25 PDT 2016

On 31.03.2016 15:54, Gábor Horváth wrote:
> I think that timestamps might not be a general solution to this 
> problem, see the next inline response. But as a side not, do you plan 
> to use timestamps to use this problem:
>
>   void bar(int *b) {
>     foo(b); // [1] calling foo()
>   }
>   void foo(int *a) {
>     *a = 0;
>     if (a == 0) {
>       // [2] warn: null-dereference
>     }
>   }
Oh, the check-after-use for delayed positives. We did not fix this yet 
because we didn't agree on how much this is really unwanted. In fact, 
for most kinds of bugs we'd like to have some kind of check-after-use 
checks, but we should also be able to turn it off. The solution we're 
having in mind is to consider the program state in the 
delayed-check-node to see if the bug gets reproduced inside it (we 
remember the whole node anyway for bug reporter purposes), though we 
didn't think really deeply about it yet, so it may fail to work. The 
current code produces such positives.
>
>     On the other hand, i don't think that there are other places that
>     explicitly rely on assuming different symbols to represent
>     *different* values. It's only true for the store, which expects
>     different base-regions to non-overlap, even when the regions are
>     symbolic. Say, the constraint manager, upon encountering "if (a ==
>     b)", should split the state (we did enable SymSymExpr for this to
>     work), rather than produce always-false.
>
>
> How did you achieve this? For example:
>
>    int a;
>    void foo(int *x, int *y) {
>       int a;
>       int *b = &a;
> if (x == y)  { } // I expect this to split the state.
>       if (b == &a) {} // I expect this not to split the state
>    }
>    void bar() {
>      foo(&a, &a);
>    }
>
> Does this work as expected?
Yeah, i guess it works pretty much as expected. In the out-of-context 
analysis, we have two branches for foo(), depending on range constraints 
for symbol 'reg_$0<x> == reg_$0<y>' - in one branch it's [0, 0], in the 
other branch it's [1, something like BOOL_MAX]. When calling foo(&a, &a) 
from bar(), reg_$0<x> gets renamed to &a (as integer), reg_$0<y> gets 
renamed to &a (as integer) (i don't remember if we still keep the 
SymbolRegionAddress thing around, because i wanted to replace it with 
the correct nonloc::LocAsInteger eventually, as it seems more correct; 
it is more difficult to work with, but imho it is "the" correct 
actualization), and finally the whole symbol gets renamed to '&a (as 
integer) != &a (as integer)' and collapses to 0 during evalBinOp, but 
because range [0, 0] of 0 does not intersect with [1, something like 
BOOL_MAX], this branch is discarded as unreachable. So when applying 
summary in-context we have only one branch.
>
>     And then actualization can take care of the rest. Say the checker,
>     upon encountering, say, "fclose(f1); fclose(f2);" and then later,
>     when applying the summary, finding out that f1 and f2 is the same
>     descriptor, would detect the problem upon actualization of the
>     respective symbols to the same symbol. Or the store would
>     correctly merge bindings to different sub-regions of aliased
>     regions even if they were not aliased outside the context.
>
>
> I think I would prefer a way to do this automatically instead of 
> making checker writers consider aliasing explicitly. But I may 
> misunderstood your point.
Yeah, me too. But as long as the GDM is opaque for the analyzer core, we 
cannot avoid relying on checkers for managing their info in the state. 
Hence the recent buzzword in our discussions - the "smart data map", 
which i'm dreaming of, but could not work on it yet.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160331/0e830379/attachment.html>