[cfe-dev] Static Analyzer: pointer alias assignment

Mon Jul 29 05:54:43 PDT 2013

> *It can be done in some cases.  Take a look at DereferenceChecker.cpp
when it generates an error report.  There it walks up the ExplodedGraph to
determine what variable/array/field that a null pointer value got loaded
from.  This is used by the diagnostic machinery.

*
I guess you mean bugreporter::trackNullOrUndefValue. I've been looking at
its source code, but I don't yet even understand what it's meant to do, let
alone how it works. I'll work on this though, thanks!

> *I think the terminology is confusing. SVals don’t alias each other.
 That concept doesn’t even apply here.

*
You're correct, of course. I still get confused around what is an SVal and
what is a MemRegion - even after e.g. Jordan's explanation.
For example, when I have a Foo* fp, and 'fp' appears somewhere (e.g. in a
null-checking condition), then:
 - I get an SVal that says '&fp'
 - I can .getAsRegion() to get a MemRegion that dumps to 'fp'
 - I can then ProgramState::getSVal() on this MemRegion to get e.g.
'&SymRegion{conj_$3{struct Foo *}}'

As far as I can tell, this last in the 'value' of the fp variable, as in,
it is what was last bound to it; then '&fp' is the memory region in which
the value of the variable is stored, and finally 'fp' is pointer value to
the previous region. Is this roughly correct?

> *Currently the analyzer doesn’t reason about the condition accurately
because ‘p’ and ‘q’ are assumed to essentially not alias because they will
refer to two different symbolic pieces of memory.  To solve this problem we
would need the ability to “unify” two memory regions along a path.  That’s
a complicated problem, and nobody has gotten to implementing it yet.*

Out of curiosity, conceptually, what makes this complicated? I realize you
must hate this question, and it will probably be evident once I start
studying how the Static Analyzer is implemented, but I just couldn't
refrain from asking. You can ignore me here. :)

> *This is essentially an “all paths” problem, and we have a few checkers,
such as the IdempotentOperations checker, which try and address this kind
of problem.*
*Roughly speaking, the checker has to be implemented in two stages: [...]

*
Thank you very much, I implemented your approach, and it worked! It was
quite simple - following your idea -, and I'm actually quite embarrassed
that I had to ask for help on this. I was thinking along the lines of
traversing the ExplodedGraph in checkEndAnalysis, which I suppose would
have worked as well, except it would have been way more complicated.

Once again, thank you very much for your help!

Gabor

2013/7/24 Ted Kremenek <kremenek at apple.com>

> On Jul 24, 2013, at 7:36 AM, Gábor Kozár <kozargabor at gmail.com> wrote:
>
> Yes, I realize the Static Analyzer will recognize that the line p = q
> means that 'p' and 'q' now has the same value, but it does not copy the
> user state (REGISTER_*_WITH_PROGRAMSTATE data) - which makes total sense,
> since it does not know how that data is structured. Therefore, I need to
> record the fact the aliasing has happened using checkBind. This is where I
> encountered the problem described in my original e-mail.
>
> Reading through the checker dev manual<http://clang-analyzer.llvm.org/checker_dev_manual.html>,
> I found the relevant section:
> *"When x is evaluated, we first construct an SVal that represents the
> lvalue of x, in this case it is an SVal that references the MemRegion for
> x. Afterwards, when we do the lvalue-to-rvalue conversion, we get a new
> SVal, which references the value currently bound to x. That value is
> symbolic; it's whatever x was bound to at the start of the function."
>
> *
> I need the reverse: not the value currently bound to x, but the SVal that
> references the MemRegion for x itself. I searched the documentation, but
> all I could find was ProgramState::getLValue, none of whose overloads give
> me what I need. So how could I achieve this?
>
>
> It can be done in some cases.  Take a look at DereferenceChecker.cpp when
> it generates an error report.  There it walks up the ExplodedGraph to
> determine what variable/array/field that a null pointer value got loaded
> from.  This is used by the diagnostic machinery.
>
>
> Back to the aliasing problem: I think it would be very useful if the
> Static Analyzer exposes functions to retrieve aliasing information. For
> example, I would like to ask whether a given SVal is an alias of another.
>
>
> I think the terminology is confusing. SVals don’t alias each other.  That
> concept doesn’t even apply here.
>
> For example, suppose I had the expression:
>
>    x + y
>
> and x and y were each bound to the value 1.  When the analyzer evaluates
> “x” and “y” separately we get an SVal of 1 for each subexpressions.  Those
> SVals are the same value, but they don’t alias each other.  Aliasing
> doesn’t even make sense here.
>
> In the end, SVals are just values.  They can wrap different kinds of
> values, be it references to symbolic pieces of memory, actual constants
> such as 1 and 2, addresses of goto labels, and so on.
>
> What you want to know is if two pointer variables, say “p” and “q”, point
> to the same piece of memory.  In the static analyzer they will be bound to
> a value which represents their respective pointer values.  If “p” and “q”
> refer to MemRegions that are completely disjoint (say two separate
> VarRegions) then they cannot alias each other.  If they both refer to two
> SymbolicRegions than they *may* alias each other.
>
> Right now most of the analyzer assumes that two SymbolicRegions always
> refer to separate chunks of memory.  That’s an optimistic assumption (that
> a compiler could never make for optimization), but it works well in
> practice.  There are opportunities to improve the analyzer here.  Suppose
> we saw something like:
>
>   int *p = foo();
>   int *q = bar();
>
>   if (p == q) { … }
>
> Currently the analyzer doesn’t reason about the condition accurately
> because ‘p’ and ‘q’ are assumed to essentially not alias because they will
> refer to two different symbolic pieces of memory.  To solve this problem we
> would need the ability to “unify” two memory regions along a path.  That’s
> a complicated problem, and nobody has gotten to implementing it yet.
>
> But I think you aren’t concerned about this problem.  I think you are more
> thinking of the following scenario:
>
>   int *p = foo();
>   ...
>   int *q = p;
>   …
>
> In this case, ‘p’ and ‘q’ alias.  Given two variables, it’s easy to
> determine if they currently alias if the resolve to the same MemRegion.
>  You can even chop off region offsets if you want to make things more
> course grained.
>
> But my point is that for your checker this isn’t relevant.  The analyzer
> already reasons about all of this for you.  Your checker doesn’t really
> care about ‘p’ or ‘q’, but rather what it points to.  For example, suppose
> you had:
>
>   int *p = foo();
>   …
>   int *r = p;
>   …
>   int *q = r;
>   …
>   *r = 1;
>   …
>   if (q) { … }
>
> In this case, we have ‘p’, ‘q’, and ‘r’ all aliasing each other.  The
> analyzer engine tracks all of this for you without you doing anything.
>  What’s important here is that the pointer value that is loaded from ‘q’ at
> the if statement is perfectly constrained to be non-null since it was
> already dereferenced earlier.  That’s all that matters.  The aliasing is
> irrelevant.
>
> My original thought was that this alias-analysis could be implemented as a
> checker itself, which would dispatch an event when it detects an aliasing,
> but I do not see how it could expose methods to the other checkers (e.g.
> for them to be able to ask "is x an alias of y in this ProgramState?").
> Does this make sense?
>
>
> I really don’t see how aliasing is relevant.
>
>
> > *A null check might not always be redundant.  On some paths the null
> check may be redundant and others it won’t be.  Thus there is a dominance
> relationship here that needs to be checked.  Essentially, all paths need to
> show that the pointer is always non-null before the pointer is checked.
>  Otherwise you’ll get false positives.  Doing this correctly is hard,
> because not all paths are guaranteed to be traced.  You’ll need to handle
> that too.
>
> *
> Yes, this is a problem, but I have absolutely no clue as to how it could
> be solved. I would naturally want to keep the number of false-positive at
> minimum, even at the cost of some bugs going undetected.
>
>
> It’s a real problem, and unless you have a solution it probably makes the
> checker unusable.
>
> This is essentially an “all paths” problem, and we have a few checkers,
> such as the IdempotentOperations checker, which try and address this kind
> of problem.
>
> Roughly speaking, the checker has to be implemented in two stages:
>
> (1) Keep on the side a map from “condition checks” to a tri-state: {
> always null, may-be-null, no value }.  The “no value” is the default, and
> it essentially means you have no data for a given condition.
>
> For example, suppose you had:
>
>   if ( pointer value )
>   …
>   if ( pointer value )
>
> You would have a side-map mapping from each of these IfStmts to the
> tri-state.
>
> (2) Monitor checks of the pointer value using the analyzer visitor
> interface.  If the pointer value is null and the current map value not
> “may-be-null”, mark it “null”.  Otherwise, mark it “may-be-null”, which
> essentially means that the given condition statement is no longer in
> contention for the warning.
>
> (3) After the analyzer finishes exploring paths, go over this map and find
> the entries that are marked “always null”.  Those are the places to emit a
> warning.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130729/6e21d79c/attachment.html>