[cfe-dev] Static Analyzer: pointer alias assignment

Wed Jul 24 10:57:58 PDT 2013

On Jul 24, 2013, at 7:36 AM, Gábor Kozár <kozargabor at gmail.com> wrote:

> Yes, I realize the Static Analyzer will recognize that the line p = q means that 'p' and 'q' now has the same value, but it does not copy the user state (REGISTER_*_WITH_PROGRAMSTATE data) - which makes total sense, since it does not know how that data is structured. Therefore, I need to record the fact the aliasing has happened using checkBind. This is where I encountered the problem described in my original e-mail.
> 
> Reading through the checker dev manual, I found the relevant section:
> "When x is evaluated, we first construct an SVal that represents the lvalue of x, in this case it is an SVal that references the MemRegion for x. Afterwards, when we do the lvalue-to-rvalue conversion, we get a new SVal, which references the value currently bound to x. That value is symbolic; it's whatever x was bound to at the start of the function."
> 
> I need the reverse: not the value currently bound to x, but the SVal that references the MemRegion for x itself. I searched the documentation, but all I could find was ProgramState::getLValue, none of whose overloads give me what I need. So how could I achieve this?

It can be done in some cases.  Take a look at DereferenceChecker.cpp when it generates an error report.  There it walks up the ExplodedGraph to determine what variable/array/field that a null pointer value got loaded from.  This is used by the diagnostic machinery.

> 
> Back to the aliasing problem: I think it would be very useful if the Static Analyzer exposes functions to retrieve aliasing information. For example, I would like to ask whether a given SVal is an alias of another.

I think the terminology is confusing. SVals don’t alias each other.  That concept doesn’t even apply here.

For example, suppose I had the expression:

   x + y

and x and y were each bound to the value 1.  When the analyzer evaluates “x” and “y” separately we get an SVal of 1 for each subexpressions.  Those SVals are the same value, but they don’t alias each other.  Aliasing doesn’t even make sense here.

In the end, SVals are just values.  They can wrap different kinds of values, be it references to symbolic pieces of memory, actual constants such as 1 and 2, addresses of goto labels, and so on.

What you want to know is if two pointer variables, say “p” and “q”, point to the same piece of memory.  In the static analyzer they will be bound to a value which represents their respective pointer values.  If “p” and “q” refer to MemRegions that are completely disjoint (say two separate VarRegions) then they cannot alias each other.  If they both refer to two SymbolicRegions than they *may* alias each other.

Right now most of the analyzer assumes that two SymbolicRegions always refer to separate chunks of memory.  That’s an optimistic assumption (that a compiler could never make for optimization), but it works well in practice.  There are opportunities to improve the analyzer here.  Suppose we saw something like:

  int *p = foo();
  int *q = bar();

  if (p == q) { … }

Currently the analyzer doesn’t reason about the condition accurately because ‘p’ and ‘q’ are assumed to essentially not alias because they will refer to two different symbolic pieces of memory.  To solve this problem we would need the ability to “unify” two memory regions along a path.  That’s a complicated problem, and nobody has gotten to implementing it yet.

But I think you aren’t concerned about this problem.  I think you are more thinking of the following scenario:

  int *p = foo();
  ...
  int *q = p;
  …

In this case, ‘p’ and ‘q’ alias.  Given two variables, it’s easy to determine if they currently alias if the resolve to the same MemRegion.  You can even chop off region offsets if you want to make things more course grained.

But my point is that for your checker this isn’t relevant.  The analyzer already reasons about all of this for you.  Your checker doesn’t really care about ‘p’ or ‘q’, but rather what it points to.  For example, suppose you had:

  int *p = foo();
  …
  int *r = p;
  …
  int *q = r;
  …
  *r = 1;
  …
  if (q) { … }

In this case, we have ‘p’, ‘q’, and ‘r’ all aliasing each other.  The analyzer engine tracks all of this for you without you doing anything.  What’s important here is that the pointer value that is loaded from ‘q’ at the if statement is perfectly constrained to be non-null since it was already dereferenced earlier.  That’s all that matters.  The aliasing is irrelevant.

> My original thought was that this alias-analysis could be implemented as a checker itself, which would dispatch an event when it detects an aliasing, but I do not see how it could expose methods to the other checkers (e.g. for them to be able to ask "is x an alias of y in this ProgramState?"). Does this make sense?

I really don’t see how aliasing is relevant.

> 
> > A null check might not always be redundant.  On some paths the null check may be redundant and others it won’t be.  Thus there is a dominance relationship here that needs to be checked.  Essentially, all paths need to show that the pointer is always non-null before the pointer is checked.  Otherwise you’ll get false positives.  Doing this correctly is hard, because not all paths are guaranteed to be traced.  You’ll need to handle that too.
> 
> Yes, this is a problem, but I have absolutely no clue as to how it could be solved. I would naturally want to keep the number of false-positive at minimum, even at the cost of some bugs going undetected.

It’s a real problem, and unless you have a solution it probably makes the checker unusable.

This is essentially an “all paths” problem, and we have a few checkers, such as the IdempotentOperations checker, which try and address this kind of problem.

Roughly speaking, the checker has to be implemented in two stages:

(1) Keep on the side a map from “condition checks” to a tri-state: { always null, may-be-null, no value }.  The “no value” is the default, and it essentially means you have no data for a given condition.

For example, suppose you had:

  if ( pointer value )
  …
  if ( pointer value )

You would have a side-map mapping from each of these IfStmts to the tri-state.

(2) Monitor checks of the pointer value using the analyzer visitor interface.  If the pointer value is null and the current map value not “may-be-null”, mark it “null”.  Otherwise, mark it “may-be-null”, which essentially means that the given condition statement is no longer in contention for the warning.

(3) After the analyzer finishes exploring paths, go over this map and find the entries that are marked “always null”.  Those are the places to emit a warning.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130724/9efe8265/attachment.html>