[cfe-dev] Memory regions as conservative alias analysis

Tue May 29 16:24:23 PDT 2012

Hi Guillem,

Comments inline!

On May 25, 2012, at 6:50 AM, Guillem Marpons <gmarpons at babel.ls.fi.upm.es> wrote:

> Hi,
> 
> I'm evaluating the possibility of using the memory regions computed by
> the static analyzer as a *conservative* alias analysis (i.e.: I want
> to be able to guarantee that two given pointers are never going to
> alias, in any path). I'm aware that the analyzer is mainly oriented
> towards bug finding, and that false negatives are acceptable in that
> particular application. On the contrary, I think that a conservative
> approach is necessary in many cases for the application I'm thinking
> of (coding rule validation). If the analysis is precise enough, I
> think it can be also useful in this other way. I'd like to know if you
> think this is not feasible for some reason.
> 
> In order to define my checks, I need to know what guarantees the
> analyzer can give me, e.g:
> - If two pointers point to regions contained in different memory
> spaces, they are not aliased. (Memory spaces are always disjoint,
> right?)
> - If two pointers point to different stack frames, they are not aliased.
> - etc.
> Are those guesses correct? What other certainties can I have?

The second assumption is correct.  The first one is not.

If the memory space is "UnknownSpaceRegion", it means that we do not know the actual memory space.  For example, suppose you had code that looked like:

  void *p;
  void *q = foo(p);

If we do not know anything about the implementation of 'foo', then we return a memory region wrapping a conjured symbol.  That memory region has an unknown memory space.  The name pretty much implies the ramifications.  It is entirely possible that 'p' and 'q' are aliases after these two lines of code execute.

> 
> I've been trying to write some toy checkers, looking at the examples
> that come with clang, but the number of concepts/classes involved is
> quite intimidating. I'd really appreciate some document explaining the
> general design of the analyzer and the different pieces that can be
> replaced (StoreManager, ConstraintManager, etc.). I've already read
> the paper "A Memory Model for Static Analysis of C Programs", by Xu,
> Kremenek, and Zhang, and
> http://clang.llvm.org/docs/AnalyzerRegions.html, but I find difficult
> to jump from those documents to the API documentation.

The details are unfortunately complex, and real documentation needs to be written.  I've posted  details numerous times in the past to this list, but I don't have good references off hand.  The basic concepts are:

(1) Symbols, managed by SymbolManger.  The represent how we name symbolic things, be it symbolic values, symbolic chunks of memory, etc.
(2) Memory regions, which represent a lazy abstraction of memory.  Symbolic regions are used to represent abstract chunks of memory.
(3) Bindings of values to regions, which are represented by the symbolic store and are managed by StoreManager.  It's basically up to the StoreManager to decide how aliasing relationships come into play when modeling loads/stores from/to symbolic memory.
(4) Constraints on symbols, managed by Constraint Manager.  Currently this is limited to constraints between symbols and integer constants, but this will eventually also track symbol to symbol relationships as well.

> 
> As a side note: I wrote some example checkers to determine the memory
> space of some variables, only to receive an "<Unknown Region>" for
> almost every region in my code. The cause was not that the analyzer
> was incapable of assigning meaningful memory spaces, but that most of
> 'MemSpaceRegion' subclasses don't re-implement the 'dumpToSream'
> method. It'd be a trivial patch to implement those methods but that
> would greatly improve debugging (I can contribute the patch, if you
> want).

That would be great!

> On the other hand, I never receive regions in 'HeapSpaceRegion'
> for neither 'new' nor 'malloc'. Is the implementation for that
> missing?

HeapSpaceRegion was added during prototyping, and was never used.  We should probably remove it, and replace it with something more expressive, such as a symbolic memory space.  That way we can distinguish between different parts of the heap, e.g., memory returned by distinct allocators.  That may help with providing stronger aliasing guarantees.