[cfe-dev] [analyzer] Random SVal hierarchy questions.

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Tue Nov 10 00:16:30 PST 2015


Hi Devin!

This is very useful information. For example knowing when it is safe to
rely on aliasing information provided by the analyzer can be useful for the
checker writers as well. Do you plan to extend the analyzer documentation
in the repositry?

Thanks,
Gabor

On 10 November 2015 at 02:46, Devin Coughlin via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi Artem,
>
> Sorry it took me so long to get back to you.
>
> > On Nov 2, 2015, at 7:06 AM, Artem Dergachev via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> >
> > I've got a few quick random questions about SVal/MemRegion/SymExpr
> heirarchy in the Static Analyzer. In most cases, i'm looking for design
> ideas behind particular values, so it's sort of theoretical.
> >
> >
> > == SymbolConjured of a record type ==
> >
> > Usually symbols have integral types. However, if a function returns,
> say, "struct S" by value, and its definition is not available, then its
> return value is described as conj_$N{S}. I'm surprised to see a symbol of
> such type, because it doesn't behave like other symbols - eg. you cannot
> compose a SymIntExpr from it, or assume range constraints on it. Is this
> behavior intentional, or it's rather worth it try replacing it with a more
> structure-like value? For example, a LazyCompoundValue with a NULL Store
> could describe an unknown structure pretty well. Is it worth it to *assert*
> that all symbols have numeric or pointer types only?
>
> One thing conjured symbols of record type let you do is reason about
> equality of MemRegions and of values in the fields of those regions
> independently of range constraints. So, for example, conjuring a symbol for
> the result of getS() below let’s the analyzer establish that x1 == x2:
>
> struct S {
>   int f;
> };
>
> struct S getS();
>
> void foo() {
>   struct S s = getS();
>
>   int x1 = s.f;
>   int x2 = s.f;
>
>   // Store Bindings at this program point:
>   //   (s,0,default) : conj_$2{struct S}
>   //   (x1,0,direct) : derived_$3{conj_$2{struct S},s->f}
>   //   (x2,0,direct) : derived_$3{conj_$2{struct S},s->f}
>
>   clang_analyzer_eval(x1 == x2); // Yields TRUE
> }
>
> > == SymbolicRegion in different memory spaces ==
> >
> > If two SubRegions have different super regions, in particular if they
> have different memory spaces, they are considered to be different (i.e.,
> have different hash). This makes sense for, say, discriminating between
> VarRegions of ParmVarDecls in different stack frame contexts (different
> calls to the same function). However, for SymbolicRegion based on a pointer
> symbol, it does not make sense to discriminate between the symbolic region
> in Unknown memory space and the symbolic region in Heap memory space (eg.
> constructed by MallocChecker). The analyzer currently considers these
> regions to be two different regions, even if they are based on the same
> pointer. Is this problem worth fixing by removing the memory space from the
> hash function of SymbolicRegion?
>
> Regions in the unknown space can alias with any space, not just the heap
> space. In general, the analyzer makes fairly optimistic assumptions about
> dis-aliasing. Consider the following code:
>
> int *bar(int *p);
>
> void foo(int param) {
>   int *ptr = bar(&param);
>   *ptr = 8;
>   param = 9;
>
>   // Store Bindings at this program point:
>   //   (param,0,direct) : 9 S32b
>   //  (ptr,0,direct) : &SymRegion{conj_$3{int *}}
>   //  (SymRegion{conj_$3{int *}},0,direct) : 8 S32b
>
>   clang_analyzer_eval(*ptr == 9); // Yields FALSE
> }
>
> Here, the analyzer has SymRegion{conj_$3{int *}} is in the unknown memory
> space but it is entirely possible that bar() returns its parameter. In this
> case, updating *ptr would change the value stored for param. The analyzer
> (unsoundly) assumes this can’t happen. It could be more conservative and
> case split on all possible aliasing relationships, but this would be very
> expensive! Alternatively, it could invalidate all other storage on writes
> to the unknown space — but this would be quite imprecise, invalidating
> storage for locations that probably aren’t aliased. The assumption that
> symbolic regions in the unknown space don’t alias parameters isn’t sound,
> but it does cover the common case.
>
> Similarly, the analyzer makes optimistic dis-aliasing assumptions about
> regions with two different conjured symbols:
>
> int *bar();
>
> void foo() {
>   int *ptrUnknown1 = bar();
>   int *ptrUnknown2 = bar();
>   *ptrUnknown1 = 8;
>   *ptrUnknown2 = 9;
>
> // Store Bindings at this program point:
> //   (ptrUnknown1,0,direct) : &SymRegion{conj_$2{int *}}
> //   (SymRegion{conj_$2{int *}},0,direct) : 8 S32b
> //   (ptrUnknown2,0,direct) : &SymRegion{conj_$5{int *}}
> //   (SymRegion{conj_$5{int *}},0,direct) : 9 S32b
> }
>
> (This optimistic assumption of one of many things that make the “general”
> (precondition of true) approach to summaries quite challenging. The
> analyzer could generate a summary making the optimistic assumption that two
> parameter pointers are dis-aliased only to discover when trying to apply
> the summary that the two pointers point to the same memory!)
>
> One thing to note is that the analyzer’s assumption of dis-aliasing is
> sound for regions within the heap space. This is because it only produces
> symbolic regions in that space when it actually observes the allocation
> site (e.g., malloc) of the region. For example, if the two calls to bar()
> above were malloc() instead, then the analyzer would produce regions in the
> heap space and could safely assume that the two pointers point to distinct
> memory.
>
> In contrast with the optimistic approach the analyzer takes for aliasing
> between symbolic regions, the analyzer is more pessimistic about symbolic
> *offsets* into regions. As Jordan describes in
> clang/docs/analyzer/RegionStore.txt (Binding Invalidation), binding to a
> region with a symbolic offset invalidates the entire region. For example:
>
> void foo(int i, int j) {
>   int *ptr = (int *)malloc(2*sizeof(int));
>
>   ptr[i] = 8;
>   ptr[j] = 9;
>
> // Store Bindings at this program point:
> //   (ptr,0,direct) : &element{SymRegion{conj_$2{void *}},0 S32b,int}
> //   (SymRegion{conj_$2{void *}},0,default) : Unknown
> //   (element{SymRegion{conj_$2{void *}},reg_$4<j>,int},direct) : 9 S32b
>
> For symbolic offsets, the analyzer assumes that p[i] and p[j] may alias
> and so invalidates p[i] when p[j] is bound. This is potentially imprecise
> (it may be the case that i != j), but as a practical heuristic it is much
> more likely that two symbolic offsets within the same region will alias
> than that two symbolic regions themselves will alias.
>
> > == CXXTempObjectRegion and block counts ==
> >
> > CXXTempObjectRegion is constructed with expression that caused it to
> appear, a memory space, and that's it. I wonder if, during lifetime of a
> CXXTempObjectRegion we can accidentally reach the same expression again and
> produce the same region twice and, say, overwrite its bindings. In other
> words, does CXXTempObjectRegion require some sort of block-count, like
> SymbolConjured? This time, i'd rather believe that it doesn't, and that the
> current behavior is correct, so i wonder if there's an easy way to explain
> why is it correct.
>
> I discussed this with Jordan and we’re pretty sure this isn’t an issue
> (although it is a good catch!). Our reasoning is that the only legal way to
> execute the same expression twice while the result of the previous one is
> still around is recursion — and you get different CXXTempObjectRegions for
> different stack frames (see MemRegionManager::getCXXTempObjectRegion()). In
> all other cases, any const reference that extends the lifetime of the
> temporary would have to have ended its lifetime as well before the
> expression could be executed again. That said, there are also sorts of
> gross things the programmer could do with undefined behavior to get at the
> same TempObjectRegion (e.g., life-time extending a temporary in a loop and
> storing the the address of the const reference to it in a pointer). We
> could potentially add a block count here to provide less unexpected
> analyzer behavior in these cases.
>
> > == SymbolRegionValue after a deleted binding ==
> >
> > As far as i understand, SymbolRegionValue can be described as a symbol
> that denotes the unknown value of a region *in the beginning* of the
> analysis. It is produced by the RegionStore whenever there is no other
> binding that has overwritten this value *during* the analysis. However,
> when we do invalidation, we can sometimes remove bindings from regions. Is
> it intended that the same SymbolRegionValue may possibly re-appear after
> invalidation, simply because the binding was removed, even if the original
> SymbolRegionValue is considered dead, and thus all constraints and checker
> data was erased, and hence there's no harm in reusing the symbol? Or we'd
> better try our best to ensure that some sort of SymbolDerived would always
> be created in this scenario, and the original SymbolRegionValue never gets
> resurrected?
>
> When invalidating regions with symbolic base regions, the analyzer tries
> to adds a default binding to a fresh conjured symbols (see void
> invalidateRegionsWorker::VisitCluster(const MemRegion *, const
> ClusterBindings *)).
> For example:
>
> struct S {
>   int f;
> };
>
> void invalidate(struct S *s);
>
> void foo(struct S *s) {
>   s->f = 10;
> // Store Bindings at this program point:
> //   (SymRegion{reg_$0<s>},0,direct) : 10 S32b
>
>   invalidate(s);
>
> // Store Bindings at this program point:
> //   (SymRegion{reg_$0<s>},0,default) : conj_$1{int}
> }
>
> When invalidating ’s', the analyzer removes the direct binding storing the
> concrete value ’10’ at off set 0 and adds a default binding storing the
> fresh conjured value conj_$1{int}.
>
> Are you seeing cases where the analyzer is removing bindings but not
> adding a default binding to a conjured symbol?
>
> Devin
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20151110/d47ae42c/attachment.html>


More information about the cfe-dev mailing list