[cfe-dev] [analyzer] Random SVal hierarchy questions.

Tue Nov 10 08:50:16 PST 2015

Hello Devin,

Thanks a lot for the reply! Questions about CXXTempObjectRegion block 
counts and SymbolRegionValue lifetime are now perfectly clear. I have a 
few more confusions to sort out with other questions.

 > the only legal way to execute the same expression twice while the 
result of the previous one is still around is recursion

This argument makes sense to me :) Probably some day i'd have a look if 
MemRegionManager::getCXXStaticTempObjectRegion() can have similar issues.

 > Are you seeing cases where the analyzer is removing bindings but not 
adding a default binding to a conjured symbol?

Not really, i just wanted to ensure that it's intended to stay that way 
:) I was a bit confused with the phabricator comment 
http://reviews.llvm.org/D5104#70661 .

 > > == SymbolConjured of a record type ==

 > One thing conjured symbols of record type let you do is reason about 
equality of MemRegions and of values in the fields of those regions 
independently of range constraints.

This confirms my understanding, and, additionally, i realized that the 
idea of a LazyCompoundValue with NULL store is completely incorrect :) 
So i can correct this question a bit:

Suppose that instead of the SymbolConjured of type "struct S", we return 
a LazyCompoundValue with a store that contains only one binding - "(R, 
0, default): conj_$N<int>", where R is... dunno, some temporary region 
that represents this structure, or probably the whole memory space, 
doesn't really matter. Then we obtain a similar SymbolDerived from this 
"conjured store", which allows reasoning about equality of its field 
values equally well, however the behavior of the engine confirms to the 
following additional invariants, which i find convenient:

(1) All SymExpr's have numeric or pointer types; any SymExpr can serve 
as a constraint manager key.
(2) Functions that return structs by values always return compound 
values (or unknown/undefined), regardless of how conservatively they 
were evaluated.

I initially run into this problem when coding alpha-renaming, which 
caused additional special case of renaming a compound value into a 
symbolic expression and vice versa. So i'm wondering about the design 
idea behind using symbolic expressions instead of compound values in 
this case. Even though in the current code, it causes no problems. So 
i'm simply wondering if such change would be accepted as a part of some 
bigger patch (or i work around), or if it accidentally rings any bells 
that invariants (1) and (2) were actually always desired and missed only 
accidentally. I think i can prepare a fix if it is desired, which 
asserts invariants (1) and (2) and adjusts the code accordingly.

 > > == SymbolicRegion in different memory spaces ==

Thanks, your explanation had quite some revalations for me regarding the 
meaning of SymbolicRegion memory spaces for invalidation! What makes me 
still confused a bit is whether it is intended to support situations like:

   Store:
     (SymRegion{s},0,direct) : 1 S32b
     (SymRegion{s},0,direct) : 2 S32b

where the first SymRegion is in Heap space, and the second SymRegion is 
in Unknown space? I guess it shouldn't happen, because these two 
bindings refer to the exact same memory address (s + 0). But i can 
easily imagine an incorrectly coded checker (when the author wasn't 
aware of this problem) accidentally cause such store to appear.

So it makes sense to me to try to keep SymbolicRegions consistent: one 
(symbolic) memory address refers to only one region.

I think i can come up with a different solution from what i mentioned 
originally: create a new kind of MemRegion for regions returned by 
malloc()/new, some sort of HeapSegmentRegion. I'm not really seeing any 
benefit of keeping the address symbol at all in such region (similarly 
to how we do not keep address symbols for VarRegion - it's a "concrete" 
region, with all possible information clearly available). Instead, we 
can keep SVal for extent, and override getExtent() accordingly - this 
approach is more convenient for the users than constraints on 
SymbolExtent, because checker authors do not need to perform extra 
actions to see if SymbolExtent folds into a constant; and also symbolic 
malloc arguments would be handled better. Again, i can prepare a patch 
if such fix is desired - i think i even know how to write a test :)

Hmm. We can also use alpha-renaming on SymbolExtent, to replace it with 
the concrete SVal in MallocChecker, when we have alpha-renaming, that it.