[cfe-dev] [analyzer] Looking up previously stored custom state for ParmVarDecls that have gone out of scope

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Wed May 2 16:35:48 PDT 2018



On 5/1/18 2:09 PM, Timothy J. Wood via cfe-dev wrote:
> While working on my outError checker, I'm running into a bunch of false positives that seem to boil down to the same issue.
>
> I *think* my core question is "how do I store state about symbols going out of scope such that I can reliably look it up later", but I'm not 100% sure. Longer version:
>
> Attached below is a diff with a minified version of my checker that just shows this issue, which can be applied to clang on the `release_60` branch. The test case is:
>
> I've annotated a test run of the input and output to make it easier to match up locations, when running like so, :
>
>      % ./bin/clang -cc1 -internal-isystem ./lib/clang/6.0.1/include -nostdsysteminc -analyze -analyzer-checker=debug.ViewExplodedGraph,osx.cocoa.NSErrorWrite -fblocks -analyzer-store=region -Wno-objc-root-class ../llvm/tools/clang/test/Analysis/missing-error.m
>
>
> Test input:
>
> [B]	static BOOL _fillErrorWithReturn(NSError **_outError)
> 	{
> 		if (_outError) {
> 			*_outError = [NSError make];
> 		}
> [C]		return NO;
> 	}
>
> [A]	- (BOOL)fillErrorHelperFunctionWithReturn:(NSError **)outError;
> 	{
> [D]		return _fillErrorWithReturn(outError);
> 	}
>
>
> Log file:
>
>
> [A]	checkBeginFunction [0x7ff6038a13f0]
> 	  Decl 0x7ff60386e130
> 	  at /Users/bungi/Source/LLVM/llvm/tools/clang/test/Analysis/missing-error.m:26:1
> 	  OutError ParmVarDecl = 0x7ff60386e1b8
> 	  StackSlotVal &outError
> 	  OutErrorVal &SymRegion{reg_$1<NSError ** outError>}
> [B]	checkBeginFunction [0x7ff6038a1ea8]
> 	  Decl 0x7ff60386e020
> 	  at /Users/bungi/Source/LLVM/llvm/tools/clang/test/Analysis/missing-error.m:18:13
> 	  OutError ParmVarDecl = 0x7ff60386df20
> 	  StackSlotVal &_outError
> 	  OutErrorVal &SymRegion{reg_$1<NSError ** outError>}
> [C]	checkPreStmt<ReturnStmt> [0x7ff604800b48]
> 	  Stmt 0x7ff603898078
> 	  at /Users/bungi/Source/LLVM/llvm/tools/clang/test/Analysis/missing-error.m:23:2
> 	  OutError ParmVarDecl = 0x7ff60386df20
> 	  StackSlotVal &_outError
> 	  OutErrorVal &SymRegion{reg_$2<NSError ** _outError>}
> 	  OutErrorVal is not constrained
> [D]	checkPreStmt<ReturnStmt> [0x7ff604801188]
> 	  Stmt 0x7ff60386e438
> 	  at /Users/bungi/Source/LLVM/llvm/tools/clang/test/Analysis/missing-error.m:28:2
> 	  OutError ParmVarDecl = 0x7ff60386e1b8
> 	  StackSlotVal &outError
> 	  OutErrorVal &SymRegion{reg_$1<NSError ** outError>}
> 	  OutErrorVal is not constrained
> [C]	checkPreStmt<ReturnStmt> [0x7ff604802c18]
> 	  Stmt 0x7ff603898078
> 	  at /Users/bungi/Source/LLVM/llvm/tools/clang/test/Analysis/missing-error.m:23:2
> 	  OutError ParmVarDecl = 0x7ff60386df20
> 	  StackSlotVal &_outError
> 	  OutErrorVal &SymRegion{reg_$2<NSError ** _outError>}
> 	  OutErrorVal is not constrained
> [D]	checkPreStmt<ReturnStmt> [0x7ff604802fd0]
> 	  Stmt 0x7ff60386e438
> 	  at /Users/bungi/Source/LLVM/llvm/tools/clang/test/Analysis/missing-error.m:28:2
> 	  OutError ParmVarDecl = 0x7ff60386e1b8
> 	  StackSlotVal &outError
> 	  OutErrorVal &SymRegion{reg_$1<NSError ** outError>}
> 	  OutErrorVal is not constrained
>
>
> At [B], the beginning of the static function that gets inlined, looking up the _outError parameter has a stack slot value of "StackSlotVal &_outError", and when I get the contents of that location, I get "&SymRegion{reg_$1<NSError ** outError>}". Note the change in name -- this is the variable from the caller, inlined in to the static function. That seems OK.

Yep, that's indeed OK. The underscored variable _outError contains a 
value that is equal to the original value of non-underscored outError. 
There never was a separate symbol for representing the value of 
_outError - the analyzer just knows that its value is represented by the 
old symbol whenever it matters.

> Later, in the two [C] branches, when about to return from the inlined static function, trying to get the SVal for _outError, I get a new “$2” name, "&SymRegion{reg_$2<NSError ** _outError>}”, and even though there should be two branches where _outError should be constrained alternately to null and not-null, the OutErrorVal returned is marked as non-constrained on both paths.

Yep, that's because the underscored _outError is dead as a variable, 
i.e. no longer used in the program, so we can forget about it. The 
checker is not prevented from reincarnating the variable (because 
currently checkers are omnipotent), but that's not the right thing to 
do. This isn't "renaming" - the underscored reg_$2 should have never 
been created and is essentially created artificially by the checker due 
to API misuse that we can't easily catch (though we're working on it).

Now note that even though _outError is dead, *symbol* 
&SymRegion{reg_$1<NSError ** outError>} is alive, because it is still 
stored in another variable, namely the original non-underscored 
outError. Therefore the constraint on the symbol is also preserved.

Additionally, argument values are removed from the Store (memory model) 
because parameter variables become dead, but they are still present in 
the Environment (syntactic expression values) because the 
call-expression is not yet fully evaluated, and therefore its 
sub-expressions that represent argument values are still considered to 
be important to keep.

With that reasoning in mind, it is generally advised to use the 
Environment to obtain argument values, rather than the Store. The 
convenient way of doing that is by using the CallEvent API. A CallEvent 
object is readily available in checkPreCall and checkPostCall callbacks. 
If it's not available, you can easily construct it via 
CallEventManager::getCaller(). For obtaining the argument values you 
don't necessarily need to pass an old ProgramStateRef (that was actual 
when we just entered the call) to getCaller(). The current 
ProgramStateRef will be just fine, because all we need from it is 
expression values, and expression values don't mutate.

What you won't be able to do with the current program state is obtain 
the old value of **_outError. *_outError is in the Environment, but 
**_outError is in the old Store only.

We've been thinking about adding a way of retrieving an old ProgramState 
that corresponds to the beginning of previous stack frames. This would 
have helped and we wanted it for similar reasons.

---

I guess that was too much lecturing, i should focus on the task at hand.

Am I understanding correctly that you are trying to compare **_outError 
at the beginning of the function and at the end of the function and then 
emit the warning based on the return value?

I think you should do a:

   REGISTER_MAP_WITH_PROGRAMSTATE(const StackFrameContext *, SVal)

And put the original SVal for **_outError into it. Then at the end of 
the function see if the value has changed, by retrieving *_outError via

   CheckerContext.getStateManager().getCallEventManager()
   .getCaller(CheckerContext.getCurrentStackFrame, 
CheckerContext.getState())
   .getArgSVal(error argument number)

and dereferencing it.


> I'm guessing that both the change in the SVal name, and the lack of constraint are due to _outError going out of scope. Making things more odd, though, is that in the exploded graph dot file (attached below too) if you search for the state "0x7ff604800b48" (from the first [C] / checkPreStmt<ReturnStmt> above), you can see that it still has a constraint "reg_$1<NSError ** outError> : { [0, 0] }". In my real checker, I implement the checkDeadSymbols callback and make notes about whether an outError ParmVarDecl value was definitely null when it went out of scope, or whether it was definitely non-null, but had a valid value written to *outError at the time.
>
> The problem lies how to store this information in my custom state such that checkPreStmt<ReturnStmt> can reliably look it up again. If I use a MemRegion as a key in my REGISTER_MAP_WITH_PROGRAMSTATE, then the lookup at "[C]  checkPreStmt<ReturnStmt>" seems to fail due to the name change. What should I be using as the lookup key in order to deal with sort lookup of out-of-scope symbols coming back with a different name?
>
> I’ve only noticed this sort of renaming happening with inlining, but maybe that’s is just coincidental.
>
> Hopefully this is the right question, but if you see other places I've gone off the rails, let me know. Otherwise, I'll keep reading and see if I can figure out a solution.
>
> Thanks!
>
> -tim
>
>
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180502/7bd837f9/attachment.html>


More information about the cfe-dev mailing list