[PATCH] D37120: [analyzer] Fix modeling arithmetic

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat Aug 26 02:15:55 PDT 2017


NoQ added a comment.

Yeah, `LocAsInteger` is supported very poorly in most places. We can only get away with it because people rarely cast pointers to integers.

Your reasoning about how `LocAsInteger` works poorly with eg. multiplication or bitwise operations (consider some sanitizer computing shadow memory) is one of the reasons why i think `LocAsInteger` should be turned into a special symbol class ("`SymbolRegionAddress`"), to be able to participate in symbolic expressions. Even if our constraint manager is unable to reason about such expressions, having correct opaque expressions is better than having `UnknownVal`s because at least we can carry constraints on exact same expressions, and also better than conjured symbols because we are sure to assign the same symbol if such expression is computed more than once. Such change would allow us to remove special handling of `LocAsInteger` in some cases (just treat it as any other symbol), while still performing simplifications where we want. We'd probably be able to collapse addresses of symbolic pointers into `SymbolCast`s, eg. `$addr<element{SymRegion{$x}, 2, char}` into `(uintptr_t)$x + 2`. But all of that is a bigger task, not sure if you want to dig into this much, as it's not necessary to fix the crashes.



================
Comment at: lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:363
         case nonloc::LocAsIntegerKind:
           return evalBinOpLL(state, op, lhsL,
                              rhs.castAs<nonloc::LocAsInteger>().getLoc(),
----------------
alexshap wrote:
> @NoQ , @dcoughlin 
> while we are looking at this code - just to double check - is this line (363) actually correct ?
> Let's take a look at the following example:
>    bool f(long x, double *p1, double *p2) {
>       long y = (long)p1 - (long) p2; 
>       // or,alternatively (long)p1 * (long)p2  or (long)p1 / (long)p2
>       return y == x;
>     }
> it looks like again the analyzer will try to use evalBinOpLL and evaluate this as an operation over pointers, while (if my understanding is correct) we should be working with integers here (and yes, in most cases it should return UnknownVal)
>   
Yeah, even pointer substraction would be handled incorrectly, because pointer types would be incorrectly taken into account (if only we actually had pointer substraction).


================
Comment at: lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:371-373
+            return makeSymExprValNN(
+                state, op, lhs.castAs<nonloc::LocAsInteger>(),
+                rhs.castAs<nonloc::ConcreteInt>(), resultTy);
----------------
alexshap wrote:
> NoQ wrote:
> > For now this code would return `UnknownVal` in most cases (pointer is not tainted, or not symbolic, or contains an offset), and still construct an invalid symbol in the rest of the cases (`makeSymExprValNN` would add the number to the pointer symbol, instead of modelling an offset within the pointed-to region).
> > 
> > Once D35450 finally lands (sorry for the delays...), it'd return `UnknownVal` less often and crash more often.
> @NoQ - many thanks for the code review!
> 
> i assume i might be missing smth, anyway, (replying to this comment and another one below)
> 
>   >This is probably the way to go: just take the Loc behind the LocAsInteger, cast it to char * (because 
>   >pointer type shouldn't affect how much bytes of offset are added, anymore), add your integer to it, 
>   >pack back into LocAsInteger
> 
> i'm not sure i understand how that would work here and somehow don't like it. 
> For example, op can be MultiplicativeOp (i.e. long y = ((long)p) * 13;) extracting the Loc and doing smth with the Loc -  doesn't seem to be the right thing (if i understood your suggestion correctly)
> 
>   >For now this code would return UnknownVal in 
>   >most cases (pointer is not tainted, or not symbolic, 
>   >or contains an offset)
> 
> that was my understanding what this code should do (return UnknownVal in most cases unless we can reason about the actual (integer) values of LHS,RHS)
> 
> 
Yeah, we should still return `UnknownVal` for now in most cases. I just wanted to point out that addition and substraction can be handled this way. And also that `makeSymExprValNN()` seems to be unable to handle `LocAsInteger` arguments correctly, so manual intervention would be necessary anyway. So i think it'd be better to just return `UnknownVal` directly.


Repository:
  rL LLVM

https://reviews.llvm.org/D37120





More information about the cfe-commits mailing list