[cfe-dev] pointer values, taint propogation

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Mon Feb 20 02:45:13 PST 2017


Hello,

While the analyzer tries to cover all sides of the C's "everything is 
everything" attitude (pointers are integers, integers are pointers, 
pointer types do not matter), it seems that casting integers to pointers 
is the least covered part of it.


--= IntegerAsLoc values =--

First, there is currently no accepted way to represent a cast from a 
*symbolic* integer to a pointer in the SVal hierarchy. For the opposite 
cast (from a pointer to an integer) there's nonloc::LocAsInteger SVal 
sub-class to represent the result. However, nobody made a 
"loc::IntegerAsLoc" for us yet.

I'd probably suggest to represent those as SymbolicRegion of an 
integer-type symbol. In fact, the following patch fixes all of your 
cases except sv2/sp2:

```
diff --git a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp 
b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
index 28b43dd..6e894f3 100644
--- a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -97,7 +97,9 @@ SVal SimpleSValBuilder::evalCastFromNonLoc(NonLoc val, 
QualType castTy) {

      if (!isLocType)
        return makeNonLoc(se, T, castTy);
-    return UnknownVal();
+
+    const SymbolicRegion *SR = MemMgr.getSymbolicRegion(se);
+    return loc::MemRegionVal(SR);
    }
```

This produces "&SymRegion{conj_$0{long long}}", which was never done 
before and needs much more testing. Surprisingly, it doesn't even fail 
on tests (though i'd still suspect that it'd introduce a bunch of 
crashes on real codebases). I believe this is eventually the way to go, 
but i'd not be ready to accept this change right away without proper 
evaluation. I'm not sure when i'd be able to do this work.

I also think that this trick is better than representing pointers with 
unsigned integers the way you did it, because consistently representing 
pointer values with Loc-type SVal's is, in my experience, a very good 
contract to follow; there are a lot of useful assertions around that.


--= Why sv3/sp3 works? =--

This case is easier than the others because it doesn't produce any 
casts, but works through pointer arithmetic, which is well-supported. 
Essentially, "sv3 += ll" is treated similarly to "sv3[ll]", which 
produces an ElementRegion inside SymbolicRegion. Because ElementRegion's 
index ll = conj_$0<long long> is tainted, the whole SVal is considered 
tainted.


--= What's so much wronger with sv2/sp2? =--

The difference is that it involves adding an integral symbol to a 0 
(Loc); this operation itself yields UnknownVal. I've no idea why is this 
simple operation unsupported (UnknownVal essentially always means 
unsupported), but i suspect that it's because we'd need a pointer type 
to conduct it correctly (multiply the shift by pointee type size), which 
is not stored in 0 (Loc) itself. Adding pointer type here would take 
some work.


--= Why so much inconsistency? =--

Because static symbolic execution is so much harder than compilation or 
"concrete" interpretation, but on the other hand it doesn't require 
absolute correctness, it's natural that some operations will be 
simplified out until somebody really needs them to be supported. There 
are just so many cases to cover, and a lot of them require extending the 
SVal algebra with new operations and new value kinds. So we're bound to 
see UnknownVal's from time to time.

Another layer of inconsistency - not sure if it affected you - is that 
often UnknownVal's are replaced with SymbolConjured values, which 
happens whenever somebody finds out that he badly needs to recover 
path-sensitivity in his specific example. There's no consensus if 
symbolicating unknowns is generally a good thing.

Once you notice symbolication, you can hack upon it by propagating taint 
towards the newly conjured symbol in a checker callback. This would 
work, but it'd be harder to propagate taint removal ("the symbol is 
already verified, no longer to keep it tainted") back to the original 
symbol.


On 2/15/17 3:55 PM, McDowell, Raymond C. via cfe-dev wrote:
>
> I didn’t realize the screen captures would make things difficult. 
> Here’s a version of the e-mail with text instead.
>
> I would like to write a checker to make sure that pointers from 
> untrusted sources are not dereferenced.  So I am playing around with 
> the alpha.security.taint checker to try to understand how taint 
> propagation works.  I put together some simple test cases:
>
> #include <stdio.h>
>
> #include <string.h>
>
> void test(float f, char *sp1, char *sp2, char *sp3) {
>
> char *sv1, *sv2, *sv3;
>
> long long ll;
>
> scanf("%lld", &ll);    // ll is read from stdin, so is untrusted
>
> if (ll) {
>
> printf((char *)ll, f);    // flag: ll is untrusted
>
> sv1 = (char *)ll;
>
> printf(sv1, f);    // flag: sv1 = ll is untrusted
>
> sp1 = (char *)ll;
>
> printf(sp1, f);    // flag: sp1 = ll is untrusted
>
> printf((char *)NULL+ll, f);    // flag: ll is untrusted
>
> unsigned long u = 0;
>
> u += ll;
>
> printf((char *)u, f);    // flag: u = 0+ll is untrusted
>
> sv2 = NULL;
>
> sv2 += ll;
>
> printf(sv2, f);    // flag: sv2 = NULL+ll is untrusted
>
> sp2 = NULL;
>
> sp2 += ll;
>
> printf(sp2, f);    // flag: sp2 = NULL+ll is untrusted
>
> printf(sp3, f);
>
> sv3 = sp3;
>
> sv3 += ll;
>
> printf(sv3, f);    // flag: sv3 = sp3+ll is untrusted
>
> sp3 += ll;
>
> printf(sp3, f);    // flag: sp3 = sp3+ll is untrusted
>
> }
>
> }
>
> The indicated calls to printf should be flagged because an untrusted 
> value is passed as the format string (the first argument).  When I run 
> the alpha.security.taint checker on this, it only flags the last two.  
> Looking into this, I found that the static analyzer often assigns an 
> unknown value to the result of casting from an integer to a pointer.  
> Taint can’t be attached to an unknown value, so I need a known value 
> for the pointer. Although I’m not sure if this is really what we want 
> to do, for now I added a post-check call-back on cast expressions to 
> my checker that casts the integer to unsigned and then assigns the 
> result as the pointer value.
>
> void UntrustedDerefChecker::checkPostStmt(const CastExpr *CE, 
> CheckerContext &ChCtx) const {
>
> ProgramStateRef St = ChCtx.getState();
>
> const LocationContext *LCtx = ChCtx.getLocationContext();
>
> if (CE->getCastKind() == CK_IntegralToPointer && St->getSVal(CE, 
> LCtx).isUnknown()) {
>
> // if cast to pointer resulted in an unknown value, try casting to 
> unsigned instead
>
> const Expr *SubExpr = CE->getSubExpr();
>
> QualType CastTy = ChCtx.getASTContext().UnsignedLongTy;
>
> QualType SubExprTy = SubExpr->getType();
>
> SVal Val = St->getSVal(SubExpr, LCtx);
>
> Val = St->getStateManager().getSValBuilder().evalIntegralCast(St, Val, 
> CastTy, SubExprTy);
>
> ChCtx.addTransition(St->BindExpr(CE, LCtx, Val));
>
> }
>
> }
>
> When I run alpha.security.taint together with this on the test file, 
> it flags the calls to printf where (char *)l and (char *)u are passed, 
> but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are 
> passed, even though these are just different expressions with the same 
> value.  I tried looking into this using the debugger, and found that 
> when the analyzer calls ExprEngine::evalStore for the assignment 
> statement (say “sv1 = (char *)ll”), it gives the pointer the 
> appropriate value, but when the analyzer calls 
> GenericTaintChecker::checkUncontrolledFormatString from the pre-check 
> call-back for the call to printf on the following line of the test 
> file, the pointer has an UnknownVal.  (See excepts from the debugging 
> session below.)
>
> Can anyone help me understand what is going on here?
>
> Thanks,
>
> Ray
>
> (These debugger session excepts are inherently screen captures, but 
> the gist of them is described above.)
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




More information about the cfe-dev mailing list