[cfe-dev] pointer values, taint propogation
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Feb 20 02:45:13 PST 2017
Hello,
While the analyzer tries to cover all sides of the C's "everything is
everything" attitude (pointers are integers, integers are pointers,
pointer types do not matter), it seems that casting integers to pointers
is the least covered part of it.
--= IntegerAsLoc values =--
First, there is currently no accepted way to represent a cast from a
*symbolic* integer to a pointer in the SVal hierarchy. For the opposite
cast (from a pointer to an integer) there's nonloc::LocAsInteger SVal
sub-class to represent the result. However, nobody made a
"loc::IntegerAsLoc" for us yet.
I'd probably suggest to represent those as SymbolicRegion of an
integer-type symbol. In fact, the following patch fixes all of your
cases except sv2/sp2:
```
diff --git a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
index 28b43dd..6e894f3 100644
--- a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -97,7 +97,9 @@ SVal SimpleSValBuilder::evalCastFromNonLoc(NonLoc val,
QualType castTy) {
if (!isLocType)
return makeNonLoc(se, T, castTy);
- return UnknownVal();
+
+ const SymbolicRegion *SR = MemMgr.getSymbolicRegion(se);
+ return loc::MemRegionVal(SR);
}
```
This produces "&SymRegion{conj_$0{long long}}", which was never done
before and needs much more testing. Surprisingly, it doesn't even fail
on tests (though i'd still suspect that it'd introduce a bunch of
crashes on real codebases). I believe this is eventually the way to go,
but i'd not be ready to accept this change right away without proper
evaluation. I'm not sure when i'd be able to do this work.
I also think that this trick is better than representing pointers with
unsigned integers the way you did it, because consistently representing
pointer values with Loc-type SVal's is, in my experience, a very good
contract to follow; there are a lot of useful assertions around that.
--= Why sv3/sp3 works? =--
This case is easier than the others because it doesn't produce any
casts, but works through pointer arithmetic, which is well-supported.
Essentially, "sv3 += ll" is treated similarly to "sv3[ll]", which
produces an ElementRegion inside SymbolicRegion. Because ElementRegion's
index ll = conj_$0<long long> is tainted, the whole SVal is considered
tainted.
--= What's so much wronger with sv2/sp2? =--
The difference is that it involves adding an integral symbol to a 0
(Loc); this operation itself yields UnknownVal. I've no idea why is this
simple operation unsupported (UnknownVal essentially always means
unsupported), but i suspect that it's because we'd need a pointer type
to conduct it correctly (multiply the shift by pointee type size), which
is not stored in 0 (Loc) itself. Adding pointer type here would take
some work.
--= Why so much inconsistency? =--
Because static symbolic execution is so much harder than compilation or
"concrete" interpretation, but on the other hand it doesn't require
absolute correctness, it's natural that some operations will be
simplified out until somebody really needs them to be supported. There
are just so many cases to cover, and a lot of them require extending the
SVal algebra with new operations and new value kinds. So we're bound to
see UnknownVal's from time to time.
Another layer of inconsistency - not sure if it affected you - is that
often UnknownVal's are replaced with SymbolConjured values, which
happens whenever somebody finds out that he badly needs to recover
path-sensitivity in his specific example. There's no consensus if
symbolicating unknowns is generally a good thing.
Once you notice symbolication, you can hack upon it by propagating taint
towards the newly conjured symbol in a checker callback. This would
work, but it'd be harder to propagate taint removal ("the symbol is
already verified, no longer to keep it tainted") back to the original
symbol.
On 2/15/17 3:55 PM, McDowell, Raymond C. via cfe-dev wrote:
>
> I didn’t realize the screen captures would make things difficult.
> Here’s a version of the e-mail with text instead.
>
> I would like to write a checker to make sure that pointers from
> untrusted sources are not dereferenced. So I am playing around with
> the alpha.security.taint checker to try to understand how taint
> propagation works. I put together some simple test cases:
>
> #include <stdio.h>
>
> #include <string.h>
>
> void test(float f, char *sp1, char *sp2, char *sp3) {
>
> char *sv1, *sv2, *sv3;
>
> long long ll;
>
> scanf("%lld", &ll); // ll is read from stdin, so is untrusted
>
> if (ll) {
>
> printf((char *)ll, f); // flag: ll is untrusted
>
> sv1 = (char *)ll;
>
> printf(sv1, f); // flag: sv1 = ll is untrusted
>
> sp1 = (char *)ll;
>
> printf(sp1, f); // flag: sp1 = ll is untrusted
>
> printf((char *)NULL+ll, f); // flag: ll is untrusted
>
> unsigned long u = 0;
>
> u += ll;
>
> printf((char *)u, f); // flag: u = 0+ll is untrusted
>
> sv2 = NULL;
>
> sv2 += ll;
>
> printf(sv2, f); // flag: sv2 = NULL+ll is untrusted
>
> sp2 = NULL;
>
> sp2 += ll;
>
> printf(sp2, f); // flag: sp2 = NULL+ll is untrusted
>
> printf(sp3, f);
>
> sv3 = sp3;
>
> sv3 += ll;
>
> printf(sv3, f); // flag: sv3 = sp3+ll is untrusted
>
> sp3 += ll;
>
> printf(sp3, f); // flag: sp3 = sp3+ll is untrusted
>
> }
>
> }
>
> The indicated calls to printf should be flagged because an untrusted
> value is passed as the format string (the first argument). When I run
> the alpha.security.taint checker on this, it only flags the last two.
> Looking into this, I found that the static analyzer often assigns an
> unknown value to the result of casting from an integer to a pointer.
> Taint can’t be attached to an unknown value, so I need a known value
> for the pointer. Although I’m not sure if this is really what we want
> to do, for now I added a post-check call-back on cast expressions to
> my checker that casts the integer to unsigned and then assigns the
> result as the pointer value.
>
> void UntrustedDerefChecker::checkPostStmt(const CastExpr *CE,
> CheckerContext &ChCtx) const {
>
> ProgramStateRef St = ChCtx.getState();
>
> const LocationContext *LCtx = ChCtx.getLocationContext();
>
> if (CE->getCastKind() == CK_IntegralToPointer && St->getSVal(CE,
> LCtx).isUnknown()) {
>
> // if cast to pointer resulted in an unknown value, try casting to
> unsigned instead
>
> const Expr *SubExpr = CE->getSubExpr();
>
> QualType CastTy = ChCtx.getASTContext().UnsignedLongTy;
>
> QualType SubExprTy = SubExpr->getType();
>
> SVal Val = St->getSVal(SubExpr, LCtx);
>
> Val = St->getStateManager().getSValBuilder().evalIntegralCast(St, Val,
> CastTy, SubExprTy);
>
> ChCtx.addTransition(St->BindExpr(CE, LCtx, Val));
>
> }
>
> }
>
> When I run alpha.security.taint together with this on the test file,
> it flags the calls to printf where (char *)l and (char *)u are passed,
> but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are
> passed, even though these are just different expressions with the same
> value. I tried looking into this using the debugger, and found that
> when the analyzer calls ExprEngine::evalStore for the assignment
> statement (say “sv1 = (char *)ll”), it gives the pointer the
> appropriate value, but when the analyzer calls
> GenericTaintChecker::checkUncontrolledFormatString from the pre-check
> call-back for the call to printf on the following line of the test
> file, the pointer has an UnknownVal. (See excepts from the debugging
> session below.)
>
> Can anyone help me understand what is going on here?
>
> Thanks,
>
> Ray
>
> (These debugger session excepts are inherently screen captures, but
> the gist of them is described above.)
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list