[cfe-dev] pointer values, taint propogation
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Mar 6 06:36:25 PST 2017
Hello,
Sorry, it took me some time to get to reproducing this issue.
I think everything works correctly, because left-hand side of operator
+= (unlike operator +) is actually an lvalue - the variable sv2 (sp2)
itself, not its contents. The relevant SVal is .dump()'ed as &sv2, and
"means" a pointer to the variable sv2. Null pointer usually doesn't
point to stack variables, so the null-state is modeled to be impossible.
To get the rvalue of the left-hand side expression, you'd need to
perform an extra getSVal(), this time from the Store - which essentially
dereferences the pointer. Something like this:
```
if (LHS->isLValue()) // or just check the opcode
if (auto ValLAsLoc = ValL.getAs<Loc>()) // auto resolves to
Optional<Loc>
ValL = St->getSVal(*ValLAsLoc);
else
// Handle what went wrong. Perhaps an UndefinedVal or an UnknownVal.
```
Generally, it's a very good idea to .dump() the SVal's you encounter,
there's a lot of useful info. There's also this SValExplainer class
which may help with understanding the SVal hierarchy.
On 2/23/17 11:10 PM, McDowell, Raymond C. wrote:
> Artem,
>
> Thank you for your detailed answer -- it was illuminating and very helpful. Representing a pointer as an unsigned integer was causing problems, but creating a new symbolic region seems to be working. I am now wondering if we can do something similar for the sv2/sp2 cases: when we add a symbolic integer value to a null pointer, make the value a new symbolic region based on that integer value. I tried to write a pre-check/post-check pair to do this, but am running into trouble even with the pre-check to catch this situation. Here's what I wrote:
>
> void UntrustedDerefChecker::checkPreStmt(const BinaryOperator *Bop, CheckerContext &ChCtx) const {
> ProgramStateRef St = ChCtx.getState();
> const LocationContext *LCtx = ChCtx.getLocationContext();
> BinaryOperator::Opcode Op = Bop->getOpcode();
> if (Op != BO_Add && Op != BO_AddAssign
> && Op != BO_Sub && p != BO_SubAssign)
> return;
> Expr *LHS = Bop->getLHS();
> if (LHS->getType()->isPointerType()) {
> SVal ValL = St->getSVal(LHS, LCtx);
> ProgramStateRef LIsNullSt, LIsNotNullSt;
> std::tie(LIsNotNullSt, LIsNullSt) = St->assume(ValL.castAs<DefinedOrUnknownSVal>());
> if (LIsNullSt & !LIsNotNullSt) {
> ChCtx.addTransition(LIsNullSt->set<NullPtrArith>((unsigned)LeftArg));
> return;
> }
> }
> if (Po != BO_Add && Op != BO_Sub)
> return;
> ... // handle pointer on right symmetrically to pointer on the left case above
> }
>
> But when this call-back is called for the sv2 += ll statement, LIsNuLLSt is null and LIsNotNullSt is a valid state, so it thinks sv2 must be non-null even though the state has it constrained to be null. I traced it in the debugger and it looks like the issue is in SimpleSValBuilder::evalCastFromLoc, where the non-symbolic case falls through to the GotoLabel case, making any ConcreteInt-as-Loc value (even NULL) evaluate to true.
>
> Is this a bug, or am I misunderstanding what is going on? Any suggestions for a work-around?
>
> Thanks,
> Ray
>
> -----Original Message-----
> From: Artem Dergachev [mailto:noqnoqneo at gmail.com]
> Sent: Monday, February 20, 2017 5:45 AM
> To: McDowell, Raymond C.; cfe-dev at lists.llvm.org
> Subject: Re: [cfe-dev] pointer values, taint propogation
>
> Hello,
>
> While the analyzer tries to cover all sides of the C's "everything is everything" attitude (pointers are integers, integers are pointers, pointer types do not matter), it seems that casting integers to pointers is the least covered part of it.
>
>
> --= IntegerAsLoc values =--
>
> First, there is currently no accepted way to represent a cast from a
> *symbolic* integer to a pointer in the SVal hierarchy. For the opposite cast (from a pointer to an integer) there's nonloc::LocAsInteger SVal sub-class to represent the result. However, nobody made a "loc::IntegerAsLoc" for us yet.
>
> I'd probably suggest to represent those as SymbolicRegion of an integer-type symbol. In fact, the following patch fixes all of your cases except sv2/sp2:
>
> ```
> diff --git a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
> b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
> index 28b43dd..6e894f3 100644
> --- a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
> +++ b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
> @@ -97,7 +97,9 @@ SVal SimpleSValBuilder::evalCastFromNonLoc(NonLoc val, QualType castTy) {
>
> if (!isLocType)
> return makeNonLoc(se, T, castTy);
> - return UnknownVal();
> +
> + const SymbolicRegion *SR = MemMgr.getSymbolicRegion(se);
> + return loc::MemRegionVal(SR);
> }
> ```
>
> This produces "&SymRegion{conj_$0{long long}}", which was never done before and needs much more testing. Surprisingly, it doesn't even fail on tests (though i'd still suspect that it'd introduce a bunch of crashes on real codebases). I believe this is eventually the way to go, but i'd not be ready to accept this change right away without proper evaluation. I'm not sure when i'd be able to do this work.
>
> I also think that this trick is better than representing pointers with unsigned integers the way you did it, because consistently representing pointer values with Loc-type SVal's is, in my experience, a very good contract to follow; there are a lot of useful assertions around that.
>
>
> --= Why sv3/sp3 works? =--
>
> This case is easier than the others because it doesn't produce any casts, but works through pointer arithmetic, which is well-supported.
> Essentially, "sv3 += ll" is treated similarly to "sv3[ll]", which produces an ElementRegion inside SymbolicRegion. Because ElementRegion's index ll = conj_$0<long long> is tainted, the whole SVal is considered tainted.
>
>
> --= What's so much wronger with sv2/sp2? =--
>
> The difference is that it involves adding an integral symbol to a 0 (Loc); this operation itself yields UnknownVal. I've no idea why is this simple operation unsupported (UnknownVal essentially always means unsupported), but i suspect that it's because we'd need a pointer type to conduct it correctly (multiply the shift by pointee type size), which is not stored in 0 (Loc) itself. Adding pointer type here would take some work.
>
>
> --= Why so much inconsistency? =--
>
> Because static symbolic execution is so much harder than compilation or "concrete" interpretation, but on the other hand it doesn't require absolute correctness, it's natural that some operations will be simplified out until somebody really needs them to be supported. There are just so many cases to cover, and a lot of them require extending the SVal algebra with new operations and new value kinds. So we're bound to see UnknownVal's from time to time.
>
> Another layer of inconsistency - not sure if it affected you - is that often UnknownVal's are replaced with SymbolConjured values, which happens whenever somebody finds out that he badly needs to recover path-sensitivity in his specific example. There's no consensus if symbolicating unknowns is generally a good thing.
>
> Once you notice symbolication, you can hack upon it by propagating taint towards the newly conjured symbol in a checker callback. This would work, but it'd be harder to propagate taint removal ("the symbol is already verified, no longer to keep it tainted") back to the original symbol.
>
>
> On 2/15/17 3:55 PM, McDowell, Raymond C. via cfe-dev wrote:
>> I didn’t realize the screen captures would make things difficult.
>> Here’s a version of the e-mail with text instead.
>>
>> I would like to write a checker to make sure that pointers from
>> untrusted sources are not dereferenced. So I am playing around with
>> the alpha.security.taint checker to try to understand how taint
>> propagation works. I put together some simple test cases:
>>
>> #include <stdio.h>
>>
>> #include <string.h>
>>
>> void test(float f, char *sp1, char *sp2, char *sp3) {
>>
>> char *sv1, *sv2, *sv3;
>>
>> long long ll;
>>
>> scanf("%lld", &ll); // ll is read from stdin, so is untrusted
>>
>> if (ll) {
>>
>> printf((char *)ll, f); // flag: ll is untrusted
>>
>> sv1 = (char *)ll;
>>
>> printf(sv1, f); // flag: sv1 = ll is untrusted
>>
>> sp1 = (char *)ll;
>>
>> printf(sp1, f); // flag: sp1 = ll is untrusted
>>
>> printf((char *)NULL+ll, f); // flag: ll is untrusted
>>
>> unsigned long u = 0;
>>
>> u += ll;
>>
>> printf((char *)u, f); // flag: u = 0+ll is untrusted
>>
>> sv2 = NULL;
>>
>> sv2 += ll;
>>
>> printf(sv2, f); // flag: sv2 = NULL+ll is untrusted
>>
>> sp2 = NULL;
>>
>> sp2 += ll;
>>
>> printf(sp2, f); // flag: sp2 = NULL+ll is untrusted
>>
>> printf(sp3, f);
>>
>> sv3 = sp3;
>>
>> sv3 += ll;
>>
>> printf(sv3, f); // flag: sv3 = sp3+ll is untrusted
>>
>> sp3 += ll;
>>
>> printf(sp3, f); // flag: sp3 = sp3+ll is untrusted
>>
>> }
>>
>> }
>>
>> The indicated calls to printf should be flagged because an untrusted
>> value is passed as the format string (the first argument). When I run
>> the alpha.security.taint checker on this, it only flags the last two.
>> Looking into this, I found that the static analyzer often assigns an
>> unknown value to the result of casting from an integer to a pointer.
>> Taint can’t be attached to an unknown value, so I need a known value
>> for the pointer. Although I’m not sure if this is really what we want
>> to do, for now I added a post-check call-back on cast expressions to
>> my checker that casts the integer to unsigned and then assigns the
>> result as the pointer value.
>>
>> void UntrustedDerefChecker::checkPostStmt(const CastExpr *CE,
>> CheckerContext &ChCtx) const {
>>
>> ProgramStateRef St = ChCtx.getState();
>>
>> const LocationContext *LCtx = ChCtx.getLocationContext();
>>
>> if (CE->getCastKind() == CK_IntegralToPointer && St->getSVal(CE,
>> LCtx).isUnknown()) {
>>
>> // if cast to pointer resulted in an unknown value, try casting to
>> unsigned instead
>>
>> const Expr *SubExpr = CE->getSubExpr();
>>
>> QualType CastTy = ChCtx.getASTContext().UnsignedLongTy;
>>
>> QualType SubExprTy = SubExpr->getType();
>>
>> SVal Val = St->getSVal(SubExpr, LCtx);
>>
>> Val = St->getStateManager().getSValBuilder().evalIntegralCast(St, Val,
>> CastTy, SubExprTy);
>>
>> ChCtx.addTransition(St->BindExpr(CE, LCtx, Val));
>>
>> }
>>
>> }
>>
>> When I run alpha.security.taint together with this on the test file,
>> it flags the calls to printf where (char *)l and (char *)u are passed,
>> but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are
>> passed, even though these are just different expressions with the same
>> value. I tried looking into this using the debugger, and found that
>> when the analyzer calls ExprEngine::evalStore for the assignment
>> statement (say “sv1 = (char *)ll”), it gives the pointer the
>> appropriate value, but when the analyzer calls
>> GenericTaintChecker::checkUncontrolledFormatString from the pre-check
>> call-back for the call to printf on the following line of the test
>> file, the pointer has an UnknownVal. (See excepts from the debugging
>> session below.)
>>
>> Can anyone help me understand what is going on here?
>>
>> Thanks,
>>
>> Ray
>>
>> (These debugger session excepts are inherently screen captures, but
>> the gist of them is described above.)
>>
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list