[cfe-dev] [analyzer] Problem tracking taint applied to regions
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Tue Mar 20 15:05:06 PDT 2018
Whoops +cfe-dev.
On 3/20/18 3:01 PM, Artem Dergachev wrote:
> Hello Julian,
>
> You seem to have made your way very far on your own, but it doesn't
> look to me like you're on the right track.
>
> On 3/20/18 2:34 AM, Julian Ganz via cfe-dev wrote:
>> Hello cfe-dev,
>>
>> I am currently trying to write a custom checker for the static analyzer.
>>
>> The idea is to perform taint analysis on C++ code with sources, sinks
>> and filters being provided by configuration. A checker which does
>> something similar exist already,e.g. [1]. However, in our use-case,
>> we want to mark data/values (rather than functions) as sources and
>> sinks.
>> Consider for example a piece of security-relevant configuration data
>> which must not be writable by an potential attacker. For now, I use
>> the following example for testing:
>>
>> | class Foo {
>> | public: int x; // Foo::x marked as tainted (e.g. source)
>> | };
>> | static int y; // Marked as sink
>> |
>> | int main() {
>> | Foo f;
>> | y = f.x;
>> | return y;
>> | }
>>
>> Now, clang already provides some infrastructure for taint analysis,
>> e.g. `ProgramState::addTaint()` and `ProgramState::isTainted()`.The
>> internal taint map used in the store tracks symbolic expressions. I
>> decided to use this existing infrastructure for obvious reasons, e.g.
>> benefitting from existing and future taint propagation
>> infrastructure. However, it bit me and now I'm stuck.
>>
>> The taint is introduced in a `checkLocation()` implementation. The
>> first thing I do is unwrapping a `MemberExpr` (e.g. removing casts)
>> and decide whether to taint it or not based on the referred
>> declaration. This far, it works as intended. However, it appears that
>> the `SVal loc` provided as a parameter to this handler does not refer
>> to a `SymExpr`, neither does `loc.getAsRegion()` return a
>> `SymbolicRegion`. In order to taint the piece of data, I create my
>> own symbolic expression as well as an associated `SVal` and bind it
>> to the member expression `S`:
>> | // The taint map can only be used for tracking symbolic expressions.
>> | auto symExpr = loc.getAsSymExpr();
>> |
>> | if (!symExpr) {
>> | // The current `loc` is not suitable for carrying taint.
>> Construct a new one.
>> | if (const auto region =
>> dyn_cast_or_null<TypedValueRegion>(loc.getAsRegion())) {
>> | if (symExpr = C.getSymbolManager().getRegionValueSymbol(region)) {
>
> You should not create your own symbols in the checker. At least, not
> until you 100% know what you are doing.
>
> It's like the analyzer says "i know that there are two sheep" and you
> later say "let $x denote the number of sheep" - you should just take
> the value that's already there instead. Just worse because you use a
> specific notation (SymbolRegionValue) rather than a nameless "x",
> which matters (unlike school algebra).
>
> There is already a correct SVal (which is not necessarily a symbol) in
> the program state that denotes the value of the region; you should
> retrieve this value by using the State->getSVal(Region, QualType) API.
> If there isn't such value yet, it will be denoted with a symbol
> automatically and presented to you.
>
>> | loc = C.getSValBuilder().makeLoc(symExpr);
>
> The symbol you've created is an integer. It's not a loc-type symbol,
> i.e. not a pointer or a reference or an lvalue thing. This operation
> should ideally fail.
>
>> | state = state->BindExpr(S, C.getLocationContext(), loc);
>
> You are changing the value of the syntactic expression (in the
> "Environment"), but not changing the value in the symbolic memory
> model (the "Store"). Well, ideally, the checker should do neither.
>
> Replacing values of expressions in the Environment is a catastrophic
> thing to do - they're essentially immutable by design (though
> unfortunately we have a few places in the analyzer where we still do
> that). Environment is "spacial", not "temporal": it captures a single
> moment in time and explores values of expressions in that moment of
> time, and these values are not allowed to change because any change
> assumes some sort of flow of time.
>
> Replacing the value stored in a region in the Store is an equivalent
> of saying "the program has modified contents of memory at this point",
> which is also not what you're looking for, and it's risky to do unless
> you take full responsibility of modeling the statement (for now
> checkers can only take such responsibility for call expressions and
> for branch conditions); otherwise checkers that subscribed on the
> callback before you would see the old value, and checkers that go
> later would see the new value.
>
>> | }
>> | }
>> | }
>> |
>> | // Add taint if possible
>> | if (symExpr) {
>> | state = state->addTaint(symExpr);
>> | C.addTransition(state);
>> | }
>>
>> Invoking `dumpTaint()` on ` state` shows that the expression did
>> indeed make it into the taint map, `state->isTainted(S,
>> C.getLocationContext())` does return `true` and `state->dump()`
>> reveals that `f.x` is now bound to a symbolic expression looking just
>> like I'd expect (`f.x : &SymRegion{reg_$1<int f->x>}SVal `).
>
> "SymRegion{$x}" means "an unidentified segment of memory that starts
> at address $x". That's definitely not what you want because
> "reg_$1<int f->x>" is an unknown value of type "int" that can't denote
> any reasonable memory address.
>
> I added an assertion recently that would have informed you of this
> problem if you used a more recent clang.
>
> Note that symbols, much like expressions, always have a type which is
> a valid QualType from the AST of the program.
>
>> Btw: I did not use a `check::PreStmt` specialized to a `MemberExpr`
>> because I do not (yet) understand the exact semantics of the ` SVal
>> `s and `Loc`s which I can get from the state (partly because of the
>> utter lack of higher-level documentation on these things). They
>> appear to be different: the code above does fail in a
>> `checkPreStmt()`, one appears to refer to the location where the
>> value is originating while some other appears to refer to something
>> completely different although the method's name suggests its just the
>> same.
>>
>> Using the information in subsequent post-statement checks is where
>> I'm stuck. First, I noted that the taint is not propagated anywhere.
>> I assumed that simple assignments were, for whatever reason, not yet
>> considered (I also did not find anything in the generic taint
>> checker), so I started implementing a
>> `check::PostStmt<BinaryOperator>` checker. Since `isTainted()` on the
>> statement behaved as expected on the last handler, I assumed the
>> taint detection to be somewhat easier. Sure, I have to strip both the
>> LHS and RHS of casts and paranthesis, but detecting taint should be
>> easy, I thought. Turns out `isTainted()` never returns true.
>>
>> Now, dumping both the taint and the overall state using
>> `ProgramState::dump()` and `ProgramState::dumpTaint()` reveals that
>> the symbolic expression which was previously added to the taint map
>> is still tainted, but the binding of this expression to `f.x` is
>> gnone (`f.x: Undefined`).
>
> "Undefined" is not "gone"; it's an indication that something is indeed
> undefined, as in, the analyzer has identified some undefined behavior
> that definitely occurs in the program under analysis on the current
> execution path which leads to something (in our case, a value of an
> expression) being potentially equal to any arbitrary value.
>
> The lack of specific value would have been represented as "Unknown",
> not "Undefined". Or, in case of Environment, the binding would have
> disappear completely.
>
> In your example f.x is indeed undefined because the class has no
> constructor and therefore the local variable will have uninitialized
> contents. Undefined values are used by the analyzer for finding
> related bugs - for instance you would have seen the bug in your
> example if you ran the vanilla (i.e. unmodified) analyzer on it:
>
> $ clang --analyze test.cpp
> test.cpp:8:5: warning: Assigned value is garbage or undefined
> y = f.x;
> ^ ~~~
> 1 warning generated.
>
> So the UndefinedVal here is the "correct value" i've been talking
> about in the beginning. You'd have obtained it by calling getSVal().
> The analyzer can prove to you that the field has undefined contents,
> just by looking at your code. Taint analysis is not necessary in this
> case: you already know that the value is completely undefined. Which
> is why it's not even a symbol: it is pointless to denote an undefined
> value with a symbol because any code that deals with such value would
> be buggy and we need to terminate the analysis and emit a warning as
> soon as the value is used at all.
>
> So you'd need a better test case to work with.
>
> Generally the whole idea of "certain fields are the source of
> taint"doesn't sound good to me . In C++ fields are not properties; if
> you put a value into them and don't touch the object otherwise, you're
> guaranteed to read back the same value. It means that fields are not a
> good source of taint: they merely store whatever you put into them.
> You should not report taint if someone writes 0 to the field and then
> reads it.
>
> I'd strongly suggest considering identifying sources of the values
> stored in these fields that cause these fields to contain tainted
> values - instead of blaming fields themselves. For example:
>
> - A function returned an object of type Foo. Upon return, immediately
> mark the symbolic value of its field x as symbolic. Don't wait until
> the read because it may be preceded with a write.
>
> - A method of Foo sets the object's field x to a tainted value. Upon
> method call, immediately mark the symbolic value this object's field x
> as tainted.
>
> - A function is known to touch a specific global object of type Foo
> and set its field x to a tainted value. Upon call of that function,
> immediately mark the symbolic value of this object's field x as tainted.
>
> This is the intended/supported workflow, and i strongly suspect that
> the field-based approach wouldn't work well (i.e. will have
> unavoidable false positives).
>
>
>> Does anybody on this list have an idea or an explanation why the
>> binding vanishes? Or can naybody point me to move exhaustive
>> resources on the excact semantics of the different kinds of `SVal`s
>> turning up in different contexts? Help would be much appreciated.
>>
>> Btw: I'm stuck with LLVM/clang 4.0 provided by the package manager at
>> the moment. Self-compiled clang 6.0 refuses to load modules (cannot
>> resolve the `CheckerBase` symbol for some reason) although I ran
>> `cmake` with `-DCLANG_PLUGIN_SUPPORT=ON`,
>> `-DLLVM_BUILD_LLVM_DYLIB=ON` and `-DLLVM_ENABLE_MODULES=ON`.
>>
>> Greetings and thanks in advance,
>> Julian
>>
>> [1] https://github.com/franchiotta/taintchecker
>>
>> PS: Sorry for the missing line-wrapping. I did not yet manage to make
>> Outlook behave like a decent eMail-Client
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
More information about the cfe-dev
mailing list