[cfe-dev] Purpose of GenericTaintChecker
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Fri Jun 3 11:02:56 PDT 2016

> What I'm trying to achieve is to check if any tainted variables has
> been passed into sensitive functions.

The first "Aha!" here would be to realize that taint is not a property
of a variable - it is a property of the value stored in it, and the
analyzer's core engine allows you to easily work with values directly,
without spending any effort to compute these values.

The analyzer denotes values which are not known during static analysis
(such as values coming from user input) with *symbols* and performs
algebraic operations on symbols. During program execution (or,
equivalently, during analysis, a.k.a. "symbolic execution"), those
symbols are passed around from one variable to another through
assignments and the like. For instance, after the declaration statement
"int a = b;" both variables 'a' and 'b' hold the same symbol. Results
of algebraic operations on tainted symbols are also considered to be
tainted, and so are symbols read from tainted pointers.
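
To make the "value, not variable" point concrete, here is a small piece of code one might feed to the analyzer. The function names combine() and read_and_combine() are made up for illustration, and the comments describe how I would expect the symbols to flow, not actual analyzer output.

```cpp
#include <cstdio>

// Pure arithmetic on whatever value 'b' holds on entry.
int combine(int b) {
  int a = b;         // 'a' and 'b' now hold the same value (for the analyzer: the same symbol)
  b = 0;             // 'b' now holds the concrete 0; 'a' still holds the original value
  return a * 2 + b;  // an algebraic expression over the original value
}

int read_and_combine() {
  // The result of getc() is not known statically, so the analyzer
  // denotes it with a symbol. The text names getc() as a taint source.
  int c = std::getc(stdin);
  return combine(c);  // the same symbol travels into 'b', then into 'a'
}
```

Asking "is 'b' tainted?" has no fixed answer here: after "b = 0;" the variable holds a plain constant, while the symbol that came from getc() lives on in 'a' and in the returned expression.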

GenericTaintChecker, aka alpha.security.taint.TaintPropagation as it's
called in Checkers.td, subscribes to certain function call events, such
as calls to getc(). The core denotes the return values of these calls
as symbols (or, for scanf(), the values written through the pointers
passed as arguments). GenericTaintChecker takes these symbols and marks
them as tainted.

Then the analyzer core models how these symbols move around during
execution. No checker is responsible for that - it's done automagically.
Most of the time the core doesn't care whether these symbols are tainted
or not - it simply models operations on them. It makes no additional
effort to mark results of algebraic operations on tainted values as
tainted: the taint of an algebraic symbolic expression can be computed
by simply looking at the expression and checking whether it references
any tainted symbols. The same goes for symbols loaded from tainted
pointers - *the hierarchy of symbols is designed to remember each
symbol's origins out of the box*, so it's easy to see if a composite
symbol comes from a tainted source.
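
The idea that taint can be read off a symbol's origins, without being stored on every derived symbol, can be sketched with a toy model. To be clear, ToySymbol, makeAtom(), makeComposite() and isTaintedToy() are invented for this sketch and are not the analyzer's real classes or functions; only the lookup strategy is the point.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct ToySymbol;
using ToySymbolRef = std::shared_ptr<const ToySymbol>;

// Toy stand-in for a symbol: it only remembers the symbols it was built from.
struct ToySymbol {
  std::string name;
  std::vector<ToySymbolRef> origins;  // empty for an atomic symbol
};

ToySymbolRef makeAtom(const std::string &name) {
  return std::make_shared<ToySymbol>(ToySymbol{name, {}});
}

ToySymbolRef makeComposite(const std::string &name,
                           std::vector<ToySymbolRef> origins) {
  return std::make_shared<ToySymbol>(ToySymbol{name, std::move(origins)});
}

// Taint is recorded for atomic symbols only. For anything composite,
// walk the origins and see whether a tainted symbol is referenced.
bool isTaintedToy(const ToySymbolRef &sym,
                  const std::set<std::string> &taintedAtoms) {
  if (taintedAtoms.count(sym->name))
    return true;
  for (const ToySymbolRef &origin : sym->origins)
    if (isTaintedToy(origin, taintedAtoms))
      return true;
  return false;
}
```

In this model, marking one atom as tainted is enough: an expression such as "input + 1", or a value loaded through a pointer derived from it, is reported as tainted just because the walk reaches the tainted atom.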

Whenever the core encounters a call to a function it doesn't model
(say, because its body isn't available), the return value is not
tainted even if the arguments of the call are tainted, because
otherwise we'd get a lot of false positives. So when the return value
of a function needs to be tainted depending on the taintedness of its
arguments, GenericTaintChecker is responsible for modeling that. This is
the "taint propagation" thing. For instance, taint propagates through
strcat(), which theoretically allows us to catch SQL injections.
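
A propagation rule boils down to "for this callee, these arguments taint the result". Here is a rough sketch of that decision as a lookup table. ToyPropagationRule and returnValueTainted() are hypothetical names, and the table format is my assumption for illustration, not how GenericTaintChecker stores its rules.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical rule: indices of the arguments whose taint reaches the result.
struct ToyPropagationRule {
  std::vector<std::size_t> sourceArgs;
};

bool returnValueTainted(const std::map<std::string, ToyPropagationRule> &rules,
                        const std::string &callee,
                        const std::vector<bool> &argTainted) {
  auto it = rules.find(callee);
  if (it == rules.end())
    return false;  // no rule for this callee: the result stays untainted
  for (std::size_t idx : it->second.sourceArgs)
    if (idx < argTainted.size() && argTainted[idx])
      return true;
  return false;
}
```

With a rule like {"strcat", {0, 1}} (an assumed entry), a tainted source string taints the result of strcat(), while a call to a function with no rule yields an untainted result even when every argument is tainted. That is the conservative default described above.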

Finally, tainted symbols may reach sensitive functions. For example,
a tainted input string in a call to system() allows execution of
arbitrary code. This is the *third* kind of function that
GenericTaintChecker subscribes to - upon noticing tainted arguments
passed to such functions, it issues warnings.
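
Putting the three kinds of functions together, a source-to-sink path in analyzed code could look like the sketch below. build_command() and run_user_command() are made-up names, and the comments state what I would expect from the description above, not verified checker output.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Writes "echo <suffix>" into 'out'; the caller must provide enough room.
void build_command(char *out, const char *suffix) {
  std::strcpy(out, "echo ");
  std::strcat(out, suffix);  // propagation: taint of 'suffix' carries into 'out'
}

int run_user_command() {
  char name[64];
  // Source: scanf() writes user-controlled data through the pointer argument.
  if (std::scanf("%63s", name) != 1)
    return -1;
  char cmd[128];  // 5 bytes of "echo " + at most 63 bytes of input + NUL fits
  build_command(cmd, name);
  return std::system(cmd);  // sink: a tainted string reaches system()
}
```

Assuming a clang build that ships the analyzer, running something like "clang --analyze -Xclang -analyzer-checker=alpha.security.taint.TaintPropagation" on such a file should enable the checker named above.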

If you want to extend this functionality by adding your own:
(1) Taint sources,
(2) Taint propagation rules,
(3) Warnings for tainted value usage,
then you can either extend the relevant section of GenericTaintChecker,
or write your own checker - it doesn't really matter, because taint
information is visible to all checkers. Extending GenericTaintChecker
might be more comfortable because it allows some code re-use. If you
write your own taint checker, you can either use it together with
GenericTaintChecker (its work on taint sources and taint propagation may
be of use) or disable GenericTaintChecker completely (say, if you don't
want to see its warnings).