[cfe-dev] Purpose of GenericTaintChecker

Fri Jun 3 22:13:12 PDT 2016

________________________________
From: Jeremy 柯品任 <blackteam91 at hotmail.com>
Sent: Saturday, June 4, 2016 12:20:00 PM
To: Artem Dergachev
Subject: Re: [cfe-dev] Purpose of GenericTaintChecker

I'm not sure I'm replying the correct way; this is my first time replying to messages on the list.

I wrote a line of code, gets(x);, inside my Objective-C code in the hope of getting GenericTaintChecker to report something, but it doesn't. Am I missing something?

I've also tried writing some code in the checkPostStmt callback to check whether the CallExpr is tainted and, if so, emit an error report, but to no avail.

Does the Static Analyzer process every statement in PreStmt, then PostStmt, and so on? I'm not really sure how the callbacks flow right now, and I can't seem to print out the tainted areas, which is getting a little frustrating :\

If my own checker is enabled together with GenericTaintChecker, would the taint information added by GenericTaintChecker be available to my own checker? There are some methods inside GenericTaintChecker that call addTaint.

Anyway thanks for your help!

________________________________
From: cfe-dev <cfe-dev-bounces at lists.llvm.org> on behalf of Artem Dergachev via cfe-dev <cfe-dev at lists.llvm.org>
Sent: Saturday, June 4, 2016 2:02:56 AM
To: cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] Purpose of GenericTaintChecker

> What I'm trying to achieve is to check if any tainted variables have
> been passed into sensitive functions.

The first "Aha!" here would be to realize that taint is not a property
of a variable - it is a property of the value stored in it, and the
analyzer's core engine allows you to easily work with values directly,
without spending any effort to compute these values.

The analyzer denotes values which are not known during static analysis
(such as values coming from user input) with *symbols* and performs
algebraic operations on symbols. During program execution (or,
equivalently, during analysis, a.k.a. "symbolic execution"), those
symbols are passed around from one variable to another (through
assignments etc. - for instance, after the declaration statement
"int a = b;" both variables 'a' and 'b' hold the same symbol). Results
of algebraic operations on tainted symbols are also considered to be
tainted. Symbols read from tainted pointers are considered to be tainted
themselves, etc.

GenericTaintChecker, aka alpha.security.taint.TaintPropagation as it's
called in Checkers.td, subscribes to certain function call events -
such as, say, calls to getc(). The core denotes the results of these
calls (return values or, say, for scanf(), the values written through
pointers passed as arguments) as symbols. GenericTaintChecker takes
these symbols and marks them as tainted.

Then the analyzer core models how these symbols move around during
execution. No checker is responsible for that - it's done automagically.
The core doesn't, most of the time, care if these symbols are tainted or
not - it simply models operations on them. It makes no additional effort
to mark results of algebraic operations on tainted values as tainted:
the taint of an algebraic symbolic expression can be computed simply by
checking whether the expression references any tainted symbols. The same
goes for symbols loaded from tainted pointers - *the hierarchy of
symbols is designed to remember each symbol's origins out of the box*,
so it's easy to see if any composite symbols are coming from a tainted
source.
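
To make the "symbols remember their origins" idea concrete, here is a
minimal toy model. It is only a sketch and does not use the analyzer's
real classes: Sym, makeAtom, makeDerived, TaintSet and isTainted are
all hypothetical names invented for this illustration. A checker marks
only the original symbol; taint of anything derived from it is computed
on demand by walking the origins.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Toy stand-in for an analyzer symbol (hypothetical, NOT the real Clang API).
struct Sym;
using SymRef = std::shared_ptr<Sym>;

struct Sym {
    int id;                       // unique identity of this symbol
    std::vector<SymRef> origins;  // symbols this one was computed or loaded from
};

// A fresh symbol for an unknown value, e.g. the result of an input function.
SymRef makeAtom(int id) {
    return std::make_shared<Sym>(Sym{id, {}});
}

// A composite symbol: the result of an operation on (or a load through) others.
SymRef makeDerived(int id, std::vector<SymRef> origins) {
    return std::make_shared<Sym>(Sym{id, std::move(origins)});
}

// Only symbols that a checker marked explicitly are stored here.
using TaintSet = std::set<int>;

// A symbol is tainted if it was marked, or if anything in its origins is.
bool isTainted(const SymRef &sym, const TaintSet &marked) {
    if (marked.count(sym->id) != 0)
        return true;
    for (const SymRef &origin : sym->origins)
        if (isTainted(origin, marked))
            return true;
    return false;
}
```

With this model, marking symbol 1 and then building symbol 3 from
symbols 1 and 2 makes symbol 3 tainted without anybody touching the
taint set again, while symbol 2 on its own stays clean.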

Whenever the core encounters calls to functions that it doesn't
model (say, because their bodies aren't available), their return values
are not tainted even if arguments of the call are tainted, because
otherwise we'd get a lot of false positives. So when we need to mark
return values of functions as tainted depending on the taintedness of
arguments, GenericTaintChecker is responsible for modeling that. This is
the "taint propagation" thing. For instance, taint propagates through
strcat(), which theoretically allows us to catch SQL injections.
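
As a rough sketch of that decision (again a toy, not the checker's
actual code): the rule table kPropagatingFunctions and the helper
returnValueTainted are hypothetical names, and the real checker keeps
more detailed per-function rules than a plain set of names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical rule table: functions whose result becomes tainted
// when at least one argument is tainted.
const std::set<std::string> kPropagatingFunctions = {"strcat"};

// Decide whether the result of a call to an unmodeled function is tainted.
// argTainted[i] says whether the i-th argument carries a tainted value.
bool returnValueTainted(const std::string &callee,
                        const std::vector<bool> &argTainted) {
    // Default for unknown functions: do not propagate, to avoid false positives.
    if (kPropagatingFunctions.count(callee) == 0)
        return false;
    // A propagation rule exists: the result is tainted if any argument is.
    for (bool tainted : argTainted)
        if (tainted)
            return true;
    return false;
}
```

So a call to some unknown my_hash() with a tainted argument yields an
untainted result here, while strcat() with a tainted argument yields a
tainted one.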

Finally, tainted symbols may reach sensitive functions. For example,
a tainted input string in a call to system() allows execution of
arbitrary code. This is the *third* kind of function to which
GenericTaintChecker subscribes - upon noticing tainted arguments passed
to such functions, it issues warnings.
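
Putting the three roles together, the kind of code this is aimed at
looks roughly like the sketch below. The function names buildCommand
and listUserDirectory are made up for the example; scanf(), strcat()
and system() are the source, propagation step and sink mentioned above.
Whether a given analyzer build actually reports this exact snippet
depends on the version and on the alpha checker being enabled, so treat
it as an illustration rather than a guaranteed test case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Propagation step: builds "ls " + name in 'out'.
// Returns false (and leaves 'out' untouched) if the result would not fit.
bool buildCommand(const char *name, char *out, std::size_t outSize) {
    const char prefix[] = "ls ";
    if (std::strlen(prefix) + std::strlen(name) + 1 > outSize)
        return false;
    std::strcpy(out, prefix);
    std::strcat(out, name);  // taint flows from 'name' into 'out'
    return true;
}

// Source and sink in one place.
int listUserDirectory() {
    char name[64];
    if (std::scanf("%63s", name) != 1)  // source: 'name' holds user input
        return -1;
    char cmd[128];
    if (!buildCommand(name, cmd, sizeof cmd))
        return -1;
    return std::system(cmd);  // sink: user-controlled string is executed
}
```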

If you want to extend this functionality by adding your own:
(1) Taint sources,
(2) Taint propagation rules,
(3) Warnings for tainted value usage,
then you can either extend the relevant section of GenericTaintChecker,
or write your own checker - it doesn't really matter, because taint
information is visible to all checkers. It might be more comfortable to
extend GenericTaintChecker because it allows some code re-use. If you
write your own taint checker, you can either use it together with
GenericTaintChecker (its work on taint sources and taint propagation may
be of use) or disable GenericTaintChecker completely (say, if you don't
want to see its warnings).