[cfe-dev] General query : Alpha security checkers and taint analysis

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Tue Apr 5 02:56:48 PDT 2016

 > For example,
 >   x = getchar();
 >   char y = x + 1;
 > Which part of the code taints y?

Propagation of taint through the symbol hierarchy is done by the core 
automatically. In fact, no propagation is done in any continuous manner 
- the core just looks at the symbol that you're interested in and finds 
tainted sub-symbols inside it. This mechanism is implemented in the 
ProgramState::isTainted() methods, and it relies on the assumption that 
symbols on which the taint originally appears are always atomic (of 
SymbolData class).

In your particular example, the following happens:
1. getchar() returns a SymbolConjured - an atomic symbol that represents 
the return value. Technically, it returns an SVal of nonloc::SymbolVal 
class, but it is a simple wrapper around the symbol, so there isn't much 
difference. If you dump() the program state, you'd see it as something 
like "conj_$0<int>".
2. The conjured symbol is stored in the memory region (VarRegion) that 
represents AST variable 'x' in the analyzer's memory model. If you 
dump() the program state, you'd see a binding in the Store: "(x, 0, 
direct): conj_$0<int>".
3. In order to compute 'x + 1', the conjured symbol "conj_$0<int>" is 
loaded as (r)value of expression 'x'.
4. A SymIntExpr - a symbolic expression 'conj_$0<int> + 1' is stored in 
the memory region of variable 'y'.
5. Suppose then you ask if value of 'y' is tainted. Then symbol 
'conj_$0<int> + 1' is taken to represent the value of 'y'.
6. In order to see if the value is tainted, 
ProgramState::isTainted(SymbolRef Sym, ...) iterates over all 
sub-symbols of the symbolic expression.
7. It finds 'conj_$0<int>' as one of such sub-symbols (and only; '1' is 
not a symbol).
8. Seeing that 'conj_$0<int>' was marked as tainted by the 
TaintPropagation checker, it decides that the whole symbol is tainted. 
Therefore it reports that value of the expression 'y' is tainted.

More information about the cfe-dev mailing list