[cfe-dev] Clang GenericTaintChecker limitations

Wed Aug 10 08:47:16 PDT 2016

Hello,

The taint analysis we have here is not perfect, but it's pretty sane.

The analyzer assigns symbols to memory regions to represent their values 
at a given moment in time, and passes these symbols around through 
assignments. Some symbols carry taint, and symbolic expressions composed 
from them automatically inherit the taint.
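
To make this concrete, here is a minimal sketch in C. The function name is 
made up for illustration, and I'm assuming the parameter's symbol has 
already been marked as tainted by some taint source. The copy holds the 
same symbol, and the arithmetic expression built on top of it is tainted 
as well, without any checker-specific support:

```c
/* Hypothetical helper: `user_value` is assumed to carry a tainted symbol. */
int derive_offset(int user_value) {
    int copy = user_value;      /* assignment: `copy` holds the same symbol */
    int offset = copy * 4 + 1;  /* symbolic expression over a tainted symbol */
    return offset;              /* the returned expression inherits the taint */
}
```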

The GenericTaintChecker propagates taint through function calls. 
For example, if you pass pointers to tainted values to strcat(), the 
symbol that represents the value behind the returned pointer is also 
tainted. Unlike propagation from atomic symbols to expressions, this is 
not automagical: a checker needs to do the work to support each 
specific API.
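
A rough sketch of the strcat() case, in C. The helper name and the "ls " 
prefix are made up for illustration, and I'm assuming `user_input` points 
to data that a taint source has already marked:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: `user_input` is assumed to point to tainted data.
   Returns NULL if the result would not fit into `cap` bytes. */
char *build_command(char *dst, size_t cap, const char *user_input) {
    const char *prefix = "ls ";
    if (strlen(prefix) + strlen(user_input) + 1 > cap)
        return NULL;
    strcpy(dst, prefix);             /* untainted constant string */
    return strcat(dst, user_input);  /* modeled call: the value behind the
                                        returned pointer becomes tainted */
}
```

If the resulting string later reached something like system(), that is the 
kind of flow the checker is meant to report. A function the checker does 
not model would simply lose the taint at this point.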

The memory model currently assumes that there is no aliasing between 
unknown pointers, which is another limitation. However, inter-procedural 
analysis works through inlining: when the function is inlined, any 
aliasing between its actual arguments is correctly taken into account 
(e.g. the call `f(&x, &x)' is modeled correctly, even though f is assumed 
to have non-aliasing arguments when it is analyzed on its own).
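
Here is a small C sketch of that situation (the caller names are made up 
for illustration). The result of f depends on whether its arguments alias, 
which is exactly what the analyzer cannot know when it looks at f in 
isolation:

```c
/* Analyzed on its own, f is assumed to receive non-aliasing pointers,
   so the analyzer would conclude that it returns 1. */
int f(int *a, int *b) {
    *a = 1;
    *b = 2;
    return *a;  /* 1 if a and b are distinct, 2 if they alias */
}

/* When f is inlined into these callers, the actual arguments are known. */
int call_distinct(void) {
    int x = 0, y = 0;
    return f(&x, &y);  /* no aliasing */
}

int call_aliased(void) {
    int x = 0;
    return f(&x, &x);  /* both parameters point to x */
}
```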

One limitation that might bite you is the lack of support for 
floating-point values: the analyzer doesn't yet represent them 
symbolically, so they cannot be tainted.
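
As a hedged illustration in C (the function names are made up, and both 
parameters are assumed to come from a taint source): the two helpers below 
compute the same thing at run time, but only the integer version can carry 
taint through the analysis.

```c
/* Hypothetical helpers: `user_value` is assumed to come from a taint source. */

int scale_int(int user_value) {
    return user_value * 2;  /* integer symbol: the taint can be tracked */
}

int scale_fp(double user_value) {
    /* floating-point value: no symbol is created for it, so there is
       nothing for the taint to attach to and the flow is missed */
    return (int)(user_value * 2.0);
}
```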

One thing you'd probably need is to understand how structures are 
modeled: e.g. there's a symbol for the structure or array and a symbol 
for its field or element, and several methods are used to represent 
this relationship, depending on circumstances.
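
A small C sketch of the kind of code where this matters. The struct and 
function names are made up, and the comments only describe the general 
shape as I understand it; the exact representation depends on the 
circumstances and on the analyzer version:

```c
struct packet {
    int len;
    char payload[8];
};

/* Field read through a pointer: the analyzer needs a symbol for the value
   of the field `len` inside the unknown structure `*src`. */
int direct_len(const struct packet *src) {
    return src->len;
}

/* Whole-structure copy followed by a field read: the value of `local.len`
   has to be related back to the value of the copied structure. */
int copied_len(const struct packet *src) {
    struct packet local = *src;
    return local.len;
}

/* Array element inside a field: an element of a field of a structure. */
char first_byte(const struct packet *src) {
    return src->payload[0];
}
```

If you mark a structure as tainted in your own taint source, it is worth 
checking whether the field and element values read from it afterwards are 
reported as tainted too.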

I'm not aware of any other powerful open-source static analysis tools 
for C/C++, but you might have a look at KLEE, which is not exactly 
static, but also implements symbolic execution.

You may want to check an earlier discussion:
http://lists.llvm.org/pipermail/cfe-dev/2016-April/048250.html
http://lists.llvm.org/pipermail/cfe-dev/2016-April/048243.html
http://lists.llvm.org/pipermail/cfe-dev/2016-April/048363.html

On 8/10/16 4:27 PM, Divya Muthukumaran via cfe-dev wrote:
> Hi All,
>
> I am looking for an open source static taint analysis tool that I can 
> run on some applications to reason about security properties -- just 
> to check if a tainted value can flow to some function parameters etc. 
> The programs I want to try this on are around 10-20K lines of C code. 
> I was thinking of using Clang's GenericTaintChecker (and just 
> modifying the taint sources) for this purpose. I'd like to know if 
> there are any limitations to this analysis that I should be aware of.
>
> I know that the interprocedural analysis doesn't work across 
> translation units, but I've managed to merge my source files using the 
> cilly tool. I was mainly wondering about the precision of the taint 
> analysis (what sort of pointer/alias analysis the IPA uses etc). If 
> you could point me to any documentation that discusses the memory 
> model, that would be great.
>
> Is the clang taint checker considered the state-of-the-art in 
> open-source taint checking tools or is there something that is 
> considered better (more precise)?
>
> Thanks,
> Divya Muthukumaran
> Research Associate
> Department of Computing
> Imperial College London