[cfe-dev] Adding taint sources to GenericTaintChecker

Mon Apr 11 05:32:50 PDT 2016

 > int readval()
 > {
 >   return 10;
 > }
 >
 > int a,b;
 > a = readval() // warning : tainted
 > b = a+1  //warning : tainted

In your example, readval() returns 10. Our analysis is inter-procedural, 
so when the body of readval() is visible, the analyzer looks into it and 
knows what it returns.

10 is a concrete value. A concrete value cannot be tainted: an attacker 
cannot forge 10 into 20 or anything else. It's just "the" 10, and all 
10's are the same. Something is tainted if it is user input or is 
otherwise known to be able to take completely arbitrary values; 10 is 
not input from the user, and it is entirely under our control. So the 
analyzer knows for sure that readval() returns a value that cannot be 
tainted, and the request from the checker to taint it is ignored. 
Concretely, the analyzer cannot obtain a symbol from the value the 
checker provides, because that value is concrete.
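
To make the distinction concrete, here is a small sketch in C. The names 
compute() and compute_from() are made up for illustration and do not 
come from your code. The comments describe how I would expect the 
analyzer to see each value, not output from an actual run.

```c
/* The body is visible, so the analyzer can look inside the call
 * and sees the constant it returns. */
static int readval(void) {
  return 10;
}

int compute(void) {
  int a = readval(); /* concrete value 10: no symbol to attach taint to */
  int b = a + 1;     /* concrete value 11 */
  return b;
}

/* When this function is analyzed on its own, nothing is known about x,
 * so the analyzer has to represent it with a symbol. Only a value of
 * this kind could carry taint. */
int compute_from(int x) {
  return x + 1;
}
```

In compute() everything folds down to constants, which is why the taint 
request has nothing to attach to.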

In fact, only *symbols* may be "truly" tainted. To be exact, addTaint() 
works with SymExpr's (SymbolRef's) and, additionally, SymbolicRegion's 
(which are essentially regions pointed to by SymExpr pointers). 
isTainted() works on SymExpr's, SymbolicRegion's and their sub-regions, 
and additionally on SVal's of class nonloc::SymbolVal, 
loc::MemRegionVal, nonloc::LocAsInteger whenever they contain a SymExpr 
or a SymbolicRegion or its sub-region.

If I replace your definition of readval() with an opaque forward 
declaration, e.g.:

   int readval();
   void foo() {
     int a = readval(); // warning: tainted
   }

then everything works as expected.

On the other hand, if the definition of readval() is truly available in 
your translation unit, then you don't need to add *it* to 
GenericTaintChecker. Instead, add whatever function readval() calls to 
obtain the user input; the analyzer will then model readval() itself and 
pass the symbol down to the caller.
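
As a rough sketch of that last case, suppose readval() is a thin wrapper 
around getchar(). I am assuming here that getchar() is among the sources 
GenericTaintChecker already knows; if your real input function is 
something else, that function is the one to add. The helpers bump() and 
next_value() are hypothetical and exist only for this example.

```c
#include <stdio.h>

/* Wrapper with a visible body. It is not registered as a taint source;
 * the function it calls is (by assumption). */
int readval(void) {
  int c = getchar(); /* assumed taint source: the result is a symbol */
  return c;          /* the same symbol is returned to the caller */
}

/* Pure helper: returns -1 on EOF, otherwise the next value. */
int bump(int a) {
  return a == EOF ? -1 : a + 1;
}

int next_value(void) {
  int a = readval(); /* expected to be tainted through the wrapper */
  return bump(a);
}
```

Since the analyzer models the body of readval(), the symbol produced by 
the getchar() call is what ends up in a, with no change to the checker 
for readval() itself.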