[cfe-dev] [analyzer][taint] More precise taint modelling on arrays

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Tue Feb 11 08:27:43 PST 2020

It's not much of a limitation, just a bug. When `scanf` produces a 
default conjured symbol binding in the buffer, this is *the* symbol that 
should carry the taint; all derived symbols will automatically be 
treated as tainted. If we're being even more precise and consider the 
length of the scanned string, then the derived symbol will need to carry 
partial taint (i.e., specify TaintedSubRegions such that derived<conj, 
R> is tainted iff R is a TaintedSubRegion of buf for conj). I.e., what 
i'm saying is that RegionStore is more-or-less flexible enough to 
implement this correctly, it's only a matter of getting this done. Those 
string functions are just too many and each of them requires individual 

On 2/10/20 2:43 PM, Balázs Benics via cfe-dev wrote:
> I made the following test case for checking the modeling of taint 
> propagation on the `strcpy` function.
> As I observed, only the first byte of the array became tainted, even 
> though all bytes should be treated tainted.
> In the test, you can see my expectations and the actual result.
> ```
> void strcpy_unbounded_tainted_buffer(char *buf) {
>   scanf("%s", buf);
>   char dst[32];
>   strcpy(dst, buf);                       //  expected---vvv   vvv--- 
> actual
>   clang_analyzer_isTainted_char(dst[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(dst[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(dst[31]); // expected-warning{{YES}} NO
> }
> void strcpy_bounded_tainted_buffer(char *buf) {
>   scanf("%s", buf);
>   buf[10] = '\0';
>   clang_analyzer_isTainted_char(buf[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(buf[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(buf[10]); // expected-warning{{NO}}  NO
>   clang_analyzer_isTainted_char(buf[20]); // expected-warning{{YES}} NO
>   char dst[32];
>   strcpy(dst, buf);
>   clang_analyzer_isTainted_char(dst[0]);  // expected-warning{{YES}} YES
>   clang_analyzer_isTainted_char(dst[1]);  // expected-warning{{YES}} NO
>   clang_analyzer_isTainted_char(dst[10]); // expected-warning{{NO}}  NO
>   clang_analyzer_isTainted_char(dst[20]); // expected-warning{{NO}}  NO
> }
> ```
> Some clarification about `TaintedSubRegions` and tainting 
> `nonloc::LazyCompoundVal`s would be also helpful since it might be 
> related to this topic.
> What are the reasons for this limitation on modeling taintedness 
> regarding arrays?
> Background and expectation:
> This change would be the first step in migrating the diagnostic 
> emitting parts of the `GenericTaintChecker`.
> Eg.: `checkUncontrolledFormatString`, `checkSystemCall`, 
> `checkTaintedBufferSize`.
> As a result, multiple checkers will consume taintedness information 
> for reporting warnings in the future and letting the 
> `GenericTaintChecker` do only modeling and propagation.
> Regards, Balazs.
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

More information about the cfe-dev mailing list