[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 12 06:43:07 PDT 2023


llvmbot wrote:

@llvm/pr-subscribers-clang

<details>
<summary>Changes</summary>

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings. It is a common pattern to copy user provided input to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when the attacker can directly specify the size of the allocated area as an arbitrary large number (e.g. the value is converted from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function with the consideration not to emit warnings when the tainted value cannot be "arbitrarily large" (such as the size of an already allocated memory block).

The change has been evaluated on the following open source projects:

- memcached: [1 lost false positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)

- tmux: 0 lost reports
- twin [3 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline&newcheck=twin_v0.8.1_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- vim [1 lost false positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- openssl 0 lost reports
- sqliste [2 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline&newcheck=sqlite_version-3.33.0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- ffmpeg 0 lost repots
- postgresql [3 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline&newcheck=postgres_REL_13_0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- tinyxml 0 lost reports
- libwebm 0 lost reports
- xerces 0 lost reports

In all cases the lost reports are originating from copying untrusted environment variables into another buffer.

There are 2 types of lost false positive reports:
1)  [Where the warning is emitted at the malloc call by the TaintPropagation Checker ](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2648506&report-hash=2079221954026f17e1ecb614f5f054db&report-filepath=%2amemcached.c)
`
            len = strlen(portnumber_filename)+4+1;
            temp_portnumber_filename = malloc(len);
`

2) When pointers are set based on the length of the tainted string by the ArrayOutofBoundsv2 checker.
For example [this ](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2649310&report-hash=79dc8522d2cd34ca8e1b2dc2db64b2df&report-filepath=%2aos_unix.c)case.


--
Full diff: https://github.com/llvm/llvm-project/pull/66086.diff

4 Files Affected:

- (modified) clang/docs/analyzer/checkers.rst (+2-2) 
- (modified) clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp (-2) 
- (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+7-6) 
- (modified) clang/test/Analysis/taint-generic.c (-18) 


<pre>
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 54ea49e7426cc86..dbd6d7787823530 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2599,8 +2599,8 @@ Default propagations rules:
  ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
  ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
  ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
  ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
  ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 3dcb45c0b110383..9df69c1ad1b525e 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -694,8 +694,6 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
       {{{"strpbrk"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strndup"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strndupa"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-      {{{"strlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-      {{{"strnlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strtol"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
       {{{"strtoll"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
       {{{"strtoul"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 663836836d3db67..1eb926f25f9a778 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -10,6 +10,7 @@ int scanf(const char *restrict format, ...);
 int system(const char *command);
 char* getenv( const char* env_var );
 size_t strlen( const char* str );
+int atoi( const char* str );
 void *malloc(size_t size );
 void free( void *ptr );
 char *fgets(char *str, int n, FILE *stream);
@@ -54,11 +55,11 @@ void taintDiagnosticVLA(void) {
 // propagating through variables and expressions
 char *taintDiagnosticPropagation(){
   char *pathbuf;
-  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
                                  // expected-note at -1 {{Taint propagated to the return value}}
-  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+  if (size){ // expected-note {{Assuming 'size' is non-null}}
 	               // expected-note at -1 {{Taking true branch}}
-    pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+    pathbuf=(char*) malloc(atoi(size)); // expected-warning{{Untrusted data is used to specify the buffer size}}
                                                 // expected-note at -1{{Untrusted data is used to specify the buffer size}}
                                                 // expected-note at -2 {{Taint propagated to the return value}}
     return pathbuf;
@@ -71,12 +72,12 @@ char *taintDiagnosticPropagation(){
 char *taintDiagnosticPropagation2(){
   char *pathbuf;
   char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
-  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
                                  // expected-note at -1 {{Taint propagated to the return value}}
   char *user_env=getenv("USER_ENV_VAR");//unrelated taint source
-  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+  if (size){ // expected-note {{Assuming 'size' is non-null}}
 	               // expected-note at -1 {{Taking true branch}}
-    pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+    pathbuf=(char*) malloc(atoi(size)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
                                                 // expected-note at -1{{Untrusted data is used to specify the buffer size}}
                                                 // expected-note at -2 {{Taint propagated to the return value}}
     return pathbuf;
diff --git a/clang/test/Analysis/taint-generic.c b/clang/test/Analysis/taint-generic.c
index b7906d201e4fad3..a614453c63af671 100644
--- a/clang/test/Analysis/taint-generic.c
+++ b/clang/test/Analysis/taint-generic.c
@@ -915,24 +915,6 @@ void testStrndupa(size_t n) {
   clang_analyzer_isTainted_charp(result); // expected-warning {{YES}}
 }
 
-size_t strlen(const char *s);
-void testStrlen() {
-  char s[10];
-  scanf("%9s", s);
-
-  size_t result = strlen(s);
-  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
-}
-
-size_t strnlen(const char *s, size_t maxlen);
-void testStrnlen(size_t maxlen) {
-  char s[10];
-  scanf("%9s", s);
-
-  size_t result = strnlen(s, maxlen);
-  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
-}
-
 long strtol(const char *restrict nptr, char **restrict endptr, int base);
 long long strtoll(const char *restrict nptr, char **restrict endptr, int base);
 unsigned long int strtoul(const char *nptr, char **endptr, int base);
</pre>

</details>

https://github.com/llvm/llvm-project/pull/66086


More information about the cfe-commits mailing list