[PATCH] D145229: [analyzer] Improve the documentation of the alpha.security.taint.TaintPropagation checker

Mon Jul 24 09:24:27 PDT 2023

dkrupp added a comment.

Thanks @donat.nagy for your review. I addressed your remarks. After patch https://reviews.llvm.org/D155848 these sanitizing examples work properly.

================
Comment at: clang/docs/analyzer/checkers.rst:78-80
+The ``SuppressAddressSpaces`` option suppresses
 warnings for null dereferences of all pointers with address spaces. You can
 disable this behavior with the option
----------------
donat.nagy wrote:
> Why is this paragraph (and the one above it) wrapped inconsistently? If we are touching these docs, perhaps we could re-wrap them to e.g 80 characters / line.
The formatting of this paragraph should not be impacted by this unrleated change. I will revert all unrelated formatting changes.

================
Comment at: clang/docs/analyzer/checkers.rst:2404
+  }
+  strcat(cmd, filename);
+  system(cmd); // Warning: Untrusted data is passed to a system call
----------------
donat.nagy wrote:
> If the filename is too long (more than 1014 characters), this is a buffer overflow. I admit that having a secondary unrelated vulnerability makes the example more realistic :), but I think we should still avoid it. (This also appears in other variants of the example code, including the "No vulnerability anymore" one.)
True. cmd buffer increased to 2048

================
Comment at: clang/docs/analyzer/checkers.rst:2457-2461
+  if (access(filename,F_OK)){//sanitizing user input
+    printf("File does not exist\n");
+    return -1;
+  }
+  csa_sanitize(filename); // Indicating to CSA that filename variable is safe to be used after this point
----------------
donat.nagy wrote:
> Separating the actual sanitization and the function that's magically recognized by the taint checker doesn't seem to be a good design pattern. Here `csa_sanitize()` is just a synonym for the "silence this checker here" marker, which is //very// confusing, because if someone is not familiar with this locally introduced no-op function, they will think that it's performing actual sanitization! At the very least we should rename this magical no-op to `csa_mark_sanitized()` or something similar.
> 
> The root issue is that in this example we would like to use a verifier function (that determines whether the tainted data is safe) instead of a sanitizer function (that can convert //any// tainted data into safe data) and our taint handling engine is not prepared to handle conditional Filter effects like "this function removes taint from its first argument //if its return value is true//".
> 
> I think it would be much better to switch to a different example where the "natural" solution is more aligned with the limited toolbox provided by our taint framework (i.e. it's possible define a filter function that actually removes problematic parts of the untrusted input).
I changed this fist example to be a data sanitation example, where the sanitizeFileName(..) function changes the user input to an empty string if the filneme is invalid. 

Then in the next example we show the generic csa_mark_sanitized() function and how it can be used to mark the valid code paths of verifier functions.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145229/new/

https://reviews.llvm.org/D145229