[all-commits] [llvm/llvm-project] 5ffd15: [analyzer] Clean up list of taint propagation func...

Donát Nagy via All-commits all-commits at lists.llvm.org
Thu May 16 03:11:21 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 5ffd154cf61390c1ed32a1b0eb134929f78c0fbe
      https://github.com/llvm/llvm-project/commit/5ffd154cf61390c1ed32a1b0eb134929f78c0fbe
  Author: Donát Nagy <donat.nagy at ericsson.com>
  Date:   2024-05-16 (Thu, 16 May 2024)

  Changed paths:
    M clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp

  Log Message:
  -----------
  [analyzer] Clean up list of taint propagation functions (#91635)

This commit refactors GenericTaintChecker and performs various
improvements in the list of taint propagation functions:

1. The matching mode (usually `CDM::CLibrary` or
`CDM::CLibraryMaybeHardened`) was specified to avoid matching e.g. C++
methods or functions from a user-defined namespace that happen to share
the name of a well-known library function.
2. With these matching modes, a `CallDescription` can automatically
match builtin variants of the functions, so entries that explicitly
specified a builtin function were removed. This eliminated
inconsistencies where the "normal" and the builtin variant of the same
function was handled differently (e.g. `__builtin_strlcat` was covered,
while plain `strlcat` wasn't; while `__builtin_memcpy` and `memcpy` were
both on the list with different propagation rules).
3. The modeling of the functions `strlcat` and `strncat` was updated to
propagate taint from the first argument (index 0), because a tainted
string should remain tainted even if we append something else to it.
Note that this was already applied to `strcat` and `wcsncat` by commit
6ceb1c0ef9f544be0eed65e46cc7d99941a001bf.
4. Some functions were updated to propagate taint from a size/length
argument to the result: e.g. `memcmp(p, q, get_tainted_int())` will now
return a tainted value (because the attacker can manipulate it). This
principle was already present in some propagation rules (e.g.
`__builtin_memcpy` was handled this way), and even after this commit
there are still some functions where it isn't applied. (I only aimed for
consistency within the same function family.)
5. Functions that have hardened `__FOO_chk()` variants are matched in
`CDM:CLibraryMaybeHardened` to ensure consistent handling of the
"normal" and the hardened variant. I added special handling for the
hardened variants of "sprintf" and "snprintf" because there the extra
parameters are inserted into the middle of the parameter list.
6. Modeling of `sscanf_s` was added, to complete the group of `fscanf`,
`fscanf_s` and `sscanf`.
7. The `Source()` specifications for `gets`, `gets_s` and `wgetch` were
ill-formed: they were specifying variadic arguments starting at argument
index `ReturnValueIndex`. (That is, in addition to the return value they
were propagating taint to all arguments.)
8. Functions that were related to each other were grouped together. (I
know that this makes the diff harder to read, but I felt that the full
list is unreadable without some reorganization.)
9. I spotted and removed some redundant curly braces. Perhaps would be
good to switch to a cleaner layout with less nested braces...
10. I updated some obsolete comments and added two TODOs for issues that
should be fixed in followup commits.



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications


More information about the All-commits mailing list