[clang] [WIP] [analyzer] Refactor MallocChecker to use `BindExpr` in `evalCall` (PR #106081)

DonĂ¡t Nagy via cfe-commits cfe-commits at lists.llvm.org
Tue Aug 27 09:02:31 PDT 2024


https://github.com/NagyDonat commented:

The commit looks very promising :smile: I added several minor remarks, but overall I'm very satisfied with the quality of your changes and I'm relieved that I won't have to perform this refactoring.

> I have only 2 failing tests now:
>
>  Clang :: Analysis/NewDelete+MismatchedDeallocator_intersections.cpp
>  Clang :: Analysis/NewDelete-intersections.m
>
> The problem with those is that now CSA reports read of undefined value obtained from malloc. I am not sure how to handle it and why it did work before this. Maybe it was related to wrong usage of BindExpr?

"Why it did work before this?" is always an excellent question :upside_down_face: and in this particular case I think I can answer it: previously `MallocChecker` only did a `postCall()` callback, so before that `free()` (whose definition is not available) was evaluated by `conservativeEvalCall()` and that process invalidated the memory region which was passed to `free()`, which changed its contents from `UndefinedVal` to an unknown symbolic value. Now you provide an `evalCall`, so `free` does not invalidate the value stored in its argument and you can see the `UndefinedVal` which was placed there by malloc.

To handle this, I think you should add an explicit invalidation step within the `evalCall` of freeing functions to signify that the previous knowledge about the contents of the released memory region is no longer valid.

More precisely I can imagine three alternatives:
- (1) A "regular" invalidation would introduce fresh symbols to represent the data readable through the now-invalid freed pointer. (At first we don't know anything about these symbols, but if they participate in conditionals, they can become constrained.) This is equivalent to the old behavior of the checker.
- (2) We could fill the region with `UnknownVal`, which is a "we're confused, assume the best about it" fallback placeholder. This is very similar to plan (1) but if I understand it correctly the `UnknownVal`s won't become constrained.
- (3) We could fill the region with `UndefinedVal`, which is a "this is garbage, reading it is an error" aggressive placeholder. In the testcases that you highlighted, this wouldn't change anything (because `malloc()` already fills the uninitialized memory with `UnknownVal`), but it would ensure that after code like
  ```c
    int* p = (int*)malloc(sizeof(int));
    *p = 123;
    free(p);
  ```
  the analyzer knows that `*p` is an undefined possibly-overwritten value (and not the concrete value 123).

Among these (3) is the closest representation of the language standard, because reading `free`d memory regions can produce arbitrary garbage.

However, if the reporting of use-after-free errors is enabled (within MallocChecker.cpp), then the difference between (1)-(3) is irrelevant, because use-after-free is checked as a precondition and we never reach the point where we actually read the contents of the released memory.

Therefore I think (1) (or perhaps 2??) would be the most user-friendly solution, because if the user enables some subcheckers of MallocChecker, but disables the use-after-free check for some reason, then they presumably wouldn't want to see the "use of undefined value" errors that are essentially use-after-free errors.

https://github.com/llvm/llvm-project/pull/106081


More information about the cfe-commits mailing list