[clang] [clang-tools-extra] RFC: [clang-tidy] [analyzer] Nondeterministic pointer usage improvements (PR #110471)

Mon Sep 30 14:19:57 PDT 2024

vabridgers wrote:

> I wish we had something like this. It might be possible, but really really challenging to implement. To me the problem is that the iteration or lookups are not necessarily bad. And to determine if it's bad one needs to understand how the result of the lookup or the individual elements of the iteration are used.
> 
> For example, this is completely fine and efficient:
> 
> ```c++
> int numInterestingExprs = 0; // TODO: Use accumulate.
> for (const Expr *E : MyBlocks/*some unordered set-like container*/) {
>   if (isInteresting(E)) ++numInterestingExprs;
> }
> ```
> 
> But this is not fine, because we apply a sideffect that is dependent on the nondeterministic iteration order. In some context, this could be still fine btw. It depends on what your constraints are:
> 
> ```c++
> for (const Expr *E : MyBlocks/*some unordered set-like container*/) {
>   if (isInteresting(E)) llvm::errs() << E->printPretty() << "\n"; // eeeh, the lines may get reshuffled from run-to-run...
> }
> ```
> 
> Only looking at the for loop here is not enough to decide this.
> 
> Limiting the check to only trigger if it sees the iteration AND also the use-site - like in the case you match std algorithms, like `is_sorted` or `partition` is a much more careful approach. It should work unless the algorithm takes a callback/functor that we can't really reason about and we could end up with a FP.
> 
> Remember that `begin` might be an ADL free-function, and not a memer function, such as for `int *arr[10]`. C++ also has `std::ranges::*`, which we may wanna cover too in the future.
> 
> You mentioned that you evaluated the checker at scale. What was the result of the evaluation? I assume you have seen FPs, and TPs as well.

Hi @steakhal , I completed a test run on the LLVM source base and detected no issues with this check. I will be expanding the tests, and wanted to mention I just found 

> I wish we had something like this. It might be possible, but really really challenging to implement. To me the problem is that the iteration or lookups are not necessarily bad. And to determine if it's bad one needs to understand how the result of the lookup or the individual elements of the iteration are used.
> 
> For example, this is completely fine and efficient:
> 
> ```c++
> int numInterestingExprs = 0; // TODO: Use accumulate.
> for (const Expr *E : MyBlocks/*some unordered set-like container*/) {
>   if (isInteresting(E)) ++numInterestingExprs;
> }
> ```
> 
> But this is not fine, because we apply a sideffect that is dependent on the nondeterministic iteration order. In some context, this could be still fine btw. It depends on what your constraints are:
> 
> ```c++
> for (const Expr *E : MyBlocks/*some unordered set-like container*/) {
>   if (isInteresting(E)) llvm::errs() << E->printPretty() << "\n"; // eeeh, the lines may get reshuffled from run-to-run...
> }
> ```
> 
> Only looking at the for loop here is not enough to decide this.
> 
> Limiting the check to only trigger if it sees the iteration AND also the use-site - like in the case you match std algorithms, like `is_sorted` or `partition` is a much more careful approach. It should work unless the algorithm takes a callback/functor that we can't really reason about and we could end up with a FP.
> 
> Remember that `begin` might be an ADL free-function, and not a memer function, such as for `int *arr[10]`. C++ also has `std::ranges::*`, which we may wanna cover too in the future.
> 
> You mentioned that you evaluated the checker at scale. What was the result of the evaluation? I assume you have seen FPs, and TPs as well.

I completed an initial test run on the llvm/clang repo with no crashes or results at all. Meaning no FPs or TPs. I will be expanding the tests to more repos, and would love to hear suggestions if you have any. I suppose abseil would be a good one. Also just found @Xazax-hun 's testbench to help with this here - https://github.com/Xazax-hun/csa-testbench. Really looking forward to trying that :) 

Thanks for the comments. Will keep driving this forward.

https://github.com/llvm/llvm-project/pull/110471