[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

Tue Oct 15 07:19:08 PDT 2024

NagyDonat wrote:

@steakhal I'm sorry that I disappeared for more than a week when you dropped your alternative implementation. I thought a lot about the advantages and limitations of my implementation, your implementation and even "third way" alternatives (but unfortunately `check::BranchCondition` doesn't seem to be useful for this), but finally (after a discussion with @Szelethus ) I decided that I would prefer to keep my approach.

I was tempted do switch to the visitor-based implementation because its code quality is simply better, but overall I feel that this direct approach would be even better if I clean it up properly.

My main considerations are:
- I think it's likely (although not guaranteed) that this heuristic would be helpful for other checkers as well, and if we want to activate it for all checkers, then it must be done in an eager way (because eagerly sinking lots of paths is significantly better than lazily discarding all the results).
- I agree with @Szelethus that this heuristic is handling an engine issue, so its logic should be in the engine itself.
- The `OrigCursor` trickery is not unacceptable by itself, but the added complexity is one disadvantage of the visitor approach.
- The "enumerate all subexpressions of the loop condition and check the number of times they were assumed to be true/false" approach can handle more of my testcases than my original implementation, but it can produce very counter-intuitive results:
  - If we have a loop like `while (!seen_errors(output, logging_mode() < LOG_DEBUG))` with eager assumptions and `logging_mode()` is defined in a different TU, then the "count true/false evaluations of subexpression" logic says that this loop makes an assumption at each iteration -- even if `seen_errors` is inlined and e.g. the analyzer knows that it always returns false.
  - There are also situations where not all evaluations of the loop condition evaluate a particular subexpression, which would also produce strange results.
  - Logical negation operations could swap the "real" meaning of "number of times it's assumed to be true" vs "number of times it's assumed to be false" counters.

@steakhal @ anybody else Is my eager implementation acceptable for you (in theory)? If yes, I'll start to work on implementing the suggestions of @isuckatcs and any other observations.

https://github.com/llvm/llvm-project/pull/109804