[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

Wed Oct 16 06:09:23 PDT 2024

=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/109804 at github.com>

steakhal wrote:

> * I think it's likely (although not guaranteed) that this heuristic would be helpful for other checkers as well, and if we want to activate it for all checkers, then it must be done in an eager way (because eagerly sinking lots of paths is significantly better performance-wise than lazily discarding all those results).

Alright. So, if I understand you is not only to fix the OOBv2 checker but also to later expand the scope to the whole engine - and skip exploring paths that were explored before. I think that deserves a deeper discussion, but I also understand that we need implementation experience for experimentation back the RFC with data.
This is wasn't clear to me initially. The PR title and summary refereed to suppression, which didn't motivate this change well enough to touch the engine. Last time I've checked the Loop hanging RFC [discuss post](https://discourse.llvm.org/t/loop-handling-improvement-plans/80417/15), I though you finally decided to take @haoNoQ's suggestion - and suppress issues instead of changing exploration strategies (if I understood that correctly).

That said, if this patch only want's to suppress reports, then a bug report visitor is the right choice to me. If we shoot for something bigger (and this is just the first step), than it starts to make sense. However, achieving something like that is going to be really really tough, so the chances are slim. I wonder if you have plans for doing it incrementally and with options for disabling it while developing and preparing this new version.

> * I agree with @Szelethus that this heuristic is handling an engine issue, so its logic should be in the engine itself.

This point partially resonates with me that this is both a checker and engine issue, thus a hard and important problem to solve.

> * The `OrigCursor` trickery is not unacceptable by itself, but the added complexity is one disadvantage of the visitor approach.

I highly agree that that approach is highly unorthodox, and surprising. I have two observations for this.
1st, if we always had a ExplodedNode tag for taking an **eager** decision (and some similar tag when eager assumptions are disabled), then I wouldn't need that hack. You can see in my commit history there that this was my initial idea - but I realized that we may have good reasons for disabling eager assumptions, and in that case I wouldn't have those tags anymore :s
2nd, It's a convenient lie to have a stripped single-path "exploded graph" from a BugReport visitor, because then it's unambiguous what parent nodes it chooses, thus developers can focus on what matters the most. However, like in this case, it would be nice if we could do arbitrary things. If we had a better API doing this, it wouldn't feel odd anymore.
So there are ways to improve this.

> * The "enumerate all subexpressions of the loop condition and check the number of times they were assumed to be true/false" approach can handle more of my testcases than my original implementation, but it can produce very counter-intuitive results: [...]

Yes, I agree that this "try all subexpressions" (that I implemented) isn't bulletproof. However, I only did it in slightly more than a day. I'd say, if this is a concern, I can give it a bit of a polishing, and put it to the bench to see how it works in life.
I think if you have concerns about specifics, let's discuss that under that PR https://github.com/llvm/llvm-project/pull/111258, such that we all have the code in front of us. I think we can get fairly far once we can settle on some concrete examples where it would be surprising with the current implementation, and refine it from there.

> @steakhal @ anybody else Is my eager implementation acceptable for you (in theory, assuming that it's cleaned up sufficiently)? If yes, I'll start to work on implementing the suggestions of @isuckatcs and any other observations.

I don't think we reached consensus yet. I'm less concerned about the details, because I know you can do it.
This is why I didn't look into the details of this PR, or commented nits so far.

https://github.com/llvm/llvm-project/pull/109804