[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

Fri Oct 18 03:46:17 PDT 2024

=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>,
=?utf-8?q?Donát?= Nagy <donat.nagy at ericsson.com>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/109804 at github.com>

steakhal wrote:

Thank you for sharing your roadmap.
I didn't remember there was consensus on a direction of dropping execution paths in the related [RFC](https://discourse.llvm.org/t/loop-handling-improvement-plans/80417/13).
Nevertheless the direction seems interesting, but without consensus I'm not sure we should pursue that in upstream LLVM.

> (As I said earlier, I'm strongly opposed to a solution where the visitor-based suppression is activated by each checker [or many checkers] because then we'd be discarding a significant percentage (perhaps even the majority) of work done by the analyzer.)

My problem with this approach is still that it's intrusive. We can only cut those branches once all checkers agree on suppressing all the reports coming from those branches. And it's not only about upstream checkers. There could be downstream checkers, that for example could do much more with the ExplodedGraph than just reporting issues.

One particular (slightly broken) checker that crosses my mind in this case is the `UnreachableCodeChecker` which checks the `Eng.hasWorkRemaining()` in the `checkEndAnalysis` callback to see if we explored all possible branches in the top-level function we just analyzed. If so, it checks what BasicBlocks in the CFG were left unvisited to diagnose unreachable code. I don't imply that this particular checker is of high quality or used in production (AFAIK); all I'm saying is that these make me skeptical of not exploring these branches. This makes the argument of future-proofing weaker to me.

> Edit: the very recent bug report https://github.com/llvm/llvm-project/issues/112527 is another example where applying this heuristic to core.UndefinedBinaryOperatorResult would've probably prevented the false positive.

Another way of looking at that issue is, if we would have inlined the `std::array::size()` would would have known that the given branch is infeasible, like in [this](https://github.com/llvm/llvm-project/pull/109804) older case. In any case, I wouldn't draw far arching conclusions.

> I don't think that discarding the execution paths with these "weak" assumptions would be a really big change:
> * It only discards paths, so it cannot introduce any false positives (unlike the loop widening plans, which performs invalidation).
> * I don't think that there is a systemic class of bugs that are only found on these "weak assumption execution path" branches and can be distinguished algorithmically from the highly unwanted results.

I think different tool vendors may want to target different FP TP rates. This argument only holds if we assume it's fine to drop TPs in favor of droping FPs. To me, it always depends. As a general guideline, I agree with your sentiment though. With mass changes like this, I'm not sure anymore without empirical evidence.

> > [...] and with options for disabling it while developing and preparing this new version.
> 
> Obviously I can put the freshly developed stuff behind analyzer options if there's demand for it. In the case of this `ArrayBoundV2` change I did not do so, because `ArrayBoundV2` is an alpha checker and it's very noisy without this patch, but if I extend this to stable checkers I'll probably introduce an option.

The `ArrayBoundV2` changes make sense, exactly because it's an `alpha` checker like you said. I'm not challenging that.

> > > @steakhal @ anybody else Is my eager implementation acceptable for you (in theory, assuming that it's cleaned up sufficiently)? If yes, I'll start to work on implementing the suggestions of @isuckatcs and any other observations.
> > 
> > 
> > I don't think we reached consensus yet.
> 
> I see. What can I do to approach consensus? Is there some aspect of either of the implementations that we should discuss in more detail? Or should we ask other contributors?

I had a slight discomfort of implementing something in the Engine that could be done in higher-level, like in a BR visitor.
What you said about dropping execution paths makes sense and would potentially bring a huge improvement in multiple aspects such as TP FP rate and even analysis time. Making research and experimentation tackling these issues is really important. The perfect is the enemy of good; so I'm fine now with doing this in the Engine.

Although, I still have major discomfort of extending this beyond OOBv2, and skipping analyzing execution paths we do today - like you shared in your plans. To convince me, I'd need evidence of the net benefit without any major blocking issues (in other words loosing truly valuable TPs for example, or if it would turn out that some internal components couldn't be upgraded to the new loop modeling model). This is a lot more difficult, and may not even be practical to do incrementally.
The llvm policies suggest that feature-branches are discouraged; but they don't formulate opinions on forks.
Maybe you would have more flexibility for reaching a stable version in a fork than directly developing this upstream.
This way once the prototype is complete, or reaches a maturity where we could gather data for supporting the claims that this is the way how loops should be handled.
Once this is formulated, the maintainers could discuss and see if they agree with these claims and then accepting the patches.

I think the more experienced Static Analyzer devs participate in a discussion, the better.
I'd definitely expect the code owners to express their opinion, and reach unanimous consensus before accepting patches for changing how loops are modeled in general.

Given the title, summary and the content of this PR, I'm convinced that this isn't the right forum though.
This should be done in the LLVM discourse.

https://github.com/llvm/llvm-project/pull/109804