[clang] [analyzer] Don't assume third iteration in loops (PR #119388)

Fri Dec 20 08:27:00 PST 2024

NagyDonat wrote:

> > [...] To fix this problem, it would be sufficient to e.g. ensure that evalEagerlyAssumeBifurcation sets LastEagerlyAssumeBifurcationAt to nullptr [...]
> 
> Sounds good to me. Let's zero it out after it's "used"/"consumed".

I took a different approach (which was already mostly implemented when I've seen this suggestion): I'm setting the stored `Expr *` to nullpointer when I encounter an `EagerlyAssume` which does not succeed (i.e. at most one branch is feasible). I also added a dedicated test file which shows that this change works (and demonstrates other features and limitations of the commit).

> > [...] there might be better solutions to implement this "did EagerlyAssume split the state right now directly before this processBranch callback?" check that I need. (E.g. perhaps we could walk a few steps backwards on the ExplodedGraph -- but I don't know what kinds of nodes should I expect and I'm afraid that what I could write would run into pitfalls in unusual cases.)
> 
> Yes, it could get messy. For instance if we have multiple parent nodes due to a merge. Let's not do traversals.

OK, if you say so, I'm very happy to avoid it :grinning: 

> I'm good with the change once the last remark is fixed. BTW have you measured the running time implications of this patch? How much we spare?

I don't expect running _time_ implications, because as we discard these execution paths, the analyzer will traverse other execution paths instead of them -- so the performance gains should appear as "more results" instead of "faster analysis". Unfortunately the improvements that I hoped didn't appear in the experimental results: the "final" version of this commit doesn't produce significantly more results compared to the "intermediate" version which just suppresses the results (after "wasting time" on traversing the unjustified paths).

I think we don't see visible performance improvements because my observation that "difference between 2 or 3 iterations is irrelevant on the later parts of the execution path" does not imply that we can save a significant amount of calculations by driving each of those paths through 2 iterations (instead of roughly half 2 iterations, half 3 iterations) -- because those paths differ in many assumptions, so we cannot merge them even if we ignore the iteration count in that particular loop.

Additionally, my measurements are on stable open source projects, where real results are much rarer than false positives (even on stable checkers), so the number of results that we get is an imperfect and somewhat noisy proxy of the actual improvements.

https://github.com/llvm/llvm-project/pull/119388