[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

Mon Nov 6 03:14:39 PST 2023

https://github.com/DonatNagyE commented:

This is a good and important improvement of the analysis results, so I support merging it (with some very minor changes), but I feel that it's a "practical, but incomplete" band-aid instead of a systemic improvement that fits into the architecture.

The most straightforward issue that I see is that (if I understand the code correctly) the intersected constraint (the value of the variable `Constraint` at the end of `handleEquivalentAlternativeSymOperands()`) is just discarded after checking that it's not empty. It's plausible that this discarded information will be re-calculated by this process in the follow-up range calculations and don't have a concrete testcase where it's relevant that we "leave food on the table" (I can try to find one if you'd find it helpful) but I think that it may be useful to think about "preserving" this `Constraint` that we calculated. (This is not a blocking issue: I can support merging the current logic if you don't see a reasonably simple improvement in this area.)

There is also the choice that you limited the runtime to `|eqclass(X)| + |eqclass(Y)|` by only checking the alternative constraints where one of the two representatives is the original one. I think that here it's a good decision to ensure a reasonable runtime by potentially discarding some information, but I feel that it'd be good to document this decision with a testcase. For example, I think
```
void gh_62215_left_and_right(int x, int y, int z, int w) {
  if (x != y) return; // x == y
  if (z != w) return; // z == w
  if (z <= x) return; // z > x
  if (w >= y) return; // w < y
}
```
would lead to a state that's unreachable, but this fact is not recognized by the current improvement.

Finally there is the "why do we need this trickery at all?" question that was also raised by @Xazax-hun. I agree with him that on a longer term, it'd be good to "strengthen" the equality classes and ensure that we assign the same RangeSet to each member of an EQClass (or, in other words, we assign the RangeSet to an EQClass instead of an individual symbol). This would potentially improve the runtime (although I'm not sure that performance is especially relevant here) and handle the left-and-right testcase adequately.

However, I don't know how difficult would it be to maintain the invariant that the same RangeSet is assigned to each symbol in an EqClass (we'd need to intersect and update range sets whenever an equality is deduced, handle the situations where this reveals that equality is infeasible and handle any problems arising from the case when the representative becomes dead) -- so I can support merging this working code now instead of waiting indefinitely for that more through solution. (Perhaps consider adding a FIXME at the beginning of `handleEquivalentAlternativeSymOperands()` which mentions this longer-term alternative.)

https://github.com/llvm/llvm-project/pull/71284