[PATCH] D138037: [analyzer] Remove unjustified assertion from EQClass::simplify

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 15 07:47:25 PST 2022


steakhal created this revision.
steakhal added reviewers: xazax.hun, NoQ, martong, ASDenysPetrov, vabridgers, Szelethus.
Herald added subscribers: manas, dkrupp, donat.nagy, mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware.
Herald added a project: All.
steakhal requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

One might think that by merging the equivalence classes (eqclasses) of `Sym1` and `Sym2` symbols we would end up with a `State` in which the eqclass of `Sym1` and `Sym2` should now be the same. Surprisingly, it's not //always// true.

Such an example triggered the assertion enforcing this //unjustified// invariant in https://github.com/llvm/llvm-project/issues/58677.

  unsigned a, b;
  #define assert(cond) if (!(cond)) return
  
  void f(unsigned c) {
      /*(1)*/ assert(c == b);
      /*(2)*/ assert((c | a) != a);
      /*(3)*/ assert(a);
      // a = 0  =>  c | 0 != 0  =>  c != 0  =>  b != 0
  }

I believe, that this assertion hold for reasonable cases - where both `MemberSym` and `SimplifiedMemberSym` refer to live symbols.
It can only fail if `SimplifiedMemberSym` refers to an already dead symbol. See the reasoning at the very end.
In this context, I don't know any way of determining if a symbol is alive/dead, so I cannot refine the assertion in that way.
So, I'm proposing to drop this unjustified assertion.

---

Let me elaborate on why I think the assertion is wrong in its current shape.

Here is a quick reminder about equivalence classes in CSA.
We have 4 mappings:

1. `ClassMap`: map, associating `Symbols` with an `EquivalenceClass`.
2. `ClassMembers`: map, associating `EquivalenceClasses` with its members - basically an enumeration of the `Symbols` which are known to be equal.
3. `ConstraintRange`: map, associating `EquivalenceClasses` with the range constraint which should hold for all the members of the eqclass.
4. `DisequalityMap`: I'm omitting this, as it's irrelevant for our purposes now.

Whenever we encounter/assume that two `SymbolRefs` are equal, we update the `ClassMap` so that now both `SymbolRefs` are referring to the same eqclass. This operation is known as `merge` or `union`.
Each eqclass is uniquely identified by the `representative` symbol, but it could have been just a unique number or anything else - the point is that an eqclass is identified by something unique.
Initially, all Symbols form - by itself - a trivial eqclass, as there are no other Symbols to which it is assumed to be equal. A trivial eqclass has //notionally// exactly one member, the representative symbol.
I'm emphasizing that //notionally// because for such cases we don't store an entry in the `ClassMap` nor in the `ClassMembers` map, because it could be deduced.

By `merging` two eqclasses, we essentially do what you would think it does. An important thing to highlight is that the representative symbol of the resulting eqclass will be the same as one of the two eqclasses.
This operation should be commutative, so that `merge(eq1,eq2)` and `merge(eq2,eq1)` should result in the same eqclass - except for the representative symbol, which is just a unique identifier of the class, a name if you will. Using the representative symbol of `eq1` or `eq2` should have no visible effect on the analysis overall.

When merging `eq1` into `eq2`, we take each of the `ClassMembers` of `eq1` and add them to the `ClassMembers` of `eq2` while we also redirect all the `Symbol` members of `eq1` to map to the `eq2` eqclass in the `ClassMap`. This way all members of `eq1` will refer to the correct eqclass.
After these, `eq1` key is unreachable in the `ClassMembers`, hence we can drop it.

---

Let's get back to the example.
Note that when I refer to symbols `a`, `b`, `c`, I'm referring to the `SymbolRegionValue{VarRegion{.}}` - the value of that variable.

After `(1)`, we will have a constraint `c == b : [1,1]` and an eqclass `c,b` with the `c` representative symbol.
After `(2)`, we will have an additional constraint `c|b != a : [1,1]` with the same eqclass. We will also have disequality info about that `c|a` is disequal to `a` - and the other way around.
However, after the full-expression, `c` will become dead. This kicks in the garbage collection, which transforms the state into this:

- We no longer have any constraints, because only `a` is alive, for which we don't have any constraints.
- We have a single (non-trivial) eqclass with a single element `b` and representative symbol `c`. (Dead symbols can be representative symbols.)
- We have the same disequality info as before.

At `(3)` within the false branch, `a` get perfectly constrained to zero. This kicks in the simplification, so we try to substitute (simplify) the variable in each SymExpr-tree. In our case, it means that the `c|a != a : [1,1]` constraint gets re-evaluated as `c|0 != 0 : [1,1]`, which is `c != 0 : [1,1]`.
Under the hood, it means that we will call `merge(c|a, c)`. where `c` is the result of `simplifyToSVal(State, MemberSym).getAsSymbol()` inside `EquivalenceClass::simplify()`.
Note that the result of `simplifyToSVal()` was a dead symbol. We shouldn't have acquired an already dead symbol. AFAIK, this is the only way we can get one at this point.
Since `c` is dead, we no longer have a mapping in `ClassMap` for it; hence when we try to `find()` the eqclass of `c`, it will report that it's a trivial eqclass with the representative symbol `c`.

After `merge(c|a, c)`, we will have a single (non-trivial) eqclass of `b, c|a` with the representative symbol `c|a` - because we merged the eqclass of `c` into the eqclass of `c|a`.

This means that `find(c|a)` will result in the eqclass with the representative symbol `c|a`. So, we ended up having different eqclasses for `find(c|a)` and `find(c)` after `merge(c|a, c)`, firing the assertion.

I believe, that this assertion hold for reasonable cases - where both `MemberSym` and `SimplifiedMemberSym` refer to live symbols.
`MemberSym` should be live in all cases here, because it is from `ClassMembers` which is pruned in `removeDeadBindings()` so these must be alive. In this context, I don't know any way of determining if a symbol is alive/dead, so I cannot refine the assertion in that way.
So, I'm proposing to drop this unjustified assertion.

I'd like to thank @martong for helping me to conclude the investigation.
It was really difficult to track down.

PS: I mentioned that `merge(eq1, eq2)` should be commutative. We actually exploit this for merging the smaller eqclass into the bigger one within `EquivalenceClass::merge()`.
Yea, for some reason, if you swap the operands, 3 tests break (only failures, no crashes) aside from the test which checks the state dump. But I believe, that is a different bug and orthogonal to this one. I just wanted to mention that.

- `Analysis/solver-sym-simplification-adjustment.c`
- `Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp`
- `Analysis/symbol-simplification-reassume.cpp`


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138037

Files:
  clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp


Index: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -2616,7 +2616,18 @@
       if (OldState == State)
         continue;
 
-      assert(find(State, MemberSym) == find(State, SimplifiedMemberSym));
+      // Be aware that `SimplifiedMemberSym` might refer to an already dead
+      // symbol. In that case, the eqclass of that might not be the same as the
+      // eqclass of `MemberSym`. This is because the dead symbols are not
+      // preserved in the `ClassMap`, hence
+      // `find(State, SimplifiedMemberSym)` will result in a trivial eqclass
+      // compared to the eqclass of `MemberSym`.
+      // These eqclasses should be the same if `SimplifiedMemberSym` is alive.
+      // --> assert(find(State, MemberSym) == find(State, SimplifiedMemberSym))
+      //
+      // Note that `MemberSym` must be alive here since that is from the
+      // `ClassMembers` where all the symbols are alive.
+
       // Remove the old and more complex symbol.
       State = find(State, MemberSym).removeMember(State, MemberSym);
 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D138037.475477.patch
Type: text/x-patch
Size: 1236 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20221115/523a403a/attachment.bin>


More information about the cfe-commits mailing list