[llvm] True fixpoint algorithm in RS4GC (PR #75826)

Petr Maj via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 18 09:06:24 PST 2023


https://github.com/zduka created https://github.com/llvm/llvm-project/pull/75826

Fixes a problem where the explicit marking of various instructions as conflicts did not propagate to their users. An example of this:

```
%getelementptr = getelementptr i8, <2 x ptr addrspace(1)> zeroinitializer, <2 x i64> <i64 888, i64 908>
%shufflevector = shufflevector <2 x ptr addrspace(1)> %getelementptr, <2 x ptr addrspace(1)> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shufflevector1 = shufflevector <2 x ptr addrspace(1)> %getelementptr, <2 x ptr addrspace(1)> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%select = select i1 false, <4 x ptr addrspace(1)> %shufflevector1, <4 x ptr addrspace(1)> %shufflevector
```

Here the vector shuffles will get single base (gep) during the fixpoint and therefore the select will get a known base (gep). We later mark the shuffles as conflicts, but this does not change the base of select. This gets caught by an assert where the select's type will differ from its (wrong) base later on.

The solution in the MR is to move the explicit conflict marking into the fixpoint phase.


>From 6a4c55d75ac2c2b35b41549c22b98ab6a3ef4e91 Mon Sep 17 00:00:00 2001
From: Petr Maj <pmaj at azul.com>
Date: Mon, 18 Dec 2023 17:47:45 +0100
Subject: [PATCH 1/2] True fixpoint for RS4GC

---
 .../Scalar/RewriteStatepointsForGC.cpp        | 94 ++++++++++---------
 1 file changed, 48 insertions(+), 46 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp b/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
index 40b4ea92e1ff90..af49074d3832da 100644
--- a/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
+++ b/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
@@ -967,6 +967,44 @@ static Value *findBasePointer(Value *I, DefiningValueMapTy &Cache,
     return BDVState(BaseValue, BDVState::Base, BaseValue);
   };
 
+  // Even though we have identified a concrete base (or a conflict) for all live
+  // pointers at this point, there are cases where the base is of an
+  // incompatible type compared to the original instruction. We conservatively
+  // mark those as conflicts to ensure that corresponding BDVs will be generated
+  // in the next steps.
+
+  // this is a rather explicit check for all cases where we should mark the
+  // state as a conflict to force the latter stages of the algorithm to emit
+  // the BDVs.
+  // TODO: in many cases the instructions emited for the conflicting states
+  // will be identical to the I itself (if the I's operate on their BDVs
+  // themselves). We should exploit this, but can't do it here since it would
+  // break the invariant about the BDVs not being known to be a base.
+  // TODO: the code also does not handle constants at all - the algorithm relies
+  // on all constants having the same BDV and therefore constant-only insns
+  // will never be in conflict, but this check is ignored here. If the
+  // constant conflicts will be to BDVs themselves, they will be identical
+  // instructions and will get optimized away (as in the above TODO)
+  auto MarkConflict = [&](Instruction *I, Value *BaseValue) {
+    // II and EE mixes vector & scalar so is always a conflict
+    if (isa<InsertElementInst>(I) || isa<ExtractElementInst>(I))
+      return true;
+    // Shuffle vector is always a conflict as it creates new vector from
+    // existing ones.
+    if (isa<ShuffleVectorInst>(I))
+      return true;
+    // Any  instructions where the computed base type differs from the
+    // instruction type. An example is where an extract instruction is used by a
+    // select. Here the select's BDV is a vector (because of extract's BDV),
+    // while the select itself is a scalar type. Note that the IE and EE
+    // instruction check is not fully subsumed by the vector<->scalar check at
+    // the end, this is due to the BDV algorithm being ignorant of BDV types at
+    // this junction.
+    if (!areBothVectorOrScalar(BaseValue, I))
+      return true;
+    return false;
+  };
+
   bool Progress = true;
   while (Progress) {
 #ifndef NDEBUG
@@ -993,6 +1031,14 @@ static Value *findBasePointer(Value *I, DefiningValueMapTy &Cache,
         NewState.meet(OpState);
       });
 
+      // if the instruction has known base, but should in fact be marked as 
+      // conflict because of incompatible in/out types, we mark it as such 
+      // ensuring that it will propagate through the fixpoint iteration
+      auto I = cast<Instruction>(BDV);
+      auto BV = NewState.getBaseValue();
+      if (BV && MarkConflict(I, BV))
+          NewState = BDVState(I, BDVState::Conflict);
+
       BDVState OldState = Pair.second;
       if (OldState != NewState) {
         Progress = true;
@@ -1012,44 +1058,8 @@ static Value *findBasePointer(Value *I, DefiningValueMapTy &Cache,
   }
 #endif
 
-  // Even though we have identified a concrete base (or a conflict) for all live
-  // pointers at this point, there are cases where the base is of an
-  // incompatible type compared to the original instruction. We conservatively
-  // mark those as conflicts to ensure that corresponding BDVs will be generated
-  // in the next steps.
-
-  // this is a rather explicit check for all cases where we should mark the
-  // state as a conflict to force the latter stages of the algorithm to emit
-  // the BDVs.
-  // TODO: in many cases the instructions emited for the conflicting states
-  // will be identical to the I itself (if the I's operate on their BDVs
-  // themselves). We should expoit this, but can't do it here since it would
-  // break the invariant about the BDVs not being known to be a base.
-  // TODO: the code also does not handle constants at all - the algorithm relies
-  // on all constants having the same BDV and therefore constant-only insns
-  // will never be in conflict, but this check is ignored here. If the
-  // constant conflicts will be to BDVs themselves, they will be identical
-  // instructions and will get optimized away (as in the above TODO)
-  auto MarkConflict = [&](Instruction *I, Value *BaseValue) {
-    // II and EE mixes vector & scalar so is always a conflict
-    if (isa<InsertElementInst>(I) || isa<ExtractElementInst>(I))
-      return true;
-    // Shuffle vector is always a conflict as it creates new vector from
-    // existing ones.
-    if (isa<ShuffleVectorInst>(I))
-      return true;
-    // Any  instructions where the computed base type differs from the
-    // instruction type. An example is where an extract instruction is used by a
-    // select. Here the select's BDV is a vector (because of extract's BDV),
-    // while the select itself is a scalar type. Note that the IE and EE
-    // instruction check is not fully subsumed by the vector<->scalar check at
-    // the end, this is due to the BDV algorithm being ignorant of BDV types at
-    // this junction.
-    if (!areBothVectorOrScalar(BaseValue, I))
-      return true;
-    return false;
-  };
-
+  // since we do the conflict marking as part of the fixpoint iteration this loop
+  // only asserts that invariants are met
   for (auto Pair : States) {
     Instruction *I = cast<Instruction>(Pair.first);
     BDVState State = Pair.second;
@@ -1061,14 +1071,6 @@ static Value *findBasePointer(Value *I, DefiningValueMapTy &Cache,
         (!isKnownBase(I, KnownBases) || !areBothVectorOrScalar(I, BaseValue)) &&
         "why did it get added?");
     assert(!State.isUnknown() && "Optimistic algorithm didn't complete!");
-
-    // since we only mark vec-scalar insns as conflicts in the pass, our work is
-    // done if the instruction already conflicts
-    if (State.isConflict())
-      continue;
-
-    if (MarkConflict(I, BaseValue))
-      States[I] = BDVState(I, BDVState::Conflict);
   }
 
 #ifndef NDEBUG

>From eebcff41703e601e08b0615f3cc7ec7d876b4992 Mon Sep 17 00:00:00 2001
From: Petr Maj <pmaj at azul.com>
Date: Mon, 18 Dec 2023 17:49:33 +0100
Subject: [PATCH 2/2] format

---
 llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp b/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
index af49074d3832da..76660bf88fe277 100644
--- a/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
+++ b/llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
@@ -1031,13 +1031,13 @@ static Value *findBasePointer(Value *I, DefiningValueMapTy &Cache,
         NewState.meet(OpState);
       });
 
-      // if the instruction has known base, but should in fact be marked as 
-      // conflict because of incompatible in/out types, we mark it as such 
+      // if the instruction has known base, but should in fact be marked as
+      // conflict because of incompatible in/out types, we mark it as such
       // ensuring that it will propagate through the fixpoint iteration
       auto I = cast<Instruction>(BDV);
       auto BV = NewState.getBaseValue();
       if (BV && MarkConflict(I, BV))
-          NewState = BDVState(I, BDVState::Conflict);
+        NewState = BDVState(I, BDVState::Conflict);
 
       BDVState OldState = Pair.second;
       if (OldState != NewState) {
@@ -1058,8 +1058,8 @@ static Value *findBasePointer(Value *I, DefiningValueMapTy &Cache,
   }
 #endif
 
-  // since we do the conflict marking as part of the fixpoint iteration this loop
-  // only asserts that invariants are met
+  // since we do the conflict marking as part of the fixpoint iteration this
+  // loop only asserts that invariants are met
   for (auto Pair : States) {
     Instruction *I = cast<Instruction>(Pair.first);
     BDVState State = Pair.second;



More information about the llvm-commits mailing list