[llvm] [ScheduleDAG] Dirty height/depth in addPred/removePred even for latency zero (PR #102915)

Mon Aug 12 08:26:55 PDT 2024

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: Björn Pettersson (bjope)

<details>
<summary>Changes</summary>

A long time ago (back in 2009) there was a commit 52d4d8244b7c39c5cd that changed the scheduler to not dirty height/depth when adding or removing SUnit predecessors when the latency on the edge was zero. That commit message is claiming that the depth or height isn't affected when the latency is zero.

As a matter of fact, the depth/height can change even with a zero latency on the edge. If for example adding a new SUnit A, with zero latency, but as a predecessor to a SUnit B, then both height of A and depth of B should be marked as dirty. If for example B has a greater height than A, then the height of A needs to be adjusted even if the latency is zero.

I think this has been wrong for many years. Downstream we have had commit 52d4d8244b7c39c5cd reverted since back in 2016. There is no motivating lit test for 52d4d8244b7c39c5cd (only an incomplete C level reproducer in https://github.com/llvm/llvm-project/issues/3613).

After commit 13d04fa560e156797c21f1 there finally appeared an upstream lit test that show sthat we get better code if marking height/depth as dirty (llvm/test/CodeGen/AArch64/abds.ll).

---
Full diff: https://github.com/llvm/llvm-project/pull/102915.diff


2 Files Affected:

- (modified) llvm/lib/CodeGen/ScheduleDAG.cpp (+7-8) 
- (modified) llvm/test/CodeGen/AArch64/abds.ll (+18-21) 


``````````diff

diff --git a/llvm/lib/CodeGen/ScheduleDAG.cpp b/llvm/lib/CodeGen/ScheduleDAG.cpp
index 8d9a5041fc2fea..26857edd871e2b 100644
--- a/llvm/lib/CodeGen/ScheduleDAG.cpp
+++ b/llvm/lib/CodeGen/ScheduleDAG.cpp
@@ -125,6 +125,9 @@ bool SUnit::addPred(const SDep &D, bool Required) {
           }
         }
         PredDep.setLatency(D.getLatency());
+        // Changing latency, dirty the involved SUnits.
+        this->setDepthDirty();
+        D.getSUnit()->setHeightDirty();
       }
       return false;
     }
@@ -164,10 +167,8 @@ bool SUnit::addPred(const SDep &D, bool Required) {
   }
   Preds.push_back(D);
   N->Succs.push_back(P);
-  if (P.getLatency() != 0) {
-    this->setDepthDirty();
-    N->setHeightDirty();
-  }
+  this->setDepthDirty();
+  N->setHeightDirty();
   return true;
 }
 
@@ -209,10 +210,8 @@ void SUnit::removePred(const SDep &D) {
   }
   N->Succs.erase(Succ);
   Preds.erase(I);
-  if (P.getLatency() != 0) {
-    this->setDepthDirty();
-    N->setHeightDirty();
-  }
+  this->setDepthDirty();
+  N->setHeightDirty();
 }
 
 void SUnit::setDepthDirty() {
diff --git a/llvm/test/CodeGen/AArch64/abds.ll b/llvm/test/CodeGen/AArch64/abds.ll
index d861ee64b99514..85b70ede2807bb 100644
--- a/llvm/test/CodeGen/AArch64/abds.ll
+++ b/llvm/test/CodeGen/AArch64/abds.ll
@@ -180,14 +180,13 @@ define i64 @abd_ext_i64_undef(i64 %a, i64 %b) nounwind {
 define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
 ; CHECK-LABEL: abd_ext_i128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x2, x0
-; CHECK-NEXT:    sbc x8, x3, x1
-; CHECK-NEXT:    subs x9, x0, x2
-; CHECK-NEXT:    sbc x10, x1, x3
-; CHECK-NEXT:    subs x11, x2, x0
+; CHECK-NEXT:    subs x8, x0, x2
+; CHECK-NEXT:    sbc x9, x1, x3
+; CHECK-NEXT:    subs x10, x2, x0
+; CHECK-NEXT:    sbc x11, x3, x1
 ; CHECK-NEXT:    sbcs xzr, x3, x1
-; CHECK-NEXT:    csel x0, x9, x11, lt
-; CHECK-NEXT:    csel x1, x10, x8, lt
+; CHECK-NEXT:    csel x0, x8, x10, lt
+; CHECK-NEXT:    csel x1, x9, x11, lt
 ; CHECK-NEXT:    ret
   %aext = sext i128 %a to i256
   %bext = sext i128 %b to i256
@@ -200,14 +199,13 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
 define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
 ; CHECK-LABEL: abd_ext_i128_undef:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x2, x0
-; CHECK-NEXT:    sbc x8, x3, x1
-; CHECK-NEXT:    subs x9, x0, x2
-; CHECK-NEXT:    sbc x10, x1, x3
-; CHECK-NEXT:    subs x11, x2, x0
+; CHECK-NEXT:    subs x8, x0, x2
+; CHECK-NEXT:    sbc x9, x1, x3
+; CHECK-NEXT:    subs x10, x2, x0
+; CHECK-NEXT:    sbc x11, x3, x1
 ; CHECK-NEXT:    sbcs xzr, x3, x1
-; CHECK-NEXT:    csel x0, x9, x11, lt
-; CHECK-NEXT:    csel x1, x10, x8, lt
+; CHECK-NEXT:    csel x0, x8, x10, lt
+; CHECK-NEXT:    csel x1, x9, x11, lt
 ; CHECK-NEXT:    ret
   %aext = sext i128 %a to i256
   %bext = sext i128 %b to i256
@@ -278,14 +276,13 @@ define i64 @abd_minmax_i64(i64 %a, i64 %b) nounwind {
 define i128 @abd_minmax_i128(i128 %a, i128 %b) nounwind {
 ; CHECK-LABEL: abd_minmax_i128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x2, x0
-; CHECK-NEXT:    sbc x8, x3, x1
-; CHECK-NEXT:    subs x9, x0, x2
-; CHECK-NEXT:    sbc x10, x1, x3
-; CHECK-NEXT:    subs x11, x2, x0
+; CHECK-NEXT:    subs x8, x0, x2
+; CHECK-NEXT:    sbc x9, x1, x3
+; CHECK-NEXT:    subs x10, x2, x0
+; CHECK-NEXT:    sbc x11, x3, x1
 ; CHECK-NEXT:    sbcs xzr, x3, x1
-; CHECK-NEXT:    csel x0, x9, x11, lt
-; CHECK-NEXT:    csel x1, x10, x8, lt
+; CHECK-NEXT:    csel x0, x8, x10, lt
+; CHECK-NEXT:    csel x1, x9, x11, lt
 ; CHECK-NEXT:    ret
   %min = call i128 @llvm.smin.i128(i128 %a, i128 %b)
   %max = call i128 @llvm.smax.i128(i128 %a, i128 %b)

``````````

</details>


https://github.com/llvm/llvm-project/pull/102915