[llvm] [ScheduleDAG] Dirty height/depth in addPred/removePred even for latency zero (PR #102915)

Mon Aug 12 08:26:23 PDT 2024

https://github.com/bjope created https://github.com/llvm/llvm-project/pull/102915

A long time ago (back in 2009) there was a commit 52d4d8244b7c39c5cd that changed the scheduler to not dirty height/depth when adding or removing SUnit predecessors when the latency on the edge was zero. That commit message is claiming that the depth or height isn't affected when the latency is zero.

As a matter of fact, the depth/height can change even with a zero latency on the edge. If for example adding a new SUnit A, with zero latency, but as a predecessor to a SUnit B, then both height of A and depth of B should be marked as dirty. If for example B has a greater height than A, then the height of A needs to be adjusted even if the latency is zero.

I think this has been wrong for many years. Downstream we have had commit 52d4d8244b7c39c5cd reverted since back in 2016. There is no motivating lit test for 52d4d8244b7c39c5cd (only an incomplete C level reproducer in https://github.com/llvm/llvm-project/issues/3613).

After commit 13d04fa560e156797c21f1 there finally appeared an upstream lit test that show sthat we get better code if marking height/depth as dirty (llvm/test/CodeGen/AArch64/abds.ll).

>From 5bebefd5899b16217693eeb5031efc54058a740b Mon Sep 17 00:00:00 2001
From: Bjorn Pettersson <bjorn.a.pettersson at ericsson.com>
Date: Mon, 12 Aug 2024 16:55:59 +0200
Subject: [PATCH] [ScheduleDAG] Dirty height/depth in addPred/removePred even
 for latency 0

A long time ago (back in 2009) there was a commit 52d4d8244b7c39c5cd
that changed the scheduler to not dirty height/depth when adding or
removing SUnit predecessors when the latency on the edge was zero.
That commit message is claiming that the depth or height isn't
affected when the latency is zero.

As a matter of fact, the depth/height can change even with a zero
latency on the edge. If for example adding a new SUnit A, with zero
latency, but as a predecessor to a SUnit B, then both height of A
and depth of B should be marked as dirty. If for example B has a
greater height than A, then the height of A needs to be adjusted
even if the latency is zero.

I think this has been wrong for many years. Downstream we have had
commit 52d4d8244b7c39c5cd reverted since back in 2016. There is no
motivating lit test for 52d4d8244b7c39c5cd (only an incomplete C
level reproducer in https://github.com/llvm/llvm-project/issues/3613).

After commit 13d04fa560e156797c21f1 there finally appeared an
upstream lit test that show sthat we get better code if marking
height/depth as dirty (llvm/test/CodeGen/AArch64/abds.ll).
---
 llvm/lib/CodeGen/ScheduleDAG.cpp  | 15 ++++++------
 llvm/test/CodeGen/AArch64/abds.ll | 39 ++++++++++++++-----------------
 2 files changed, 25 insertions(+), 29 deletions(-)

diff --git a/llvm/lib/CodeGen/ScheduleDAG.cpp b/llvm/lib/CodeGen/ScheduleDAG.cpp
index 8d9a5041fc2fea..26857edd871e2b 100644
--- a/llvm/lib/CodeGen/ScheduleDAG.cpp
+++ b/llvm/lib/CodeGen/ScheduleDAG.cpp
@@ -125,6 +125,9 @@ bool SUnit::addPred(const SDep &D, bool Required) {
           }
         }
         PredDep.setLatency(D.getLatency());
+        // Changing latency, dirty the involved SUnits.
+        this->setDepthDirty();
+        D.getSUnit()->setHeightDirty();
       }
       return false;
     }
@@ -164,10 +167,8 @@ bool SUnit::addPred(const SDep &D, bool Required) {
   }
   Preds.push_back(D);
   N->Succs.push_back(P);
-  if (P.getLatency() != 0) {
-    this->setDepthDirty();
-    N->setHeightDirty();
-  }
+  this->setDepthDirty();
+  N->setHeightDirty();
   return true;
 }
 
@@ -209,10 +210,8 @@ void SUnit::removePred(const SDep &D) {
   }
   N->Succs.erase(Succ);
   Preds.erase(I);
-  if (P.getLatency() != 0) {
-    this->setDepthDirty();
-    N->setHeightDirty();
-  }
+  this->setDepthDirty();
+  N->setHeightDirty();
 }
 
 void SUnit::setDepthDirty() {
diff --git a/llvm/test/CodeGen/AArch64/abds.ll b/llvm/test/CodeGen/AArch64/abds.ll
index d861ee64b99514..85b70ede2807bb 100644
--- a/llvm/test/CodeGen/AArch64/abds.ll
+++ b/llvm/test/CodeGen/AArch64/abds.ll
@@ -180,14 +180,13 @@ define i64 @abd_ext_i64_undef(i64 %a, i64 %b) nounwind {
 define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
 ; CHECK-LABEL: abd_ext_i128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x2, x0
-; CHECK-NEXT:    sbc x8, x3, x1
-; CHECK-NEXT:    subs x9, x0, x2
-; CHECK-NEXT:    sbc x10, x1, x3
-; CHECK-NEXT:    subs x11, x2, x0
+; CHECK-NEXT:    subs x8, x0, x2
+; CHECK-NEXT:    sbc x9, x1, x3
+; CHECK-NEXT:    subs x10, x2, x0
+; CHECK-NEXT:    sbc x11, x3, x1
 ; CHECK-NEXT:    sbcs xzr, x3, x1
-; CHECK-NEXT:    csel x0, x9, x11, lt
-; CHECK-NEXT:    csel x1, x10, x8, lt
+; CHECK-NEXT:    csel x0, x8, x10, lt
+; CHECK-NEXT:    csel x1, x9, x11, lt
 ; CHECK-NEXT:    ret
   %aext = sext i128 %a to i256
   %bext = sext i128 %b to i256
@@ -200,14 +199,13 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
 define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
 ; CHECK-LABEL: abd_ext_i128_undef:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x2, x0
-; CHECK-NEXT:    sbc x8, x3, x1
-; CHECK-NEXT:    subs x9, x0, x2
-; CHECK-NEXT:    sbc x10, x1, x3
-; CHECK-NEXT:    subs x11, x2, x0
+; CHECK-NEXT:    subs x8, x0, x2
+; CHECK-NEXT:    sbc x9, x1, x3
+; CHECK-NEXT:    subs x10, x2, x0
+; CHECK-NEXT:    sbc x11, x3, x1
 ; CHECK-NEXT:    sbcs xzr, x3, x1
-; CHECK-NEXT:    csel x0, x9, x11, lt
-; CHECK-NEXT:    csel x1, x10, x8, lt
+; CHECK-NEXT:    csel x0, x8, x10, lt
+; CHECK-NEXT:    csel x1, x9, x11, lt
 ; CHECK-NEXT:    ret
   %aext = sext i128 %a to i256
   %bext = sext i128 %b to i256
@@ -278,14 +276,13 @@ define i64 @abd_minmax_i64(i64 %a, i64 %b) nounwind {
 define i128 @abd_minmax_i128(i128 %a, i128 %b) nounwind {
 ; CHECK-LABEL: abd_minmax_i128:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    cmp x2, x0
-; CHECK-NEXT:    sbc x8, x3, x1
-; CHECK-NEXT:    subs x9, x0, x2
-; CHECK-NEXT:    sbc x10, x1, x3
-; CHECK-NEXT:    subs x11, x2, x0
+; CHECK-NEXT:    subs x8, x0, x2
+; CHECK-NEXT:    sbc x9, x1, x3
+; CHECK-NEXT:    subs x10, x2, x0
+; CHECK-NEXT:    sbc x11, x3, x1
 ; CHECK-NEXT:    sbcs xzr, x3, x1
-; CHECK-NEXT:    csel x0, x9, x11, lt
-; CHECK-NEXT:    csel x1, x10, x8, lt
+; CHECK-NEXT:    csel x0, x8, x10, lt
+; CHECK-NEXT:    csel x1, x9, x11, lt
 ; CHECK-NEXT:    ret
   %min = call i128 @llvm.smin.i128(i128 %a, i128 %b)
   %max = call i128 @llvm.smax.i128(i128 %a, i128 %b)