[PATCH] D124111: [AMDGPU]: Fix failing assertion in SIMachineScheduler

Jannik Silvanus via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 20 11:36:05 PDT 2022


jsilvanus created this revision.
Herald added subscribers: hsmhsm, foad, kerbowa, javed.absar, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl, arsenm, MatzeB.
Herald added a project: All.
jsilvanus requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

This fixes the assertion failure "Loop in the Block Graph!".

SIMachineScheduler groups instructions into blocks (also referred to
as coloring or groups) and then performs a two-level scheduling:
inter-block scheduling, and intra-block scheduling.

This approach requires that the dependency graph on the blocks which
is obtained by contracting the blocks in the original dependency graph
is acyclic. In other words: Whenever A and B end up in the same block,
all vertices on a path from A to B must be in the same block.

When compiling an example consisting of an export followed by
a buffer store, we see a dependency between these two. This dependency
may be false, but that is a different issue.
This dependency was not correctly accounted for by SiMachineScheduler.

A new test case si-scheduler-exports.ll demonstrating this is
also added in this commit.

The problematic part of SiMachineScheduler was a post-optimization of
the block assignment that tried to group all export instructions into
a separate export block for better execution performance. This routine
correctly checked that any paths from exports to exports did not
contain any non-exports, but not vice-versa: In case of an export with
a non-export successor dependency, that single export was moved
to a separate block, which could then be both a successor and a
predecessor block of a non-export block.

As fix, we now skip export grouping if there are exports with direct
non-export successor dependencies. This fixes the issue at hand,
but is slightly pessimistic:
We *could* group all exports into a separate block that have neither
direct nor indirect export successor dependencies.
We will review the potential performance impact and potentially
revisit with a more sophisticated implementation.

Note that just grouping all exports without direct non-export successor
dependencies could still lead to illegal blocks, since non-export A
could depend on export B that depends on export C. In that case,
export C has no non-export successor, but still may not be grouped
into an export block.

Change-Id: I18ab846940a2a3e224199e72339e6fd181b7e8de


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D124111

Files:
  llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
  llvm/test/CodeGen/AMDGPU/si-scheduler-exports.ll


Index: llvm/test/CodeGen/AMDGPU/si-scheduler-exports.ll
===================================================================
--- /dev/null
+++ llvm/test/CodeGen/AMDGPU/si-scheduler-exports.ll
@@ -0,0 +1,27 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -march=amdgcn -mcpu=gfx1010 --misched=si -mattr=si-scheduler < %s | FileCheck %s
+
+define amdgpu_gs void @_amdgpu_gs_main() #0 {
+; CHECK-LABEL: _amdgpu_gs_main:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
+; CHECK-NEXT:    s_mov_b32 s0, 0
+; CHECK-NEXT:    s_mov_b32 s1, s0
+; CHECK-NEXT:    s_mov_b32 s2, s0
+; CHECK-NEXT:    v_mov_b32_e32 v1, v0
+; CHECK-NEXT:    v_mov_b32_e32 v2, v0
+; CHECK-NEXT:    v_mov_b32_e32 v3, v0
+; CHECK-NEXT:    s_mov_b32 s3, s0
+; CHECK-NEXT:    exp mrt0 off, off, off, off
+; CHECK-NEXT:    buffer_store_dwordx4 v[0:3], v0, s[0:3], 0 idxen
+; CHECK-NEXT:    s_endpgm
+entry:
+  call void @llvm.amdgcn.exp.f32(i32 0, i32 0, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, i1 false, i1 false)
+  call void @llvm.amdgcn.struct.buffer.store.v4f32(<4 x float> zeroinitializer, <4 x i32> zeroinitializer, i32 0, i32 0, i32 0, i32 0)
+  ret void
+}
+
+declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg)
+declare void @llvm.amdgcn.struct.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32 immarg)
+
+attributes #0 = { "target-features"=",+si-scheduler" }
Index: llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
+++ llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
@@ -1123,36 +1123,26 @@
   for (unsigned SUNum : DAG->TopDownIndex2SU) {
     const SUnit &SU = DAG->SUnits[SUNum];
     if (SIInstrInfo::isEXP(*SU.getInstr())) {
-      // Check the EXP can be added to the group safely,
-      // ie without needing any other instruction.
-      // The EXP is allowed to depend on other EXP
-      // (they will be in the same group).
-      for (unsigned j : ExpGroup) {
-        bool HasSubGraph;
-        std::vector<int> SubGraph;
-        // By construction (topological order), if SU and
-        // DAG->SUnits[j] are linked, DAG->SUnits[j] is necessary
-        // in the parent graph of SU.
-#ifndef NDEBUG
-        SubGraph = DAG->GetTopo()->GetSubGraph(SU, DAG->SUnits[j],
-                                               HasSubGraph);
-        assert(!HasSubGraph);
-#endif
-        SubGraph = DAG->GetTopo()->GetSubGraph(DAG->SUnits[j], SU,
-                                               HasSubGraph);
-        if (!HasSubGraph)
-          continue; // No dependencies between each other
-
-        // SubGraph contains all the instructions required
-        // between EXP SUnits[j] and EXP SU.
-        for (unsigned k : SubGraph) {
-          if (!SIInstrInfo::isEXP(*DAG->SUnits[k].getInstr()))
-            // Other instructions than EXP would be required in the group.
-            // Abort the grouping.
-            return;
+      // SU is an export instruction. Check whether one of its successor
+      // dependencies is a non-export, in which case we skip export grouping.
+      for (SDep const &SuccDep : SU.Succs) {
+        SUnit const *SuccSU = SuccDep.getSUnit();
+        if (SuccDep.isWeak() || SuccSU->NodeNum >= DAG->SUnits.size()) {
+          // Ignore these dependencies.
+          continue;
+        }
+        assert(SuccSU->isInstr() &&
+               "SUnit unexpectedly not representing an instruction!");
+
+        if (!SIInstrInfo::isEXP(*SuccSU->getInstr())) {
+          // A non-export depends on us. Skip export grouping.
+          // Note that this is a bit pessimistic: We could still group all other
+          // exports that are not depended on by non-exports, directly or
+          // indirectly. Simply skipping this particular export but grouping all
+          // others would not account for indirect dependencies.
+          return;
         }
       }
-
       ExpGroup.push_back(SUNum);
     }
   }


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D124111.423981.patch
Type: text/x-patch
Size: 4131 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220420/86caece8/attachment.bin>


More information about the llvm-commits mailing list