[llvm] 50b5b36 - [AMDGPU] Iterate LoweredEndCf in the reverse order

Christudasan Devadasan via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 5 21:27:53 PST 2022


Author: Christudasan Devadasan
Date: 2022-01-06T00:27:11-05:00
New Revision: 50b5b367c1ae72be5265f81b4dba03b3deb0c4e4

URL: https://github.com/llvm/llvm-project/commit/50b5b367c1ae72be5265f81b4dba03b3deb0c4e4
DIFF: https://github.com/llvm/llvm-project/commit/50b5b367c1ae72be5265f81b4dba03b3deb0c4e4.diff

LOG: [AMDGPU] Iterate LoweredEndCf in the reverse order

The function that optimally inserts the exec mask
restore operations by combining blocks currently
visits the lowered END_CF pseudos in the forward
direction, since it iterates the SetVector in the
order the entries were inserted.

Because BranchFolding does not run at -O0, the
irregularly placed BBs cause this forward traversal
to leave two unconditional branches in certain
blocks while combining them, particularly when an
intervening block gets optimized away in a later
iteration.

Iterating the SetVector in reverse avoids this:
the blocks at the bottom of the function are
optimized before those at the top.
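
For illustration, a minimal standalone sketch of the
pattern (not the in-tree code; the element type, the
LoweredEndCf name used here, and the block numbers are
made up): llvm::reverse from llvm/ADT/STLExtras.h flips
the visitation order of a SetVector, so the entries
inserted last, i.e. the END_CF pseudos nearer the bottom
of the function, are processed first.

    // Minimal sketch assuming an LLVM build; LoweredEndCf here is only a
    // stand-in for the pass member that records the lowered END_CF pseudos.
    #include "llvm/ADT/STLExtras.h"
    #include "llvm/ADT/SetVector.h"
    #include <cstdio>

    int main() {
      llvm::SetVector<int> LoweredEndCf;   // illustrative element type
      for (int BB : {5, 7, 13, 14})        // insertion order == forward order
        LoweredEndCf.insert(BB);

      // reverse() visits 14, 13, 7, 5: the END_CF at the bottom of the
      // function is handled before the ones above it.
      for (int BB : llvm::reverse(LoweredEndCf))
        std::printf("optimizing END_CF in bb.%d\n", BB);
      return 0;
    }

With the previous forward loop the same container would be
visited as 5, 7, 13, 7, 14; the reversed loop gives the
bottom-up order the fix relies on.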

Fixes: SWDEV-315215

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D116273

Added: 
    

Modified: 
    llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
    llvm/test/CodeGen/AMDGPU/collapse-endcf.mir

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp b/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
index 3168bcd53edac..6ec37b32d0a68 100644
--- a/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
@@ -582,7 +582,7 @@ void SILowerControlFlow::optimizeEndCf() {
   if (!RemoveRedundantEndcf)
     return;
 
-  for (MachineInstr *MI : LoweredEndCf) {
+  for (MachineInstr *MI : reverse(LoweredEndCf)) {
     MachineBasicBlock &MBB = *MI->getParent();
     auto Next =
       skipIgnoreExecInstsTrivialSucc(MBB, std::next(MI->getIterator()));

diff  --git a/llvm/test/CodeGen/AMDGPU/collapse-endcf.mir b/llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
index fc1ce0064afb2..a8b97c7932580 100644
--- a/llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
+++ b/llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
@@ -805,5 +805,222 @@ body:             |
 
   bb.6:
     S_BRANCH %bb.4
+...
+
+---
+# While collapsing inner endcf, certain blocks ended up getting two S_BRANCH instructions.
+# It happens in the absence of BranchFolding (mostly at -O0) when the irregularly placed BBs are traversed
+# in the forward direction and the intervening block between a predecessor and its successor gets optimized
+# away in subsequent iterations, leaving 2 S_BRANCH instructions in the predecessor block.
+# The issue was fixed by iterating the blocks from bottom-up to ensure all endcf pseudos at the bottom of the
+# function are processed first.
+# This test ensures there are no multiple S_BRANCH instructions inserted in any block.
+
+name: no_multiple_unconditional_branches
+tracksRegLiveness: true
+body: |
+  ; GCN-LABEL: name: no_multiple_unconditional_branches
+  ; GCN: bb.0:
+  ; GCN-NEXT:   successors: %bb.1(0x40000000), %bb.14(0x40000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+  ; GCN-NEXT:   [[V_CMP_EQ_U32_e64_:%[0-9]+]]:sreg_64 = V_CMP_EQ_U32_e64 0, killed [[DEF]], implicit $exec
+  ; GCN-NEXT:   [[COPY:%[0-9]+]]:sreg_64 = COPY $exec, implicit-def $exec
+  ; GCN-NEXT:   [[S_AND_B64_:%[0-9]+]]:sreg_64 = S_AND_B64 [[COPY]], [[V_CMP_EQ_U32_e64_]], implicit-def dead $scc
+  ; GCN-NEXT:   $exec = S_MOV_B64_term killed [[S_AND_B64_]]
+  ; GCN-NEXT:   S_CBRANCH_EXECZ %bb.14, implicit $exec
+  ; GCN-NEXT:   S_BRANCH %bb.1
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.1:
+  ; GCN-NEXT:   successors: %bb.2(0x40000000), %bb.14(0x40000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[DEF1:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+  ; GCN-NEXT:   [[V_CMP_EQ_U32_e64_1:%[0-9]+]]:sreg_64 = V_CMP_EQ_U32_e64 0, killed [[DEF1]], implicit $exec
+  ; GCN-NEXT:   [[COPY1:%[0-9]+]]:sreg_64 = COPY $exec, implicit-def $exec
+  ; GCN-NEXT:   [[S_AND_B64_1:%[0-9]+]]:sreg_64 = S_AND_B64 [[COPY1]], killed [[V_CMP_EQ_U32_e64_1]], implicit-def dead $scc
+  ; GCN-NEXT:   $exec = S_MOV_B64_term killed [[S_AND_B64_1]]
+  ; GCN-NEXT:   S_CBRANCH_EXECZ %bb.14, implicit $exec
+  ; GCN-NEXT:   S_BRANCH %bb.2
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.2:
+  ; GCN-NEXT:   successors: %bb.3(0x40000000), %bb.7(0x40000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[DEF2:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+  ; GCN-NEXT:   [[V_CMP_EQ_U32_e64_2:%[0-9]+]]:sreg_64 = V_CMP_EQ_U32_e64 0, killed [[DEF2]], implicit $exec
+  ; GCN-NEXT:   [[COPY2:%[0-9]+]]:sreg_64 = COPY $exec, implicit-def $exec
+  ; GCN-NEXT:   [[S_AND_B64_2:%[0-9]+]]:sreg_64 = S_AND_B64 [[COPY2]], killed [[V_CMP_EQ_U32_e64_2]], implicit-def dead $scc
+  ; GCN-NEXT:   $exec = S_MOV_B64_term killed [[S_AND_B64_2]]
+  ; GCN-NEXT:   S_CBRANCH_EXECZ %bb.7, implicit $exec
+  ; GCN-NEXT:   S_BRANCH %bb.3
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.3:
+  ; GCN-NEXT:   successors: %bb.4(0x40000000), %bb.7(0x40000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[DEF3:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+  ; GCN-NEXT:   [[V_CMP_EQ_U32_e64_3:%[0-9]+]]:sreg_64 = V_CMP_EQ_U32_e64 0, killed [[DEF3]], implicit $exec
+  ; GCN-NEXT:   [[COPY3:%[0-9]+]]:sreg_64 = COPY $exec, implicit-def $exec
+  ; GCN-NEXT:   [[S_AND_B64_3:%[0-9]+]]:sreg_64 = S_AND_B64 [[COPY3]], killed [[V_CMP_EQ_U32_e64_3]], implicit-def dead $scc
+  ; GCN-NEXT:   $exec = S_MOV_B64_term killed [[S_AND_B64_3]]
+  ; GCN-NEXT:   S_CBRANCH_EXECZ %bb.7, implicit $exec
+  ; GCN-NEXT:   S_BRANCH %bb.4
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.4:
+  ; GCN-NEXT:   successors: %bb.7(0x80000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   S_BRANCH %bb.7
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.7:
+  ; GCN-NEXT:   successors: %bb.8(0x80000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   $exec = S_OR_B64 $exec, [[COPY2]], implicit-def $scc
+  ; GCN-NEXT:   S_BRANCH %bb.8
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.8:
+  ; GCN-NEXT:   successors: %bb.9(0x80000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   S_BRANCH %bb.9
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.9:
+  ; GCN-NEXT:   successors: %bb.11(0x40000000), %bb.12(0x40000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[DEF4:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+  ; GCN-NEXT:   [[V_CMP_EQ_U32_e64_4:%[0-9]+]]:sreg_64 = V_CMP_EQ_U32_e64 0, killed [[DEF4]], implicit $exec
+  ; GCN-NEXT:   [[COPY4:%[0-9]+]]:sreg_64 = COPY $exec, implicit-def $exec
+  ; GCN-NEXT:   [[S_AND_B64_4:%[0-9]+]]:sreg_64 = S_AND_B64 [[COPY4]], killed [[V_CMP_EQ_U32_e64_4]], implicit-def dead $scc
+  ; GCN-NEXT:   [[S_XOR_B64_:%[0-9]+]]:sreg_64 = S_XOR_B64 [[S_AND_B64_4]], [[COPY4]], implicit-def dead $scc
+  ; GCN-NEXT:   $exec = S_MOV_B64_term killed [[S_AND_B64_4]]
+  ; GCN-NEXT:   S_CBRANCH_EXECZ %bb.12, implicit $exec
+  ; GCN-NEXT:   S_BRANCH %bb.11
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.10:
+  ; GCN-NEXT:   successors: %bb.14(0x80000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   S_BRANCH %bb.14
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.11:
+  ; GCN-NEXT:   successors: %bb.12(0x80000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   S_BRANCH %bb.12
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.12:
+  ; GCN-NEXT:   successors: %bb.10(0x40000000), %bb.14(0x40000000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[S_OR_SAVEEXEC_B64_:%[0-9]+]]:sreg_64 = S_OR_SAVEEXEC_B64 [[S_XOR_B64_]], implicit-def $exec, implicit-def $scc, implicit $exec
+  ; GCN-NEXT:   [[S_AND_B64_5:%[0-9]+]]:sreg_64 = S_AND_B64 $exec, [[S_OR_SAVEEXEC_B64_]], implicit-def $scc
+  ; GCN-NEXT:   $exec = S_XOR_B64_term $exec, [[S_AND_B64_5]], implicit-def $scc
+  ; GCN-NEXT:   S_CBRANCH_EXECZ %bb.14, implicit $exec
+  ; GCN-NEXT:   S_BRANCH %bb.10
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.14:
+  ; GCN-NEXT:   $exec = S_OR_B64 $exec, [[COPY]], implicit-def $scc
+  ; GCN-NEXT:   S_ENDPGM 0
+  bb.0:
+    successors: %bb.1, %bb.14
+
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:sreg_64 = V_CMP_EQ_U32_e64 0, killed %0:vgpr_32, implicit $exec
+    %2:sreg_64 = SI_IF %1:sreg_64, %bb.14, implicit-def $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.1
+
+  bb.1:
+  ; predecessors: %bb.0
+    successors: %bb.2, %bb.6
+
+    %3:vgpr_32 = IMPLICIT_DEF
+    %4:sreg_64 = V_CMP_EQ_U32_e64 0, killed %3:vgpr_32, implicit $exec
+    %5:sreg_64 = SI_IF killed %4:sreg_64, %bb.6, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.2
+
+  bb.2:
+  ; predecessors: %bb.1
+    successors: %bb.3, %bb.7
+
+    %6:vgpr_32 = IMPLICIT_DEF
+    %7:sreg_64 = V_CMP_EQ_U32_e64 0, killed %6:vgpr_32, implicit $exec
+    %8:sreg_64 = SI_IF killed %7:sreg_64, %bb.7, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.3
+
+  bb.3:
+  ; predecessors: %bb.2
+    successors: %bb.4, %bb.5
+
+    %9:vgpr_32 = IMPLICIT_DEF
+    %10:sreg_64 = V_CMP_EQ_U32_e64 0, killed %9:vgpr_32, implicit $exec
+    %11:sreg_64 = SI_IF killed %10:sreg_64, %bb.5, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.4
+
+  bb.4:
+  ; predecessors: %bb.3
+    successors: %bb.5
+
+    S_BRANCH %bb.5
+
+  bb.5:
+  ; predecessors: %bb.3, %bb.4
+    successors: %bb.7
+
+    SI_END_CF %11:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.7
+
+  bb.6:
+  ; predecessors: %bb.1, %bb.13
+    successors: %bb.14
+
+    SI_END_CF %5:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.14
+
+  bb.7:
+  ; predecessors: %bb.2, %bb.5
+    successors: %bb.8
+
+    SI_END_CF %8:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.8
+
+  bb.8:
+  ; predecessors: %bb.7
+    successors: %bb.9
+
+    S_BRANCH %bb.9
+
+  bb.9:
+  ; predecessors: %bb.8
+    successors: %bb.11, %bb.12
+
+    %12:vgpr_32 = IMPLICIT_DEF
+    %13:sreg_64 = V_CMP_EQ_U32_e64 0, killed %12:vgpr_32, implicit $exec
+    %14:sreg_64 = SI_IF killed %13:sreg_64, %bb.12, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.11
+
+  bb.10:
+  ; predecessors: %bb.12
+    successors: %bb.13
+
+    S_BRANCH %bb.13
+
+  bb.11:
+  ; predecessors: %bb.9
+    successors: %bb.12
+
+    S_BRANCH %bb.12
+
+  bb.12:
+  ; predecessors: %bb.9, %bb.11
+    successors: %bb.10, %bb.13
+
+    %15:sreg_64 = SI_ELSE %14:sreg_64, %bb.13, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.10
+
+  bb.13:
+  ; predecessors: %bb.10, %bb.12
+    successors: %bb.6
+
+    SI_END_CF %15:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_BRANCH %bb.6
+
+  bb.14:
+  ; predecessors: %bb.0, %bb.6
+
+    SI_END_CF %2:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
+    S_ENDPGM 0
 
 ...

More information about the llvm-commits mailing list