[llvm] [AMDGCN][SIWholeQuadMode] Handle case when SI_KILL_I1_TERMINATOR -1,0 is not the only terminator (PR #122922)
Juan Manuel Martinez Caamaño via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 14 07:39:20 PST 2025
https://github.com/jmmartinez created https://github.com/llvm/llvm-project/pull/122922
The `SI_KILL_I1_TERMINATOR -1, 0` instruction has no effect, so it was lowered to an unconditional branch.
However, the block may contain more terminators after the `SI_KILL_I1_TERMINATOR`. In that case, the lowering triggered an assertion later in the pipeline.
To handle this case, we simply remove the `SI_KILL_I1_TERMINATOR -1, 0` when it is not the last terminator.
Solves SWDEV-508819
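
The decision described above can be sketched outside of LLVM as a toy model. This is a minimal illustration, not the code in `SIWholeQuadMode.cpp`: `Terminators`, `lowerNoOpKill`, and the string-encoded instructions are hypothetical stand-ins for a block's terminator list, introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the terminator sequence at the end of a machine basic
// block. This is NOT the LLVM API; every name here is hypothetical.
using Terminators = std::vector<std::string>;

// Lowers a no-op kill located at index KillIdx.
// - If the kill is the last terminator, it is replaced by an unconditional
//   branch to Succ (the previous behaviour, kept for this case).
// - Otherwise it is simply erased, so that no branch ends up in front of
//   the terminators that follow it.
// Returns true when a branch was emitted in place of the kill.
bool lowerNoOpKill(Terminators &Terms, std::size_t KillIdx,
                   const std::string &Succ) {
  assert(KillIdx < Terms.size() && "kill index out of range");
  const bool IsLastTerminator = (KillIdx + 1 == Terms.size());
  if (IsLastTerminator) {
    Terms[KillIdx] = "S_BRANCH " + Succ;
    return true;
  }
  Terms.erase(Terms.begin() + static_cast<std::ptrdiff_t>(KillIdx));
  return false;
}
```

For a terminator list that holds only the kill, the sketch rewrites it into a branch. For a list where another terminator follows the kill (as in the return block of the test added by this patch), the kill is dropped and the remaining terminator is left untouched.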
From a9712e3b71ca7ba027a9df637f4a4e8e01f0add9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Juan=20Manuel=20Martinez=20Caama=C3=B1o?= <juamarti at amd.com>
Date: Tue, 14 Jan 2025 15:11:41 +0100
Subject: [PATCH] [AMDGCN][SIWholeQuadMode] Handle case when
SI_KILL_I1_TERMINATOR -1 0 is not the unique terminator
The `SI_KILL_I1_TERMINATOR -1,0` instruction does not have any effect,
so we lowered them to unconditional branches.
However, there may be more than a single terminator in the block (after
the `SI_KILL_I1_TERMINATOR`). This resulted in an assertion being
triggered later in the pipeline.
To handle this case, we simply remove the `SI_KILL_I1_TERMINATOR -1, 0`
when it is not the last terminator.
Solves SWDEV-508819
---
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp | 11 +++--
.../AMDGPU/kill-true-in-return-block.ll | 41 +++++++++++++++++++
2 files changed, 49 insertions(+), 3 deletions(-)
create mode 100644 llvm/test/CodeGen/AMDGPU/kill-true-in-return-block.ll
diff --git a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
index 9fbb847da2af1c..2795f371de32cb 100644
--- a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+++ b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
@@ -947,9 +947,14 @@ MachineInstr *SIWholeQuadMode::lowerKillI1(MachineBasicBlock &MBB,
LIS->RemoveMachineInstrFromMaps(MI);
} else {
assert(MBB.succ_size() == 1);
- NewTerm = BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_BRANCH))
- .addMBB(*MBB.succ_begin());
- LIS->ReplaceMachineInstrInMaps(MI, *NewTerm);
+ bool IsLastTerminator = MI.getReverseIterator() == MBB.rbegin();
+ if (IsLastTerminator) {
+ NewTerm = BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_BRANCH))
+ .addMBB(*MBB.succ_begin());
+ LIS->ReplaceMachineInstrInMaps(MI, *NewTerm);
+ } else {
+ LIS->RemoveMachineInstrFromMaps(MI);
+ }
}
MBB.remove(&MI);
return NewTerm;
diff --git a/llvm/test/CodeGen/AMDGPU/kill-true-in-return-block.ll b/llvm/test/CodeGen/AMDGPU/kill-true-in-return-block.ll
new file mode 100644
index 00000000000000..021c845d5ea6bb
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/kill-true-in-return-block.ll
@@ -0,0 +1,41 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a %s -o - | FileCheck %s
+
+define amdgpu_ps float @kill_true(i1 %.not) {
+; CHECK-LABEL: kill_true:
+; CHECK: ; %bb.0: ; %entry
+; CHECK-NEXT: s_mov_b64 s[0:1], exec
+; CHECK-NEXT: s_wqm_b64 exec, exec
+; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
+; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
+; CHECK-NEXT: s_xor_b64 s[4:5], vcc, -1
+; CHECK-NEXT: s_and_saveexec_b64 s[2:3], s[4:5]
+; CHECK-NEXT: s_cbranch_execz .LBB0_2
+; CHECK-NEXT: ; %bb.1: ; %if1
+; CHECK-NEXT: s_mov_b32 s4, 0
+; CHECK-NEXT: ; kill: def $sgpr4 killed $sgpr4 killed $exec
+; CHECK-NEXT: v_pk_mov_b32 v[0:1], 0, 0
+; CHECK-NEXT: v_mov_b32_e32 v2, s4
+; CHECK-NEXT: flat_store_dword v[0:1], v2
+; CHECK-NEXT: .LBB0_2: ; %endif1
+; CHECK-NEXT: s_or_b64 exec, exec, s[2:3]
+; CHECK-NEXT: s_and_b64 exec, exec, s[0:1]
+; CHECK-NEXT: v_mov_b32_e32 v0, 0
+; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; CHECK-NEXT: ; return to shader part epilog
+entry:
+ br i1 %.not, label %endif1, label %if1
+
+if1:
+ %C = call float @llvm.amdgcn.wqm.f32(float 0.000000e+00)
+ store float %C, ptr null, align 4
+ br label %endif1
+
+endif1:
+ call void @llvm.amdgcn.kill(i1 true)
+ ret float 0.000000e+00
+}
+
+declare void @llvm.amdgcn.kill(i1)
+
+declare float @llvm.amdgcn.wqm.f32(float)