[llvm-branch-commits] [llvm] [AMDGPU] Mark all instructions in WWM region as convergent (PR #204572)
Diana Picus via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Jun 18 05:19:34 PDT 2026
https://github.com/rovka created https://github.com/llvm/llvm-project/pull/204572
Mark instructions between ENTER_STRICT_WWM and EXIT_STRICT_WWM as
convergent, so they don't get moved out of the whole wave mode region
(see the licm-wwm.mir test). This doesn't automagically fix all our
woes, since things can still be moved out of the region before we even
run si-wqm, but there are rumours about moving WWM formation earlier
anyway.
This is not a substitute for proper WWM support - in particular, this
would inhibit most optimizations inside WWM regions with complex control
flow. Right now most WWM is relatively limited in size and complexity,
so I think this is acceptable until we get a more principled solution.
I haven't thought too much about whether or not we need this for WQM as
well.
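
To make the idea concrete, here is a minimal standalone sketch of the flagging rule. It is a toy model, not the code in SIWholeQuadMode.cpp: `ToyInstr`, `markStrictWWMRegions` and `isEffectivelyConvergent` are hypothetical names invented for illustration, and instructions are modelled as plain structs rather than `MachineInstr`. It assumes that WWM regions do not nest, and that a hoisting pass treats a flagged instruction the same as one that is convergent by opcode.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, NOT llvm::MachineInstr).
struct ToyInstr {
  std::string Opcode;
  bool ConvergentByOpcode = false;  // what the opcode description would report
  bool IsDebug = false;             // e.g. DBG_VALUE
  bool OverrideConvergence = false; // per-instruction flag set by the sketch
};

// Flags every non-debug instruction between ENTER_STRICT_WWM and
// EXIT_STRICT_WWM that is not already convergent by opcode.
// Returns the number of instructions that were flagged.
inline unsigned markStrictWWMRegions(std::vector<ToyInstr> &Block) {
  bool InRegion = false;
  unsigned NumFlagged = 0;
  for (ToyInstr &MI : Block) {
    if (MI.Opcode == "ENTER_STRICT_WWM") {
      assert(!InRegion && "toy model assumes WWM regions do not nest");
      InRegion = true;
      continue; // the region markers themselves are left alone
    }
    if (MI.Opcode == "EXIT_STRICT_WWM") {
      InRegion = false;
      continue;
    }
    if (InRegion && !MI.IsDebug && !MI.ConvergentByOpcode) {
      MI.OverrideConvergence = true;
      ++NumFlagged;
    }
  }
  return NumFlagged;
}

// In this toy model, a pass such as LICM would refuse to hoist an
// instruction for which this returns true.
inline bool isEffectivelyConvergent(const ToyInstr &MI) {
  return MI.ConvergentByOpcode || MI.OverrideConvergence;
}
```

Running the sketch over a block shaped like the one in licm-wwm.mir would flag the V_MOV_B32_e32 inside the region, while debug instructions, instructions that are already convergent by opcode, and everything outside the region stay untouched.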
Assisted by: Claude Sonnet
---
**Stack**:
- [2/2] #204571
- [1/2] #204570
⚠️ *Part of a stack created by [spr](https://github.com/nhaehnle/spr). Merging this PR using the GitHub UI may have unexpected results.*
From 46f3a3e9165db2ac4a55c5abac1065ec7e2110a4 Mon Sep 17 00:00:00 2001
From: Diana Picus <diana-magda.picus at amd.com>
Date: Thu, 11 Jun 2026 09:34:08 +0200
Subject: [PATCH] [AMDGPU] Mark all instructions in WWM region as convergent
Mark instructions between ENTER_STRICT_WWM and EXIT_STRICT_WWM as
convergent, so they don't get moved out of the whole wave mode region
(see the licm-wwm.mir test). This doesn't automagically fix all our
woes, since things can still be moved out of the region before we even
run si-wqm, but there are rumours about moving WWM formation earlier
anyway.
This is not a substitute for proper WWM support - in particular, this
would inhibit most optimizations inside WWM regions with complex control
flow. Right now most WWM is relatively limited in size and complexity,
so I think this is acceptable until we get a more principled solution.
I haven't thought too much about whether or not we need this for WQM as
well.
Assisted by: Claude Sonnet
commit-id:9204c7e2
---
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp | 8 +++++
llvm/test/CodeGen/AMDGPU/licm-wwm.mir | 27 ++++++++++++++-
.../CodeGen/AMDGPU/si-init-whole-wave.mir | 4 +--
llvm/test/CodeGen/AMDGPU/wqm-debug-instr.mir | 10 +++---
llvm/test/CodeGen/AMDGPU/wqm.mir | 34 +++++++++----------
5 files changed, 58 insertions(+), 25 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
index bf5dc2c529be6..4c2f316c04da2 100644
--- a/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
+++ b/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
@@ -1112,6 +1112,8 @@ void SIWholeQuadMode::lowerBlock(MachineBasicBlock &MBB, BlockInfo &BI) {
}
break;
default:
+ if (ActiveLanesReg && !MI.isDebugInstr() && !MI.getDesc().isConvergent())
+ MI.setFlag(MachineInstr::OverrideConvergence);
break;
}
if (SplitPoint)
@@ -1463,6 +1465,12 @@ void SIWholeQuadMode::processBlock(MachineBasicBlock &MBB, BlockInfo &BI,
toStrictMode(MBB, Before, SavedNonStrictReg, Needs);
State = Needs;
+
+ // In whole wave mode, we're going to flag all instructions inside
+ // a whole wave region as convergent (if they aren't already based on
+ // their opcode).
+ if (Needs == StateStrictWWM)
+ BI.NeedsLowering = true;
} else {
if (WQMToExact) {
if (!WQMFromExec && (OutNeeds & StateWQM)) {
diff --git a/llvm/test/CodeGen/AMDGPU/licm-wwm.mir b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir
index d4267a595000b..a12f32d39a1bf 100644
--- a/llvm/test/CodeGen/AMDGPU/licm-wwm.mir
+++ b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir
@@ -1,9 +1,13 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -run-pass=early-machinelicm,si-wqm -o - %s | FileCheck -check-prefix=GCN %s
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -passes=early-machinelicm,si-wqm -o - %s | FileCheck -check-prefix=GCN %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -run-pass=si-wqm,early-machinelicm -o - %s | FileCheck -check-prefix=GCN-SWAPPED %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -passes=si-wqm,early-machinelicm -o - %s | FileCheck -check-prefix=GCN-SWAPPED %s
# Machine LICM may hoist an intruction from a WWM region, which will force SI-WQM pass
# to create a second WWM region. This is an unwanted hoisting.
+# If machine LICM runs after the si-wqm pass, it will still hoist the
+# instruction, which is plain incorrect.
---
name: licm_move_wwm
@@ -14,7 +18,7 @@ body: |
; GCN-NEXT: successors: %bb.1(0x80000000)
; GCN-NEXT: {{ $}}
; GCN-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit $exec
+ ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = override_convergence V_MOV_B32_e32 1, implicit $exec
; GCN-NEXT: $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
; GCN-NEXT: S_BRANCH %bb.1
; GCN-NEXT: {{ $}}
@@ -31,6 +35,27 @@ body: |
; GCN-NEXT: {{ $}}
; GCN-NEXT: bb.2:
; GCN-NEXT: S_ENDPGM 0
+ ;
+ ; GCN-SWAPPED-LABEL: name: licm_move_wwm
+ ; GCN-SWAPPED: bb.0:
+ ; GCN-SWAPPED-NEXT: successors: %bb.1(0x80000000)
+ ; GCN-SWAPPED-NEXT: {{ $}}
+ ; GCN-SWAPPED-NEXT: S_BRANCH %bb.1
+ ; GCN-SWAPPED-NEXT: {{ $}}
+ ; GCN-SWAPPED-NEXT: bb.1:
+ ; GCN-SWAPPED-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; GCN-SWAPPED-NEXT: {{ $}}
+ ; GCN-SWAPPED-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
+ ; GCN-SWAPPED-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = override_convergence V_MOV_B32_e32 1, implicit $exec
+ ; GCN-SWAPPED-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32_xm0 = V_READFIRSTLANE_B32 [[V_MOV_B32_e32_]], implicit $exec
+ ; GCN-SWAPPED-NEXT: $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
+ ; GCN-SWAPPED-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_READFIRSTLANE_B32_]]
+ ; GCN-SWAPPED-NEXT: $exec_lo = S_OR_B32 $exec_lo, [[COPY]], implicit-def $scc
+ ; GCN-SWAPPED-NEXT: S_CBRANCH_EXECNZ %bb.1, implicit $exec
+ ; GCN-SWAPPED-NEXT: S_BRANCH %bb.2
+ ; GCN-SWAPPED-NEXT: {{ $}}
+ ; GCN-SWAPPED-NEXT: bb.2:
+ ; GCN-SWAPPED-NEXT: S_ENDPGM 0
bb.0:
S_BRANCH %bb.1
diff --git a/llvm/test/CodeGen/AMDGPU/si-init-whole-wave.mir b/llvm/test/CodeGen/AMDGPU/si-init-whole-wave.mir
index c02301446861d..eb4a1aa49ba78 100644
--- a/llvm/test/CodeGen/AMDGPU/si-init-whole-wave.mir
+++ b/llvm/test/CodeGen/AMDGPU/si-init-whole-wave.mir
@@ -93,9 +93,9 @@ body: |
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 24, [[COPY3]], 0, implicit $exec
; CHECK-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_32_xm0_xexec = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: dead [[DEF:%[0-9]+]]:sreg_32_xm0_xexec = IMPLICIT_DEF
+ ; CHECK-NEXT: dead [[DEF:%[0-9]+]]:sreg_32_xm0_xexec = override_convergence IMPLICIT_DEF
; CHECK-NEXT: [[V_SET_INACTIVE_B32_:%[0-9]+]]:vgpr_32 = V_SET_INACTIVE_B32 0, [[COPY3]], 0, 71, undef [[ENTER_STRICT_WWM]], implicit $exec, implicit-def $scc
- ; CHECK-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 42, [[V_SET_INACTIVE_B32_]], 0, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = override_convergence V_ADD_U32_e64 42, [[V_SET_INACTIVE_B32_]], 0, implicit $exec
; CHECK-NEXT: $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
; CHECK-NEXT: early-clobber [[COPY2]]:vgpr_32 = V_MOV_B32_e32 [[V_ADD_U32_e64_]], implicit $exec
; CHECK-NEXT: {{ $}}
diff --git a/llvm/test/CodeGen/AMDGPU/wqm-debug-instr.mir b/llvm/test/CodeGen/AMDGPU/wqm-debug-instr.mir
index 6c646a2fd5539..92fdf99a7613c 100644
--- a/llvm/test/CodeGen/AMDGPU/wqm-debug-instr.mir
+++ b/llvm/test/CodeGen/AMDGPU/wqm-debug-instr.mir
@@ -68,7 +68,7 @@ body: |
; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_64 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = override_convergence COPY $vgpr0
; CHECK-NEXT: $exec = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr2
; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr_32 = COPY $sgpr1
@@ -80,13 +80,13 @@ body: |
; CHECK-NEXT: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], [[DEF]], 0, 0, 0, 0, implicit $exec
; CHECK-NEXT: [[COPY4:%[0-9]+]]:sreg_32_xm0 = COPY $scc
; CHECK-NEXT: [[ENTER_STRICT_WWM1:%[0-9]+]]:sreg_64 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: $scc = COPY [[COPY4]]
- ; CHECK-NEXT: [[V_ADD_CO_U32_e32_:%[0-9]+]]:vgpr_32 = V_ADD_CO_U32_e32 [[COPY]], [[COPY]], implicit-def $vcc, implicit $exec
- ; CHECK-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sgpr_32 = S_CSELECT_B32 [[COPY1]], [[COPY2]], implicit $scc
+ ; CHECK-NEXT: $scc = override_convergence COPY [[COPY4]]
+ ; CHECK-NEXT: [[V_ADD_CO_U32_e32_:%[0-9]+]]:vgpr_32 = override_convergence V_ADD_CO_U32_e32 [[COPY]], [[COPY]], implicit-def $vcc, implicit $exec
+ ; CHECK-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sgpr_32 = override_convergence S_CSELECT_B32 [[COPY1]], [[COPY2]], implicit $scc
; CHECK-NEXT: DBG_VALUE $noreg, $noreg
; CHECK-NEXT: DBG_VALUE $noreg, $noreg
; CHECK-NEXT: DBG_VALUE $noreg, $noreg
- ; CHECK-NEXT: [[V_ADD_CO_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_CO_U32_e32 [[S_CSELECT_B32_]], [[V_ADD_CO_U32_e32_]], implicit-def $vcc, implicit $exec
+ ; CHECK-NEXT: [[V_ADD_CO_U32_e32_1:%[0-9]+]]:vgpr_32 = override_convergence V_ADD_CO_U32_e32 [[S_CSELECT_B32_]], [[V_ADD_CO_U32_e32_]], implicit-def $vcc, implicit $exec
; CHECK-NEXT: $exec = EXIT_STRICT_WWM [[ENTER_STRICT_WWM1]]
; CHECK-NEXT: early-clobber $vgpr0 = V_MOV_B32_e32 [[V_ADD_CO_U32_e32_1]], implicit $exec
; CHECK-NEXT: $vgpr1 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
diff --git a/llvm/test/CodeGen/AMDGPU/wqm.mir b/llvm/test/CodeGen/AMDGPU/wqm.mir
index ceb1b3e16d727..4ad6dcef55a2f 100644
--- a/llvm/test/CodeGen/AMDGPU/wqm.mir
+++ b/llvm/test/CodeGen/AMDGPU/wqm.mir
@@ -81,14 +81,14 @@ body: |
; CHECK: liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_64 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
- ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr2
- ; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr_32 = COPY $sgpr1
- ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sgpr_32 = COPY $sgpr0
- ; CHECK-NEXT: S_CMP_LT_I32 0, [[COPY3]], implicit-def $scc
- ; CHECK-NEXT: [[V_ADD_CO_U32_e32_:%[0-9]+]]:vgpr_32 = V_ADD_CO_U32_e32 [[COPY]], [[COPY]], implicit-def $vcc, implicit $exec
- ; CHECK-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sgpr_32 = S_CSELECT_B32 [[COPY1]], [[COPY2]], implicit $scc
- ; CHECK-NEXT: [[V_ADD_CO_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_CO_U32_e32 [[S_CSELECT_B32_]], [[V_ADD_CO_U32_e32_]], implicit-def $vcc, implicit $exec
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = override_convergence COPY $vgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = override_convergence COPY $sgpr2
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr_32 = override_convergence COPY $sgpr1
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:sgpr_32 = override_convergence COPY $sgpr0
+ ; CHECK-NEXT: override_convergence S_CMP_LT_I32 0, [[COPY3]], implicit-def $scc
+ ; CHECK-NEXT: [[V_ADD_CO_U32_e32_:%[0-9]+]]:vgpr_32 = override_convergence V_ADD_CO_U32_e32 [[COPY]], [[COPY]], implicit-def $vcc, implicit $exec
+ ; CHECK-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sgpr_32 = override_convergence S_CSELECT_B32 [[COPY1]], [[COPY2]], implicit $scc
+ ; CHECK-NEXT: [[V_ADD_CO_U32_e32_1:%[0-9]+]]:vgpr_32 = override_convergence V_ADD_CO_U32_e32 [[S_CSELECT_B32_]], [[V_ADD_CO_U32_e32_]], implicit-def $vcc, implicit $exec
; CHECK-NEXT: $exec = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
; CHECK-NEXT: early-clobber $vgpr0 = V_MOV_B32_e32 [[V_ADD_CO_U32_e32_1]], implicit $exec
; CHECK-NEXT: SI_RETURN_TO_EPILOG $vgpr0
@@ -117,7 +117,7 @@ body: |
; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_64 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = override_convergence COPY $vgpr0
; CHECK-NEXT: $exec = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr2
; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr_32 = COPY $sgpr1
@@ -129,10 +129,10 @@ body: |
; CHECK-NEXT: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], [[DEF]], 0, 0, 0, 0, implicit $exec
; CHECK-NEXT: [[COPY4:%[0-9]+]]:sreg_32_xm0 = COPY $scc
; CHECK-NEXT: [[ENTER_STRICT_WWM1:%[0-9]+]]:sreg_64 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: $scc = COPY [[COPY4]]
- ; CHECK-NEXT: [[V_ADD_CO_U32_e32_:%[0-9]+]]:vgpr_32 = V_ADD_CO_U32_e32 [[COPY]], [[COPY]], implicit-def $vcc, implicit $exec
- ; CHECK-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sgpr_32 = S_CSELECT_B32 [[COPY1]], [[COPY2]], implicit $scc
- ; CHECK-NEXT: [[V_ADD_CO_U32_e32_1:%[0-9]+]]:vgpr_32 = V_ADD_CO_U32_e32 [[S_CSELECT_B32_]], [[V_ADD_CO_U32_e32_]], implicit-def $vcc, implicit $exec
+ ; CHECK-NEXT: $scc = override_convergence COPY [[COPY4]]
+ ; CHECK-NEXT: [[V_ADD_CO_U32_e32_:%[0-9]+]]:vgpr_32 = override_convergence V_ADD_CO_U32_e32 [[COPY]], [[COPY]], implicit-def $vcc, implicit $exec
+ ; CHECK-NEXT: [[S_CSELECT_B32_:%[0-9]+]]:sgpr_32 = override_convergence S_CSELECT_B32 [[COPY1]], [[COPY2]], implicit $scc
+ ; CHECK-NEXT: [[V_ADD_CO_U32_e32_1:%[0-9]+]]:vgpr_32 = override_convergence V_ADD_CO_U32_e32 [[S_CSELECT_B32_]], [[V_ADD_CO_U32_e32_]], implicit-def $vcc, implicit $exec
; CHECK-NEXT: $exec = EXIT_STRICT_WWM [[ENTER_STRICT_WWM1]]
; CHECK-NEXT: early-clobber $vgpr0 = V_MOV_B32_e32 [[V_ADD_CO_U32_e32_1]], implicit $exec
; CHECK-NEXT: $vgpr1 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
@@ -213,7 +213,7 @@ body: |
; CHECK-NEXT: dead [[DEF:%[0-9]+]]:sreg_64_xexec = IMPLICIT_DEF
; CHECK-NEXT: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY6]], implicit $exec, implicit-def $scc
; CHECK-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_64 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: [[COPY8:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
+ ; CHECK-NEXT: [[COPY8:%[0-9]+]]:vgpr_32 = override_convergence COPY [[S_MOV_B32_]]
; CHECK-NEXT: [[V_MOV_B32_dpp:%[0-9]+]]:vgpr_32 = V_MOV_B32_dpp [[COPY8]], [[COPY7]], 323, 12, 15, 0, implicit $exec
; CHECK-NEXT: $exec = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
; CHECK-NEXT: early-clobber %15:vgpr_32 = V_MOV_B32_e32 [[V_MOV_B32_dpp]], implicit $exec
@@ -259,7 +259,7 @@ body: |
; CHECK-NEXT: dead [[BUFFER_LOAD_DWORDX2_OFFSET:%[0-9]+]]:vreg_64 = BUFFER_LOAD_DWORDX2_OFFSET [[REG_SEQUENCE]], [[S_MOV_B32_]], 0, 0, 0, implicit $exec
; CHECK-NEXT: [[COPY4:%[0-9]+]]:sreg_64 = COPY $exec
; CHECK-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_64 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = override_convergence V_MOV_B32_e32 0, implicit $exec
; CHECK-NEXT: $exec = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
; CHECK-NEXT: [[V_MBCNT_LO_U32_B32_e64_:%[0-9]+]]:vgpr_32 = V_MBCNT_LO_U32_B32_e64 [[COPY4]].sub0, 0, implicit $exec
; CHECK-NEXT: [[V_MOV_B32_dpp:%[0-9]+]]:vgpr_32 = V_MOV_B32_dpp [[V_MOV_B32_e32_]], [[V_MBCNT_LO_U32_B32_e64_]], 312, 15, 15, 0, implicit $exec
@@ -387,10 +387,10 @@ body: |
; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
; CHECK-NEXT: [[BUFFER_LOAD_DWORDX2_OFFEN:%[0-9]+]]:vreg_64 = BUFFER_LOAD_DWORDX2_OFFEN [[COPY1]], [[COPY]], 0, 0, 0, 0, implicit $exec
; CHECK-NEXT: [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_64_xexec = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
- ; CHECK-NEXT: dead [[DEF:%[0-9]+]]:sreg_64_xexec = IMPLICIT_DEF
+ ; CHECK-NEXT: dead [[DEF:%[0-9]+]]:sreg_64_xexec = override_convergence IMPLICIT_DEF
; CHECK-NEXT: [[BUFFER_LOAD_DWORDX2_OFFEN:%[0-9]+]].sub0:vreg_64 = V_SET_INACTIVE_B32 0, [[BUFFER_LOAD_DWORDX2_OFFEN]].sub0, 0, 0, undef [[ENTER_STRICT_WWM]], implicit $exec, implicit-def $scc
; CHECK-NEXT: [[BUFFER_LOAD_DWORDX2_OFFEN:%[0-9]+]].sub1:vreg_64 = V_SET_INACTIVE_B32 0, [[BUFFER_LOAD_DWORDX2_OFFEN]].sub1, 0, 0, undef [[ENTER_STRICT_WWM]], implicit $exec, implicit-def $scc
- ; CHECK-NEXT: [[V_MAX_F64_e64_:%[0-9]+]]:vreg_64 = nnan nsz arcp contract reassoc nofpexcept V_MAX_F64_e64 0, [[BUFFER_LOAD_DWORDX2_OFFEN]], 0, [[BUFFER_LOAD_DWORDX2_OFFEN]], 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[V_MAX_F64_e64_:%[0-9]+]]:vreg_64 = nnan nsz arcp contract reassoc nofpexcept override_convergence V_MAX_F64_e64 0, [[BUFFER_LOAD_DWORDX2_OFFEN]], 0, [[BUFFER_LOAD_DWORDX2_OFFEN]], 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: $exec = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
; CHECK-NEXT: early-clobber $vgpr0 = V_MOV_B32_e32 [[V_MAX_F64_e64_]].sub0, implicit $exec
; CHECK-NEXT: early-clobber $vgpr1 = V_MOV_B32_e32 [[V_MAX_F64_e64_]].sub1, implicit $exec