[llvm] 0b50d09 - [AMDGPU] Don't optimize agpr phis if the operand doesn't have subreg use (#91267)
via llvm-commits
llvm-commits at lists.llvm.org
Tue May 7 13:44:03 PDT 2024
Author: Shilei Tian
Date: 2024-05-07T16:44:00-04:00
New Revision: 0b50d095bccbd47c77e5ad2b03b09b41b696c4a0
URL: https://github.com/llvm/llvm-project/commit/0b50d095bccbd47c77e5ad2b03b09b41b696c4a0
DIFF: https://github.com/llvm/llvm-project/commit/0b50d095bccbd47c77e5ad2b03b09b41b696c4a0.diff
LOG: [AMDGPU] Don't optimize agpr phis if the operand doesn't have subreg use (#91267)
If the operand doesn't have any subreg use, the optimization could generate `V_ACCVGPR_READ_B32_e64` with the wrong register class. The following example demonstrates the issue.
Input MIR:
```
bb.0:
%0:sgpr_32 = S_MOV_B32 0
%1:sgpr_128 = REG_SEQUENCE %0:sgpr_32, %subreg.sub0, %0:sgpr_32, %subreg.sub1, %0:sgpr_32, %subreg.sub2, %0:sgpr_32, %subreg.sub3
%2:vreg_128 = COPY %1:sgpr_128
%3:areg_128 = COPY %2:vreg_128, implicit $exec
bb.1:
%4:areg_128 = PHI %3:areg_128, %bb.0, %6:areg_128, %bb.1
%5:areg_128 = PHI %3:areg_128, %bb.0, %7:areg_128, %bb.1
...
```
Output of the current implementation:
```
bb.0:
%0:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
%1:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
%2:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
%3:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
%4:areg_128 = REG_SEQUENCE %0:agpr_32, %subreg.sub0, %1:agpr_32, %subreg.sub1, %2:agpr_32, %subreg.sub2, %3:agpr_32, %subreg.sub3
%5:vreg_128 = V_ACCVGPR_READ_B32_e64 %4:areg_128, implicit $exec
%6:areg_128 = COPY %5:vreg_128
bb.1:
%7:areg_128 = PHI %6:areg_128, %bb.0, %9:areg_128, %bb.1
%8:areg_128 = PHI %6:areg_128, %bb.0, %10:areg_128, %bb.1
...
```
The problem is the generated `V_ACCVGPR_READ_B32_e64` instruction: the operand `%4:areg_128` is not valid for it, since `V_ACCVGPR_READ_B32_e64` can only handle 32-bit operands. In this patch, we don't count uses without a subreg index, because folding them would force a non-32-bit operand into that instruction.
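For reference, the collection loop in `tryOptimizeAGPRPhis` now looks roughly like the following (a minimal sketch matching the diff below; `RegToMO` groups the incoming operands of each AGPR PHI by their (register, subregister) pair):
```
// Collect the incoming operands of the PHI, keyed by (register,
// subregister). Operands without a subregister index are skipped:
// folding them would require V_ACCVGPR_READ_B32_e64 to read a
// non-32-bit register, which it cannot do.
for (unsigned K = 1; K < MI.getNumOperands(); K += 2) {
  MachineOperand &PhiMO = MI.getOperand(K);
  if (!PhiMO.getSubReg())
    continue;
  RegToMO[{PhiMO.getReg(), PhiMO.getSubReg()}].push_back(&PhiMO);
}
```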
Fixes: SWDEV-459556
Added:
Modified:
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
Removed:
################################################################################
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index cb448aaafa4c0..5c411a0955878 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -2106,6 +2106,8 @@ bool SIFoldOperands::tryOptimizeAGPRPhis(MachineBasicBlock &MBB) {
 
     for (unsigned K = 1; K < MI.getNumOperands(); K += 2) {
       MachineOperand &PhiMO = MI.getOperand(K);
+      if (!PhiMO.getSubReg())
+        continue;
       RegToMO[{PhiMO.getReg(), PhiMO.getSubReg()}].push_back(&PhiMO);
     }
   }
}
diff --git a/llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir b/llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
index a32b3d0f1e6b3..e94546fd5e8a5 100644
--- a/llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
+++ b/llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
@@ -465,7 +465,6 @@ body: |
; GFX90A-NEXT: bb.2:
; GFX90A-NEXT: S_ENDPGM 0
bb.0:
- ; Tests that tryOptimizeAGPRPhis kicks in for GFX908.
liveins: $sgpr0, $scc
successors: %bb.1
@@ -715,3 +714,85 @@ body: |
bb.3:
S_ENDPGM 0
...
+
+---
+name: skip_optimize_agpr_phi_without_subreg_use
+tracksRegLiveness: true
+body: |
+ ; GFX908-LABEL: name: skip_optimize_agpr_phi_without_subreg_use
+ ; GFX908: bb.0:
+ ; GFX908-NEXT: successors: %bb.1(0x80000000)
+ ; GFX908-NEXT: liveins: $scc
+ ; GFX908-NEXT: {{ $}}
+ ; GFX908-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+ ; GFX908-NEXT: [[S_MOV_B32_:%[0-9]+]]:sgpr_32 = S_MOV_B32 0
+ ; GFX908-NEXT: [[V_ACCVGPR_WRITE_B32_e64_:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
+ ; GFX908-NEXT: [[V_ACCVGPR_WRITE_B32_e64_1:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
+ ; GFX908-NEXT: [[V_ACCVGPR_WRITE_B32_e64_2:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
+ ; GFX908-NEXT: [[V_ACCVGPR_WRITE_B32_e64_3:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
+ ; GFX908-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128_align2 = REG_SEQUENCE [[V_ACCVGPR_WRITE_B32_e64_]], %subreg.sub0, [[V_ACCVGPR_WRITE_B32_e64_1]], %subreg.sub1, [[V_ACCVGPR_WRITE_B32_e64_2]], %subreg.sub2, [[V_ACCVGPR_WRITE_B32_e64_3]], %subreg.sub3
+ ; GFX908-NEXT: {{ $}}
+ ; GFX908-NEXT: bb.1:
+ ; GFX908-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; GFX908-NEXT: liveins: $scc
+ ; GFX908-NEXT: {{ $}}
+ ; GFX908-NEXT: [[PHI:%[0-9]+]]:areg_128_align2 = PHI [[REG_SEQUENCE]], %bb.0, %7, %bb.1
+ ; GFX908-NEXT: [[V_MFMA_F32_16X16X4F32_e64_:%[0-9]+]]:areg_128_align2 = V_MFMA_F32_16X16X4F32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_]], [[PHI]], 0, 0, 0, implicit $mode, implicit $exec
+ ; GFX908-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = COPY [[V_MFMA_F32_16X16X4F32_e64_]], implicit $exec
+ ; GFX908-NEXT: S_CBRANCH_SCC1 %bb.1, implicit $scc
+ ; GFX908-NEXT: {{ $}}
+ ; GFX908-NEXT: bb.2:
+ ; GFX908-NEXT: S_ENDPGM 0
+ ;
+ ; GFX90A-LABEL: name: skip_optimize_agpr_phi_without_subreg_use
+ ; GFX90A: bb.0:
+ ; GFX90A-NEXT: successors: %bb.1(0x80000000)
+ ; GFX90A-NEXT: liveins: $scc
+ ; GFX90A-NEXT: {{ $}}
+ ; GFX90A-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+ ; GFX90A-NEXT: [[S_MOV_B32_:%[0-9]+]]:sgpr_32 = S_MOV_B32 0
+ ; GFX90A-NEXT: [[V_ACCVGPR_WRITE_B32_e64_:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
+ ; GFX90A-NEXT: [[V_ACCVGPR_WRITE_B32_e64_1:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
+ ; GFX90A-NEXT: [[V_ACCVGPR_WRITE_B32_e64_2:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
+ ; GFX90A-NEXT: [[V_ACCVGPR_WRITE_B32_e64_3:%[0-9]+]]:agpr_32 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
+ ; GFX90A-NEXT: [[REG_SEQUENCE:%[0-9]+]]:areg_128_align2 = REG_SEQUENCE [[V_ACCVGPR_WRITE_B32_e64_]], %subreg.sub0, [[V_ACCVGPR_WRITE_B32_e64_1]], %subreg.sub1, [[V_ACCVGPR_WRITE_B32_e64_2]], %subreg.sub2, [[V_ACCVGPR_WRITE_B32_e64_3]], %subreg.sub3
+ ; GFX90A-NEXT: {{ $}}
+ ; GFX90A-NEXT: bb.1:
+ ; GFX90A-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; GFX90A-NEXT: liveins: $scc
+ ; GFX90A-NEXT: {{ $}}
+ ; GFX90A-NEXT: [[PHI:%[0-9]+]]:areg_128_align2 = PHI [[REG_SEQUENCE]], %bb.0, %7, %bb.1
+ ; GFX90A-NEXT: [[V_MFMA_F32_16X16X4F32_e64_:%[0-9]+]]:areg_128_align2 = V_MFMA_F32_16X16X4F32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_]], [[PHI]], 0, 0, 0, implicit $mode, implicit $exec
+ ; GFX90A-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = COPY [[V_MFMA_F32_16X16X4F32_e64_]], implicit $exec
+ ; GFX90A-NEXT: S_CBRANCH_SCC1 %bb.1, implicit $scc
+ ; GFX90A-NEXT: {{ $}}
+ ; GFX90A-NEXT: bb.2:
+ ; GFX90A-NEXT: S_ENDPGM 0
+ bb.0:
+ liveins: $scc
+ successors: %bb.1
+
+ %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+ %1:sgpr_32 = S_MOV_B32 0
+ %2:sgpr_128 = REG_SEQUENCE %1, %subreg.sub0, %1, %subreg.sub1, %1, %subreg.sub2, %1, %subreg.sub3
+ %3:vreg_128 = COPY %2
+ %4:sreg_64 = S_MOV_B64 0
+ %5:areg_128_align2 = COPY %3, implicit $exec
+
+ bb.1:
+ liveins: $scc
+ successors: %bb.1, %bb.2
+
+ %9:areg_128_align2 = PHI %5, %bb.0, %10, %bb.1
+ %11:areg_128_align2 = V_MFMA_F32_16X16X4F32_e64 %0:vgpr_32, %0:vgpr_32, %9:areg_128_align2, 0, 0, 0, implicit $mode, implicit $exec
+ %12:vgpr_32 = COPY %11.sub3
+ %13:vgpr_32 = COPY %11.sub2
+ %14:vgpr_32 = COPY %11.sub1
+ %15:vgpr_32 = COPY %11.sub0
+ %10:areg_128_align2 = COPY %11, implicit $exec
+ S_CBRANCH_SCC1 %bb.1, implicit $scc
+
+ bb.2:
+ S_ENDPGM 0
+
+...