[llvm] AMDGPU: Handle V->A MFMA copy from case with immediate src2 (PR #153023)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 2 23:40:38 PDT 2025
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/153023
>From 26b35313d8efb06f3d66506401e9c806fc337196 Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Mon, 11 Aug 2025 10:47:44 +0900
Subject: [PATCH 1/2] AMDGPU: Handle rewriting VGPR MFMA fed from AGPR copy
Previously we handled the inverse situation only.
>From d7f037ae8ff9d03bfe39055b12217013bb6ec66c Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Mon, 11 Aug 2025 18:22:09 +0900
Subject: [PATCH 2/2] AMDGPU: Handle V->A MFMA copy from case with immediate
src2
Handle a special case for copies from AGPR VGPR on the MFMA inputs.
If the "input" is really a subregister def, we will not see the
usual copy to VGPR for src2, only the read of the subregister def.
Not sure if this pattern appears in practice.
---
llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp | 5 ++++-
.../CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir | 4 ++--
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index 5468bdd81cd98..21cf9cc6878fb 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -375,8 +375,11 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::tryFoldCopiesFromAGPR(
Register CopyDstReg = UseMI.getOperand(0).getReg();
if (!CopyDstReg.isVirtual())
continue;
+ for (MachineOperand &CopyUseMO : MRI.reg_nodbg_operands(CopyDstReg)) {
+ if (!CopyUseMO.readsReg())
+ continue;
- for (MachineInstr &CopyUseMI : MRI.use_instructions(CopyDstReg)) {
+ MachineInstr &CopyUseMI = *CopyUseMO.getParent();
if (isRewriteCandidate(CopyUseMI)) {
if (tryReassigningMFMAChain(CopyUseMI, CopyDstReg,
VRM.getPhys(CopyDstReg)))
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
index e636f6e6c6ca7..ad490f8fe54d8 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
@@ -187,8 +187,8 @@ body: |
; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:areg_128_align2 = GLOBAL_LOAD_DWORDX4 [[COPY]], 0, 0, implicit $exec :: (load (s128), addrspace 1)
- ; CHECK-NEXT: [[COPY3:%[0-9]+]]:vreg_128_align2 = COPY [[GLOBAL_LOAD_DWORDX4_]]
- ; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:vreg_128_align2 = V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, implicit $exec
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:areg_128_align2 = COPY [[GLOBAL_LOAD_DWORDX4_]]
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = V_MFMA_F64_4X4X4F64_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, implicit $exec
; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
; CHECK-NEXT: SI_RETURN
%0:vreg_64_align2 = COPY $vgpr4_vgpr5
More information about the llvm-commits
mailing list