[PATCH] D110156: [AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 21 05:10:53 PDT 2021
foad created this revision.
foad added reviewers: arsenm, rampitec, ruiling, mareko.
Herald added subscribers: kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl.
foad requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.
Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
instruction, so we might as well convert it to the more flexible VOP3-
only mad/fma form.
With this change, we should only emit a VOP3-encoded mac/fmac when
register allocation chooses registers that require the VOP3 encoding,
e.g. SGPRs for both src0 and src1. In all other cases the mac/fmac
should either be converted to mad/fma or shrunk to the VOP2 encoding.
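
The reasoning above can be sketched as a small standalone model. This is only an illustration of the decision described in the summary, not the code in SIFoldOperands.cpp: the names ToyInst, hasOutputModifiers, convertIfModified and chooseEncoding are hypothetical, and the register rule is reduced to the single example given above (SGPRs in both src0 and src1).

```cpp
// Toy model only: these types are hypothetical and do not use LLVM's
// MachineInstr or SIInstrInfo APIs.
enum class Opcode { Mac, Mad };
enum class Encoding { VOP2, VOP3 };

struct ToyInst {
  Opcode Op;
  bool Clamp;      // output modifier: clamp
  int OMod;        // output modifier: 0 means "none"
  bool Src0IsSGPR; // assumption: only the register kind matters here
  bool Src1IsSGPR;
};

// An instruction carries output modifiers if clamp or omod is set.
bool hasOutputModifiers(const ToyInst &I) { return I.Clamp || I.OMod != 0; }

// Models the step added by the patch: once clamp/omod has been folded into a
// two-address mac, it needs VOP3 anyway, so rewrite it to three-address mad.
ToyInst convertIfModified(ToyInst I) {
  if (I.Op == Opcode::Mac && hasOutputModifiers(I))
    I.Op = Opcode::Mad;
  return I;
}

// Models the encoding choice: mad is VOP3-only; a mac stays VOP2 unless
// output modifiers or (simplified) two SGPR sources force VOP3.
Encoding chooseEncoding(const ToyInst &I) {
  if (I.Op == Opcode::Mad)
    return Encoding::VOP3;
  if (hasOutputModifiers(I))
    return Encoding::VOP3;
  if (I.Src0IsSGPR && I.Src1IsSGPR)
    return Encoding::VOP3;
  return Encoding::VOP2;
}
```

In this model, after convertIfModified has run, the only mac that chooseEncoding still maps to VOP3 is the one with two SGPR sources, which mirrors the expectation stated above.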
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D110156
Files:
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
llvm/test/CodeGen/AMDGPU/mad-mix.ll
Index: llvm/test/CodeGen/AMDGPU/mad-mix.ll
===================================================================
--- llvm/test/CodeGen/AMDGPU/mad-mix.ll
+++ llvm/test/CodeGen/AMDGPU/mad-mix.ll
@@ -328,8 +328,7 @@
; GCN-LABEL: {{^}}v_mad_mix_clamp_f32_f16hi_f16hi_f16hi_elt:
; GFX900: v_mad_mix_f32 v0, v0, v1, v2 op_sel:[1,1,1] op_sel_hi:[1,1,1] clamp ; encoding
; GFX906: v_fma_mix_f32 v0, v0, v1, v2 op_sel:[1,1,1] op_sel_hi:[1,1,1] clamp ; encoding
-; VI: v_mac_f32_e64 v{{[0-9]}}, v{{[0-9]}}, v{{[0-9]}} clamp{{$}}
-; CI: v_mad_f32 v{{[0-9]}}, v{{[0-9]}}, v{{[0-9]}}, v{{[0-9]}} clamp{{$}}
+; CIVI: v_mad_f32 v{{[0-9]}}, v{{[0-9]}}, v{{[0-9]}}, v{{[0-9]}} clamp{{$}}
define float @v_mad_mix_clamp_f32_f16hi_f16hi_f16hi_elt(<2 x half> %src0, <2 x half> %src1, <2 x half> %src2) #0 {
%src0.hi = extractelement <2 x half> %src0, i32 1
%src1.hi = extractelement <2 x half> %src1, i32 1
Index: llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
===================================================================
--- llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
+++ llvm/test/CodeGen/AMDGPU/mad-mix-lo.ll
@@ -70,7 +70,7 @@
; GFX9-NEXT: v_cvt_f16_f32_e32 v0, v0
; GFX9-NEXT: s_setpc_b64
-; CIVI: v_mac_f32_e64 v{{[0-9]}}, v{{[0-9]}}, v{{[0-9]}} clamp{{$}}
+; CIVI: v_mad_f32 v{{[0-9]}}, v{{[0-9]}}, v{{[0-9]}}, v{{[0-9]}} clamp{{$}}
define half @v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_pre_cvt(half %src0, half %src1, float %src2) #0 {
%src0.ext = fpext half %src0 to float
%src1.ext = fpext half %src1 to float
Index: llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -1388,6 +1388,14 @@
DefClamp->setImm(1);
MRI->replaceRegWith(MI.getOperand(0).getReg(), Def->getOperand(0).getReg());
MI.eraseFromParent();
+
+ // Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
+ // instruction, so we might as well convert it to the more flexible VOP3-only
+ // mad/fma form.
+ MachineFunction::iterator MBBI = Def->getParent()->getIterator();
+ if (MachineInstr *NewMI = TII->convertToThreeAddress(MBBI, *Def, nullptr))
+ Def->eraseFromParent();
+
return true;
}
@@ -1526,6 +1534,14 @@
DefOMod->setImm(OMod);
MRI->replaceRegWith(MI.getOperand(0).getReg(), Def->getOperand(0).getReg());
MI.eraseFromParent();
+
+ // Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
+ // instruction, so we might as well convert it to the more flexible VOP3-only
+ // mad/fma form.
+ MachineFunction::iterator MBBI = Def->getParent()->getIterator();
+ if (MachineInstr *NewMI = TII->convertToThreeAddress(MBBI, *Def, nullptr))
+ Def->eraseFromParent();
+
return true;
}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D110156.373866.patch
Type: text/x-patch
Size: 2770 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210921/c8aebeea/attachment.bin>