[PATCH] D66666: [AMDGPU] Remove unnecessary movs for v_fmac operands
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 23 10:28:53 PDT 2019
arsenm added a comment.
I somewhat expected this to be handled in SIFoldOperands, since constants are already folded there and this is essentially the same problem. It will always save an instruction. This is the version RA uses, and I'm not sure I expect it to do any folding.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2630-2633
+ if (!Src0Mods && !Src1Mods && !Clamp && !Omod &&
+ (ST.getConstantBusLimit(Opc) > 1 ||
+ !Src0->isReg() ||
+ !RI.isSGPRReg(MBB->getParent()->getRegInfo(), Src0->getReg()))) {
----------------
These are exactly the same conditions as those checked above.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2636-2637
+ const MachineRegisterInfo *MRI = &MF->getRegInfo();
+ auto *Def = MRI->getUniqueVRegDef(Src2->getReg());
+ if (Def->getOpcode() == AMDGPU::COPY)
+ Src2 = &(Def->getOperand(1));
----------------
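This can fail. getUniqueVRegDef returns null when the register does not have exactly one definition, so Def has to be checked before it is dereferenced.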
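Below is a minimal, self-contained sketch of the guarded lookup. It deliberately uses hypothetical stand-in types (`Instr`, `RegInfo`, `lookThroughCopy`) rather than LLVM's real MachineInstr/MachineRegisterInfo classes. Only the pattern is meant to carry over: because a "unique def" query returns null when a register has zero or several definitions, the result must be tested before its opcode is read.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins, reduced to what is needed to show the null check.
enum class Opcode { Copy, Other };

struct Instr {
  Opcode Op;
  unsigned SrcReg; // source register, meaningful only when Op == Opcode::Copy
};

struct RegInfo {
  // Register number -> instructions that define it.
  std::map<unsigned, std::vector<const Instr *>> Defs;

  // Same contract as a "unique def" query: nullptr unless there is
  // exactly one defining instruction.
  const Instr *getUniqueDef(unsigned Reg) const {
    auto It = Defs.find(Reg);
    if (It == Defs.end() || It->second.size() != 1)
      return nullptr;
    return It->second.front();
  }
};

// Look through a COPY feeding Reg, but only when a unique def exists;
// otherwise keep the original register.
unsigned lookThroughCopy(const RegInfo &RI, unsigned Reg) {
  if (const Instr *Def = RI.getUniqueDef(Reg))
    if (Def->Op == Opcode::Copy)
      return Def->SrcReg;
  return Reg;
}
```

Without the `if (const Instr *Def = ...)` guard, register 2 or 3 in the test below would dereference a null pointer.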
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2638
+ if (Def->getOpcode() == AMDGPU::COPY)
+ Src2 = &(Def->getOperand(1));
+ }
----------------
Extra parentheses: &Def->getOperand(1) is sufficient.
================
Comment at: test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll:4-12
+define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y) {
+entry:
+ %buf.load = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> %a, i32 0, i32 0)
+ %vec1 = bitcast <4 x i32> %buf.load to <4 x float>
+ %.i095 = extractelement <4 x float> %vec1, i32 0
+ %.i098 = fsub nnan arcp float %b, %.i095
+ %fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %.i095) #3
----------------
You should be able to reduce this test; it shouldn't need any vector operations.
Repository:
rL LLVM
https://reviews.llvm.org/D66666