[PATCH] D66666: [AMDGPU] Remove unnecessary movs for v_fmac operands

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 23 10:28:53 PDT 2019


arsenm added a comment.

I somewhat expected this to be handled in SIFoldOperands, since constants are already folded there and this is essentially the same problem. It will always save an instruction. This is the version RA uses, and I'm not sure I expect it to do any folding.



================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2630-2633
+  if (!Src0Mods && !Src1Mods && !Clamp && !Omod &&
+      (ST.getConstantBusLimit(Opc) > 1 ||
+       !Src0->isReg() ||
+       !RI.isSGPRReg(MBB->getParent()->getRegInfo(), Src0->getReg()))) {
----------------
These are exactly the same conditions as checked above


================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2636-2637
+    const MachineRegisterInfo *MRI = &MF->getRegInfo();
+    auto *Def = MRI->getUniqueVRegDef(Src2->getReg());
+    if (Def->getOpcode() == AMDGPU::COPY)
+      Src2 = &(Def->getOperand(1));
----------------
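This can fail: getUniqueVRegDef returns null unless the register has exactly one definition, and Def is dereferenced here without a check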
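
One possible shape for a guarded lookup is sketched below. It is a standalone illustration, not the patch itself: Operand, Instr, RegInfo, getUniqueDef and lookThroughCopy are hypothetical stand-ins for MachineOperand, MachineInstr, MachineRegisterInfo and MachineRegisterInfo::getUniqueVRegDef. The sketch assumes only that the lookup yields null when the register does not have exactly one definition, and that asking a non-register operand for its register is invalid.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the LLVM types; not the real API.
enum class Opcode { Copy, Other };

struct Operand {
  bool IsReg;   // true if the operand is a register
  unsigned Reg; // register number, meaningful only when IsReg is true
};

struct Instr {
  Opcode Opc;
  std::vector<Operand> Ops; // Ops[0] is the def, Ops[1] the copy source
};

struct RegInfo {
  // Maps a register to every instruction that defines it.
  std::unordered_map<unsigned, std::vector<const Instr *>> Defs;

  // Assumed contract: null unless the register has exactly one definition.
  const Instr *getUniqueDef(unsigned Reg) const {
    auto It = Defs.find(Reg);
    if (It == Defs.end() || It->second.size() != 1)
      return nullptr;
    return It->second.front();
  }
};

// Returns the source of the copy defining Src, or Src itself when the
// look-through is not possible (non-register operand, no unique def,
// or a def that is not a copy).
const Operand *lookThroughCopy(const RegInfo &RI, const Operand *Src) {
  assert(Src && "expected a valid operand");
  if (!Src->IsReg)
    return Src; // nothing to look up for a non-register operand
  const Instr *Def = RI.getUniqueDef(Src->Reg);
  if (Def && Def->Opc == Opcode::Copy && Def->Ops.size() > 1)
    return &Def->Ops[1];
  return Src; // no unique def, or not a copy: keep the original operand
}
```

In the patch, the same idea would amount to checking Src2->isReg() before calling getReg(), and checking Def for null before calling getOpcode().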


================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2638
+    if (Def->getOpcode() == AMDGPU::COPY)
+      Src2 = &(Def->getOperand(1));
+  }
----------------
Extra parentheses


================
Comment at: test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll:4-12
+define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y) {
+entry:
+  %buf.load = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> %a, i32 0, i32 0)
+  %vec1 = bitcast <4 x i32> %buf.load to <4 x float>
+  %.i095 = extractelement <4 x float> %vec1, i32 0
+  %.i098 = fsub nnan arcp float %b, %.i095
+  %fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %.i095) #3
----------------
You should be able to reduce this. You shouldn't need any vector operations


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66666/new/

https://reviews.llvm.org/D66666




