[PATCH] D66666: [AMDGPU] Remove unnecessary movs for v_fmac operands

Fri Aug 23 10:28:53 PDT 2019

arsenm added a comment.

I somewhat expected this to be handled in SIFoldOperands, since constants are already folded there and this is essentially the same problem. Folding the copy will always save an instruction. This is the version RA uses, and I'm not sure I expect it to do any folding.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2630-2633
+  if (!Src0Mods && !Src1Mods && !Clamp && !Omod &&
+      (ST.getConstantBusLimit(Opc) > 1 ||
+       !Src0->isReg() ||
+       !RI.isSGPRReg(MBB->getParent()->getRegInfo(), Src0->getReg()))) {
----------------
These are exactly the same conditions already checked above.
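One way to avoid repeating the test is to evaluate it once and branch on the result in both places. The sketch below uses hypothetical, simplified structs (Operand, FmaInfo) and a hypothetical helper canUseFmac in place of the real SIInstrInfo state. It is not the patch's code, only an illustration of hoisting the shared predicate.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the source operand being inspected.
struct Operand {
  bool IsReg;
  bool IsSGPR; // only meaningful when IsReg is true
};

// Hypothetical bundle of the state the quoted condition reads.
struct FmaInfo {
  bool Src0Mods, Src1Mods, Clamp, Omod;
  unsigned ConstantBusLimit;
  Operand Src0;
};

// Evaluate the eligibility test once; the original check and the new
// copy-folding code can then both use the same result instead of
// spelling the condition out twice.
bool canUseFmac(const FmaInfo &I) {
  return !I.Src0Mods && !I.Src1Mods && !I.Clamp && !I.Omod &&
         (I.ConstantBusLimit > 1 || !I.Src0.IsReg || !I.Src0.IsSGPR);
}
```

Keeping the predicate in one place means a later change to the constant bus rules cannot update one copy of the condition and miss the other.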

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2636-2637
+    const MachineRegisterInfo *MRI = &MF->getRegInfo();
+    auto *Def = MRI->getUniqueVRegDef(Src2->getReg());
+    if (Def->getOpcode() == AMDGPU::COPY)
+      Src2 = &(Def->getOperand(1));
----------------
This can fail.
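In LLVM, MachineRegisterInfo::getUniqueVRegDef returns null when a virtual register has no definition or more than one, and MachineOperand::getReg asserts if the operand is not a register (src2 may be an immediate). A guarded look-through therefore needs both checks before dereferencing Def. The sketch below models this with hypothetical stand-in types (MOperand, MInstr, RegInfo) and a hypothetical helper lookThroughCopy; it is not LLVM's API, just the shape of the guard.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical model: an operand is either a register or an immediate.
struct MOperand {
  bool IsReg;
  unsigned Reg; // valid only when IsReg
  long Imm;     // valid only when !IsReg
};

// Hypothetical instruction: Ops[0] is the def, Ops[1] the source of a COPY.
struct MInstr {
  bool IsCopy;
  std::vector<MOperand> Ops;
  const MOperand &getOperand(unsigned I) const { return Ops[I]; }
};

// Hypothetical analogue of getUniqueVRegDef: returns nullptr when the
// register has zero definitions or several.
struct RegInfo {
  std::multimap<unsigned, const MInstr *> Defs;
  const MInstr *getUniqueVRegDef(unsigned Reg) const {
    auto Range = Defs.equal_range(Reg);
    if (Range.first == Range.second || std::next(Range.first) != Range.second)
      return nullptr;
    return Range.first->second;
  }
};

// Look through a COPY only when it is safe: Src2 must be a register with
// exactly one definition, and that definition must be a COPY.
const MOperand *lookThroughCopy(const RegInfo &MRI, const MOperand *Src2) {
  if (!Src2->IsReg)
    return Src2;
  const MInstr *Def = MRI.getUniqueVRegDef(Src2->Reg);
  if (!Def || !Def->IsCopy)
    return Src2;
  return &Def->getOperand(1); // '->' binds tighter than '&'; no parens needed
}
```

Falling back to the original operand on every failed check keeps the transformation conservative: if the def cannot be identified, the copy is simply left in place.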

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2638
+    if (Def->getOpcode() == AMDGPU::COPY)
+      Src2 = &(Def->getOperand(1));
+  }
----------------
Extra parentheses; &Def->getOperand(1) is sufficient.

================
Comment at: test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll:4-12
+define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y) {
+entry:
+  %buf.load = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> %a, i32 0, i32 0)
+  %vec1 = bitcast <4 x i32> %buf.load to <4 x float>
+  %.i095 = extractelement <4 x float> %vec1, i32 0
+  %.i098 = fsub nnan arcp float %b, %.i095
+  %fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %.i095) #3
----------------
You should be able to reduce this. You shouldn't need any vector operations.

Repository:
  rL LLVM

https://reviews.llvm.org/D66666