[PATCH] D66666: [AMDGPU] Remove unnecessary movs for v_fmac operands

Fri Aug 23 11:55:54 PDT 2019

rtaylor marked 2 inline comments as done.
rtaylor added inline comments.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2630-2633
+  if (!Src0Mods && !Src1Mods && !Clamp && !Omod &&
+      (ST.getConstantBusLimit(Opc) > 1 ||
+       !Src0->isReg() ||
+       !RI.isSGPRReg(MBB->getParent()->getRegInfo(), Src0->getReg()))) {
----------------
arsenm wrote:
> These are the exact conditions as checked above
Yes, it is. I could create a local function that does this and replace both with that, it would be just as ugly since there are so many conditions (params) to pass, or I could pass MI and re-get all those operands, which would be the exact code that is already in the function.

================
Comment at: test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll:4-12
+define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y) {
+entry:
+  %buf.load = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> %a, i32 0, i32 0)
+  %vec1 = bitcast <4 x i32> %buf.load to <4 x float>
+  %.i095 = extractelement <4 x float> %vec1, i32 0
+  %.i098 = fsub nnan arcp float %b, %.i095
+  %fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %.i095) #3
----------------
arsenm wrote:
> You should be able to reduce this. You shouldn't need any vector operations
I tried a few different things but I probably just couldn't find the simple form to produce a convertable V_FMAC. Suggestions?

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66666/new/

https://reviews.llvm.org/D66666