[PATCH] D66666: [AMDGPU] Remove unnecessary movs for v_fmac operands

Fri Aug 23 13:31:16 PDT 2019

rtaylor marked 2 inline comments as done.
rtaylor added inline comments.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2630-2633
+  if (!Src0Mods && !Src1Mods && !Clamp && !Omod &&
+      (ST.getConstantBusLimit(Opc) > 1 ||
+       !Src0->isReg() ||
+       !RI.isSGPRReg(MBB->getParent()->getRegInfo(), Src0->getReg()))) {
----------------
arsenm wrote:
> rtaylor wrote:
> > arsenm wrote:
> > > These are the exact conditions as checked above
> > Yes, they are. I could create a local function that does this and use it in both places, but it would be just as ugly since there are so many conditions (params) to pass. Alternatively, I could pass MI and re-get all those operands, which would be the exact code that is already in the function.
> Why can't you just set a bool flag inside the first condition?
> 
> Anyway, I would prefer if this weren't done here at all, and SIFoldOperands took care of it
Sure, no problem, if you think that looks better.

This test case doesn't seem to be handled by SIFoldOperands (it is a V_FMAC both before and after SIFoldOperands), so making this change there wouldn't do anything.
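
For reference, the bool-flag suggestion quoted above could look roughly like the sketch below. It is only an illustration: `FmacInfo`, `Steps`, and `process` are hypothetical stand-ins rather than LLVM API, and the plain fields replace the real queries (`ST.getConstantBusLimit(Opc)`, `Src0->isReg()`, `RI.isSGPRReg(...)`). The shared predicate is evaluated once, recorded in a flag, and the later site tests the flag instead of repeating the condition.

```cpp
// Hypothetical stand-in for the values the real code reads from MI and ST.
struct FmacInfo {
  bool Src0Mods;
  bool Src1Mods;
  bool Clamp;
  bool Omod;
  unsigned ConstantBusLimit; // stands in for ST.getConstantBusLimit(Opc)
  bool Src0IsReg;            // stands in for Src0->isReg()
  bool Src0IsSGPR;           // stands in for RI.isSGPRReg(...)
};

// Records which of the two guarded code paths ran (illustration only).
struct Steps {
  bool First;
  bool Second;
};

Steps process(const FmacInfo &I) {
  Steps S{false, false};
  bool OperandsOk = false; // set inside the first condition, reused later

  if (!I.Src0Mods && !I.Src1Mods && !I.Clamp && !I.Omod &&
      (I.ConstantBusLimit > 1 || !I.Src0IsReg || !I.Src0IsSGPR)) {
    OperandsOk = true;
    S.First = true; // first use of the condition
  }

  // ... unrelated code between the two sites would go here ...

  if (OperandsOk)
    S.Second = true; // second site reuses the flag, no repeated condition

  return S;
}
```

With this shape, an SGPR Src0 on a subtarget with a constant bus limit of 1 skips both paths, while raising the limit to 2 or using a non-register Src0 takes both.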

================
Comment at: test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll:4-12
+define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y) {
+entry:
+  %buf.load = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> %a, i32 0, i32 0)
+  %vec1 = bitcast <4 x i32> %buf.load to <4 x float>
+  %.i095 = extractelement <4 x float> %vec1, i32 0
+  %.i098 = fsub nnan arcp float %b, %.i095
+  %fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %.i095) #3
----------------
rtaylor wrote:
> arsenm wrote:
> > You should be able to reduce this. You shouldn't need any vector operations
> I tried a few different things but I probably just couldn't find the simple form to produce a convertible V_FMAC. Suggestions?
Just to be more explicit, this test case was stripped directly from a shader. If I do something like:

define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y, float %c) {
entry:
  %.i098 = fsub nnan arcp float %b, %c
  %fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %b) #3
  ret float %fma1
}

it results in no v_mov, whether or not b is inreg. If it's not inreg then it's a VGPR directly, and if it's inreg then it's already an SGPR. Is there a simpler way to take in a vector and convert it to a scalar?

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66666/new/

https://reviews.llvm.org/D66666