[PATCH] D69280: [AMDGPU] Allow folding of sgpr to vgpr copy

Mon Oct 21 14:51:02 PDT 2019

rampitec marked an inline comment as done.
rampitec added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll:79-80
 ; SIVI:  v_mad_f32 {{v[0-9]+}}, |[[X]]|, 2.0, v{{[0-9]+}}
-; GFX10: v_fma_f32 {{v[0-9]+}}, |[[X:s[0-9]+]]|, 2.0, {{s[0-9]+}}
-; GFX10: v_fma_f32 {{v[0-9]+}}, |[[X]]|, 2.0, {{s[0-9]+}}
+; GFX10: v_fma_f32 {{v[0-9]+}}, 2.0, |[[X:s[0-9]+]]|, {{v[0-9]+}}
+; GFX10: v_fma_f32 {{v[0-9]+}}, 2.0, |[[X]]|, {{v[0-9]+}}
 define amdgpu_kernel void @multiple_use_fadd_multi_fmad_f32(float addrspace(1)* %out, float %x, float %y, float %z) #0 {
----------------
rampitec wrote:
> arsenm wrote:
> > This looks like it got worse?
> Yes, this is a regression specific to fma/mac. After the folding, the reg class does not match the xm0/xexec operand definition of the fma src.
> The regression is small, however, and some copies are eliminated in other cases.
I.e., we should refine how we use SGPR register classes instead of inhibiting folding.
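
To illustrate the mismatch described above, here is a toy sketch that models register classes as plain sets. It is not the code in this patch or in the AMDGPU backend: the register names, the class contents, and the helpers `can_fold` and `refine` are all hypothetical and chosen only for illustration. The assumption is that an operand accepts a virtual register only when every register in the virtual register's class is legal for that operand.

```python
# Toy model only: register classes as sets of register names.
# The contents below are illustrative, not the real AMDGPU definitions.
SREG_32 = frozenset({"s0", "s1", "m0", "exec_lo"})
# Hypothetical "xm0/xexec" flavour: the same class without m0 and exec.
SREG_32_XM0_XEXEC = SREG_32 - {"m0", "exec_lo"}


def can_fold(vreg_class, operand_class):
    """Return True if every register in vreg_class is legal for the operand."""
    return vreg_class <= operand_class


def refine(vreg_class, operand_class):
    """Narrow vreg_class to the registers the operand accepts.

    Returns None when nothing is left, i.e. the fold is impossible.
    """
    narrowed = vreg_class & operand_class
    return narrowed if narrowed else None


# The folded SGPR carries the broad class, so the operand rejects it.
print(can_fold(SREG_32, SREG_32_XM0_XEXEC))

# Narrowing the class first makes the same fold acceptable.
narrowed = refine(SREG_32, SREG_32_XM0_XEXEC)
print(narrowed is not None and can_fold(narrowed, SREG_32_XM0_XEXEC))
```

In this model the first check fails and the second succeeds, which is the sense in which refining the register class could recover the fold rather than inhibiting it.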

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69280/new/

https://reviews.llvm.org/D69280