[PATCH] D69280: [AMDGPU] Allow folding of sgpr to vgpr copy
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 21 14:51:02 PDT 2019
rampitec marked an inline comment as done.
rampitec added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll:79-80
; SIVI: v_mad_f32 {{v[0-9]+}}, |[[X]]|, 2.0, v{{[0-9]+}}
-; GFX10: v_fma_f32 {{v[0-9]+}}, |[[X:s[0-9]+]]|, 2.0, {{s[0-9]+}}
-; GFX10: v_fma_f32 {{v[0-9]+}}, |[[X]]|, 2.0, {{s[0-9]+}}
+; GFX10: v_fma_f32 {{v[0-9]+}}, 2.0, |[[X:s[0-9]+]]|, {{v[0-9]+}}
+; GFX10: v_fma_f32 {{v[0-9]+}}, 2.0, |[[X]]|, {{v[0-9]+}}
define amdgpu_kernel void @multiple_use_fadd_multi_fmad_f32(float addrspace(1)* %out, float %x, float %y, float %z) #0 {
----------------
rampitec wrote:
> arsenm wrote:
> > This looks like it got worse?
> Yes, this is a regression specific to fma/mac. After the folding, the register class no longer matches the xm0/xexec operand definition of the fma src.
> The regression is small, however, and some copies are eliminated in other cases.
I.e. we should refine how we use sgpr register classes instead of inhibiting folding.
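
To illustrate the kind of mismatch described above, here is a minimal toy sketch. It is not LLVM code and not what this patch does. It assumes a register class can be modelled as a plain set of register names; the helpers sreg32() and sreg32XM0XExec() are hypothetical stand-ins named after the general SGPR class and the narrower xm0/xexec class. It contrasts a strict check, which inhibits the fold when the source class is wider than the operand allows, with one possible refinement that narrows the class to the registers both sides accept.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Toy model only: a "register class" is a set of register names.
using RegClass = std::set<std::string>;

// Hypothetical general 32-bit SGPR class, including m0 and exec_lo.
RegClass sreg32() { return {"s0", "s1", "m0", "exec_lo"}; }

// Hypothetical narrower class that excludes m0 and exec (xm0/xexec).
RegClass sreg32XM0XExec() { return {"s0", "s1"}; }

// True if every register in `sub` is also a member of `super`.
bool isSubclassOf(const RegClass &sub, const RegClass &super) {
  return std::includes(super.begin(), super.end(), sub.begin(), sub.end());
}

// Strict policy: fold only if the source class already satisfies the
// operand's class constraint. A wider source class inhibits the fold.
bool canFoldStrict(const RegClass &srcClass, const RegClass &operandClass) {
  return isSubclassOf(srcClass, operandClass);
}

// Refined policy (an assumption, not the patch): narrow the source class
// to the registers that both the source and the operand accept.
RegClass constrainTo(const RegClass &srcClass, const RegClass &operandClass) {
  assert(!srcClass.empty() && "source class must contain a register");
  RegClass common;
  std::set_intersection(srcClass.begin(), srcClass.end(),
                        operandClass.begin(), operandClass.end(),
                        std::inserter(common, common.begin()));
  return common;
}
```

In this model, canFoldStrict(sreg32(), sreg32XM0XExec()) is false, so the strict policy gives up on the fold, while constrainTo(sreg32(), sreg32XM0XExec()) yields a non-empty class, so the fold could go ahead with the narrowed class.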
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69280/new/
https://reviews.llvm.org/D69280