[PATCH] D69280: [AMDGPU] Allow folding of sgpr to vgpr copy
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 21 14:41:45 PDT 2019
rampitec marked an inline comment as done.
rampitec added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll:79-80
; SIVI: v_mad_f32 {{v[0-9]+}}, |[[X]]|, 2.0, v{{[0-9]+}}
-; GFX10: v_fma_f32 {{v[0-9]+}}, |[[X:s[0-9]+]]|, 2.0, {{s[0-9]+}}
-; GFX10: v_fma_f32 {{v[0-9]+}}, |[[X]]|, 2.0, {{s[0-9]+}}
+; GFX10: v_fma_f32 {{v[0-9]+}}, 2.0, |[[X:s[0-9]+]]|, {{v[0-9]+}}
+; GFX10: v_fma_f32 {{v[0-9]+}}, 2.0, |[[X]]|, {{v[0-9]+}}
define amdgpu_kernel void @multiple_use_fadd_multi_fmad_f32(float addrspace(1)* %out, float %x, float %y, float %z) #0 {
----------------
arsenm wrote:
> This looks like it got worse?
Yes, this is a regression specific to fma/mac. After the folding, the reg class does not match the xm0/xexec operand definition of the fma src.
The regression is small, however, and some copies are eliminated in other cases.
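
To make the check-line change above easier to read: the updated GFX10 lines expect `2.0` first, then `|X|`, and a `v` register as the last operand where the old lines expected an `s` register. Below is a rough Python sketch, not part of the patch, that mimics a small subset of FileCheck matching (`{{regex}}`, `[[NAME:regex]]`, `[[NAME]]`). The helper names `to_regex` and `check` and the sample instruction line are made up for illustration; real FileCheck does considerably more (whitespace canonicalization, ordering of check lines, and so on).

```python
import re

# Matches {{regex}}, [[NAME:regex]] (definition) or [[NAME]] (use).
TOKEN = re.compile(r"\{\{(.*?)\}\}|\[\[(\w+)(?::(.*?))?\]\]")

def to_regex(pattern, variables):
    """Translate a simplified FileCheck pattern into a Python regex string."""
    out, pos = [], 0
    for m in TOKEN.finditer(pattern):
        out.append(re.escape(pattern[pos:m.start()]))  # literal text
        if m.group(1) is not None:                     # {{regex}}
            out.append("(?:" + m.group(1) + ")")
        elif m.group(3) is not None:                   # [[NAME:regex]]
            out.append("(?P<" + m.group(2) + ">" + m.group(3) + ")")
        else:                                          # [[NAME]]
            out.append(re.escape(variables[m.group(2)]))
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return "".join(out)

def check(pattern, line, variables):
    """Return True if the pattern matches somewhere in line; record new variables."""
    m = re.search(to_regex(pattern, variables), line)
    if m:
        variables.update(m.groupdict())
    return m is not None

# Check lines copied from the diff hunk above.
OLD_CHECK = "v_fma_f32 {{v[0-9]+}}, |[[X:s[0-9]+]]|, 2.0, {{s[0-9]+}}"
NEW_CHECK = "v_fma_f32 {{v[0-9]+}}, 2.0, |[[X:s[0-9]+]]|, {{v[0-9]+}}"
NEW_REUSE = "v_fma_f32 {{v[0-9]+}}, 2.0, |[[X]]|, {{v[0-9]+}}"

# Hypothetical instruction text, invented for this sketch only.
SAMPLE = "v_fma_f32 v1, 2.0, |s2|, v0"

bound = {}
matches_new = check(NEW_CHECK, SAMPLE, bound)   # binds X to "s2"
matches_old = check(OLD_CHECK, SAMPLE, {})      # old operand order does not match
print(matches_new, matches_old, bound)
```

With the made-up sample line, the new pattern matches and binds `X`, while the old pattern does not match because it expects `|X|` before `2.0` and an `s` register as the last operand.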
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69280/new/
https://reviews.llvm.org/D69280