[PATCH] D26790: [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 17 09:45:27 PST 2016
spatel added a comment.
The vector version of this already works as expected:
define <4 x float> @double_fold(<4 x float>* %x, <4 x float> %y) {
  %t0 = load <4 x float>, <4 x float>* %x, align 1
  %t1 = tail call <4 x float> @llvm.x86.sse.min.ps(<4 x float> %y, <4 x float> %t0)
  %t2 = tail call <4 x float> @llvm.x86.sse.max.ps(<4 x float> %y, <4 x float> %t0)
  %t3 = fadd <4 x float> %t1, %t2
  ret <4 x float> %t3
}
$ ./llc -o - foldfold.ll -mattr=avx
...
vmovups (%rdi), %xmm1
vminps %xmm1, %xmm0, %xmm2
vmaxps %xmm1, %xmm0, %xmm0
vaddps %xmm0, %xmm2, %xmm0
retq
The divergence begins when we map the vector intrinsics to x86-specific nodes:
X86_INTRINSIC_DATA(sse_max_ps, INTR_TYPE_2OP, X86ISD::FMAX, 0),
X86_INTRINSIC_DATA(sse_min_ps, INTR_TYPE_2OP, X86ISD::FMIN, 0),
...but there's no equivalent mapping for the scalar intrinsics. Would adding one be a better (or an additional) fix? I haven't actually tried it, so this assumes it works.
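For comparison, a scalar analogue of the vector test might look like the following. This is an untested sketch and not taken from the patch: the function name @double_fold_scalar is made up, and it assumes the scalar load reaches the intrinsics through an insertelement into lane 0, which should be the kind of pattern selectScalarSSELoad tries to match. The thing to check in the llc output would be whether the single load gets folded into both the min and the max instruction.

```llvm
; Hypothetical scalar counterpart of @double_fold (illustration only).
; Uses the same typed-pointer IR syntax as the vector example.
declare <4 x float> @llvm.x86.sse.min.ss(<4 x float>, <4 x float>)
declare <4 x float> @llvm.x86.sse.max.ss(<4 x float>, <4 x float>)

define <4 x float> @double_fold_scalar(float* %x, <4 x float> %y) {
  ; One scalar load, placed in lane 0 of an otherwise undef vector.
  %t0 = load float, float* %x, align 1
  %t1 = insertelement <4 x float> undef, float %t0, i32 0
  ; Two users of the same loaded value: both are candidates for load folding.
  %t2 = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %y, <4 x float> %t1)
  %t3 = tail call <4 x float> @llvm.x86.sse.max.ss(<4 x float> %y, <4 x float> %t1)
  %t4 = fadd <4 x float> %t2, %t3
  ret <4 x float> %t4
}
```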
https://reviews.llvm.org/D26790