[PATCH] D26790: [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times

Thu Nov 17 09:45:27 PST 2016

spatel added a comment.

The vector version of this already works as expected (the load is emitted once and is not folded into either user):

  define <4 x float> @double_fold(<4 x float>* %x, <4 x float> %y) {
    %t0 = load <4 x float>, <4 x float>* %x, align 1
    %t1 = tail call <4 x float> @llvm.x86.sse.min.ps(<4 x float> %y, <4 x float> %t0)
    %t2 = tail call <4 x float> @llvm.x86.sse.max.ps(<4 x float> %y, <4 x float> %t0)
    %t3 = fadd <4 x float> %t1, %t2
    ret <4 x float> %t3
  }

$ ./llc -o - foldfold.ll -mattr=avx
...

  vmovups	(%rdi), %xmm1
  vminps	%xmm1, %xmm0, %xmm2
  vmaxps	%xmm1, %xmm0, %xmm0
  vaddps	%xmm0, %xmm2, %xmm0
  retq

The scalar and vector cases diverge when we map the vector intrinsics to x86-specific nodes:

  X86_INTRINSIC_DATA(sse_max_ps,        INTR_TYPE_2OP, X86ISD::FMAX, 0),
  X86_INTRINSIC_DATA(sse_min_ps,        INTR_TYPE_2OP, X86ISD::FMIN, 0),

...but there's no equivalent mapping for the scalar intrinsics. Would adding one be a better fix, or an additional one? I haven't actually tried it, so this assumes it works.
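
For comparison, a scalar analogue of the vector test above might look like the sketch below. The function name is made up, and it is an assumption that this load + insertelement form is the pattern selectScalarSSELoad picks up; the intent is only to show one scalar load feeding two scalar intrinsics, which is the double-fold situation the patch title describes.

```llvm
declare <4 x float> @llvm.x86.sse.min.ss(<4 x float>, <4 x float>)
declare <4 x float> @llvm.x86.sse.max.ss(<4 x float>, <4 x float>)

; Hypothetical scalar counterpart of @double_fold (name is illustrative only).
define <4 x float> @double_fold_scalar(float* %x, <4 x float> %y) {
  ; A single scalar load, placed in lane 0 of an otherwise undef vector.
  %t0 = load float, float* %x, align 1
  %t1 = insertelement <4 x float> undef, float %t0, i32 0
  ; The same loaded value is used by both scalar intrinsics.
  %t2 = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %y, <4 x float> %t1)
  %t3 = tail call <4 x float> @llvm.x86.sse.max.ss(<4 x float> %y, <4 x float> %t1)
  %t4 = fadd <4 x float> %t2, %t3
  ret <4 x float> %t4
}
```

Running it through the same llc invocation as above should show whether the load is emitted once or folded into both users.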

https://reviews.llvm.org/D26790