[PATCH] D13988: [X86][SSE] Add general memory folding for (V)INSERTPS instruction

Robert Lougher via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 4 06:48:11 PST 2015

rob.lougher added a comment.

Hi Simon,

We discovered a bug internally, caused by the countS bits not being zeroed when the load is folded into insertps. Although insertps ignores the countS bits when its source operand is in memory, we need to explicitly set them to 0, because another optimization may later "unfold" the load, and the register form of insertps would then use the stale countS bits to select the source element. This is demonstrated by the following testcase (the checks are based on the RUN lines from the sse41.ll file).

define <4 x float> @foo(<4 x float>* %v0, <4 x float>* %v1) {
; X32-LABEL: foo:
; X32:       ## BB#0:
; X32-NEXT:    movl {{[0-9]+}}(%esp), %eax
; X32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
; X32-NEXT:    movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; X32-NEXT:    movaps (%eax), %xmm0
; X32-NEXT:    insertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]
; X32-NEXT:    addps %xmm1, %xmm0
; X32-NEXT:    retl
; X64-LABEL: foo:
; X64:       ## BB#0:
; X64-NEXT:    movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; X64-NEXT:    movaps (%rdi), %xmm0
; X64-NEXT:    insertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]
; X64-NEXT:    addps %xmm1, %xmm0
; X64-NEXT:    retq

  %a = getelementptr inbounds <4 x float>, <4 x float>* %v1, i64 0, i64 1
  %b = load float, float* %a, align 4
  %c = insertelement <4 x float> undef, float %b, i32 0
  %d = load <4 x float>, <4 x float>* %v1, align 16
  %e = load <4 x float>, <4 x float>* %v0, align 16
  %f = shufflevector <4 x float> %e, <4 x float> %d, <4 x i32> <i32 0, i32 1, i32 2, i32 5>
  %g = fadd <4 x float> %c, %f
  ret <4 x float> %g
}
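
To make the immediate handling concrete, here is a minimal sketch of the adjustment described above. It assumes only the documented INSERTPS imm8 layout (bits 7:6 = countS, bits 5:4 = countD, bits 3:0 = zmask). The type and function names are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// INSERTPS imm8 layout (documented instruction encoding):
//   bits 7:6  countS  source element (only used by the register form)
//   bits 5:4  countD  destination element
//   bits 3:0  zmask   destination elements to zero
struct FoldedInsertPS {   // hypothetical type, for illustration only
  uint8_t Imm;            // immediate to use with the memory form
  unsigned ByteOffset;    // displacement to add to the load address
};

// Hypothetical helper: compute the immediate and address adjustment when a
// 128-bit vector load is folded into INSERTPS as a 32-bit memory operand.
inline FoldedInsertPS foldInsertPSLoad(uint8_t Imm) {
  unsigned CountS = (Imm >> 6) & 0x3;
  FoldedInsertPS R;
  // Clear countS but keep countD and zmask, so that a later unfold to
  // "scalar load + register insertps" reads element 0 of the loaded register.
  R.Imm = static_cast<uint8_t>(Imm & 0x3F);
  // The selected source element moves into the address instead
  // (each float element is 4 bytes).
  R.ByteOffset = CountS * 4;
  return R;
}
```

For the shuffle in the testcase (source element 1 into destination element 3), the register-form immediate would be 0x70. Under this sketch the folded form uses immediate 0x30 with the address advanced by 4 bytes, which matches the xmm1[0] operand expected in the checks.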


Another minor comment: your change enables general memory load folding in addition to stack folding, but the tests only cover stack folding.


