[PATCH] D60150: [DAGCombiner][x86] scalarize splatted vector FP ops

Thu Apr 4 08:31:27 PDT 2019

spatel marked 2 inline comments as done.
spatel added inline comments.

================
Comment at: llvm/test/CodeGen/X86/scalarize-fp.ll:440
 ; SSE:       # %bb.0:
-; SSE-NEXT:    rcpps %xmm2, %xmm3
-; SSE-NEXT:    mulps %xmm3, %xmm2
-; SSE-NEXT:    movaps {{.*#+}} xmm1 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; SSE-NEXT:    subps %xmm2, %xmm1
-; SSE-NEXT:    mulps %xmm3, %xmm1
-; SSE-NEXT:    addps %xmm3, %xmm1
-; SSE-NEXT:    mulps %xmm0, %xmm1
-; SSE-NEXT:    shufps {{.*#+}} xmm1 = xmm1[0,0,0,0]
-; SSE-NEXT:    movaps %xmm1, %xmm0
+; SSE-NEXT:    divss %xmm2, %xmm0
+; SSE-NEXT:    shufps {{.*#+}} xmm0 = xmm0[0,0,0,0]
----------------
Non-obvious reason for this diff: we use reciprocal estimates for vector FP division by default, but not for scalar FP division. That is because the x86 estimate instruction is imprecise and breaks too much real-world scalar code.
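
For readers unfamiliar with the removed sequence, here is a rough sketch of what it computes: a reciprocal estimate, one Newton-Raphson refinement step, then a multiply by the dividend. The names `crude_reciprocal` and `div_via_estimate` are hypothetical, and the 12-bit truncation is only an assumed stand-in for the hardware estimate, not a model of what `rcpps` actually returns.

```python
import math


def crude_reciprocal(d: float, bits: int = 12) -> float:
    """Hypothetical stand-in for a hardware estimate: 1/d truncated to `bits` bits."""
    m, e = math.frexp(1.0 / d)  # 1/d == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)


def div_via_estimate(x: float, d: float) -> float:
    """Approximate x / d the way the removed SSE sequence does."""
    est = crude_reciprocal(d)   # rcpps
    t = d * est                 # mulps: d * est
    err = 1.0 - t               # movaps 1.0 + subps: 1 - d * est
    refined = est + est * err   # mulps + addps: one Newton-Raphson step
    return x * refined          # mulps: x * (1 / d)
```

In this sketch the refinement step roughly doubles the number of correct bits, but the result is still not guaranteed to match a true division bit for bit.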

================
Comment at: llvm/test/CodeGen/X86/scalarize-fp.ll:797
 ; AVX-NEXT:    retq
   %b = fdiv fast <8 x float> %vx, <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>
   %r = shufflevector <8 x float> %b, <8 x float> undef, <8 x i32> zeroinitializer
----------------
Intentionally chose an fdiv whose lane-0 divisor is 1.0 (the only lane the splat reads) to show the follow-on simplification.
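
As a sketch of why this test is interesting, the model below treats vectors as Python lists. The function names are hypothetical, and it assumes the follow-on simplification meant here is that a division by 1.0 folds away once the op has been scalarized.

```python
def splat_of_vector_fdiv(vx, divisors):
    """Original pattern: divide every lane, then splat lane 0 of the result."""
    b = [x / d for x, d in zip(vx, divisors)]
    return [b[0]] * len(b)


def scalarized(vx, divisors):
    """Scalarized pattern: divide lane 0 only, then splat that scalar."""
    s = vx[0] / divisors[0]
    return [s] * len(vx)
```

With the divisors from the test (1.0 through 8.0), lane 0 is divided by 1.0, so both forms reduce to a plain splat of `vx[0]`.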

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60150/new/

https://reviews.llvm.org/D60150