[PATCH] D60150: [DAGCombiner][x86] scalarize splatted vector FP ops
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 4 08:31:27 PDT 2019
spatel marked 2 inline comments as done.
spatel added inline comments.
================
Comment at: llvm/test/CodeGen/X86/scalarize-fp.ll:440
; SSE: # %bb.0:
-; SSE-NEXT: rcpps %xmm2, %xmm3
-; SSE-NEXT: mulps %xmm3, %xmm2
-; SSE-NEXT: movaps {{.*#+}} xmm1 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; SSE-NEXT: subps %xmm2, %xmm1
-; SSE-NEXT: mulps %xmm3, %xmm1
-; SSE-NEXT: addps %xmm3, %xmm1
-; SSE-NEXT: mulps %xmm0, %xmm1
-; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0,0,0]
-; SSE-NEXT: movaps %xmm1, %xmm0
+; SSE-NEXT: divss %xmm2, %xmm0
+; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,0,0,0]
----------------
Non-obvious reason for this diff: by default, we use reciprocal estimates for vector FP division but not for scalar FP division. This is because the x86 estimate instruction is weak (low precision), and using it breaks too much real-world scalar code.
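For readers unfamiliar with the removed sequence, here is a minimal C++ sketch of its arithmetic: a coarse reciprocal estimate followed by one Newton-Raphson step, then a multiply by the dividend. The helper names (`coarseReciprocal`, `refinedReciprocal`, `relativeError`) are hypothetical, and the 12-bit truncation is only an assumed stand-in for a low-precision estimate. It does not reproduce the exact results of `rcpps`.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware reciprocal estimate: computes 1/d and
// keeps only the top 12 of the 23 stored mantissa bits. This is an assumption
// for illustration, not a model of what rcpps actually returns.
inline float coarseReciprocal(float d) {
  float r = 1.0f / d;
  std::uint32_t bits;
  std::memcpy(&bits, &r, sizeof bits);
  bits &= 0xFFFFF800u; // clear the low 11 mantissa bits
  std::memcpy(&r, &bits, sizeof r);
  return r;
}

// One Newton-Raphson refinement, est + est * (1 - d * est), written in the
// same order as the removed SSE instructions.
inline float refinedReciprocal(float d) {
  float est = coarseReciprocal(d);
  float t = est * d; // mulps
  t = 1.0f - t;      // subps
  t = t * est;       // mulps
  return t + est;    // addps
}

// Relative error of an approximation against a reference value.
inline float relativeError(float approx, float exact) {
  return std::fabs(approx - exact) / std::fabs(exact);
}
```

Computing `x * refinedReciprocal(d)` lands much closer to `x / d` than `x * coarseReciprocal(d)` does, but neither is guaranteed to match a true division bit for bit, which is the kind of difference that can matter to scalar code.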
================
Comment at: llvm/test/CodeGen/X86/scalarize-fp.ll:797
; AVX-NEXT: retq
%b = fdiv fast <8 x float> %vx, <float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0, float 8.0>
%r = shufflevector <8 x float> %b, <8 x float> undef, <8 x i32> zeroinitializer
----------------
I intentionally chose an fdiv whose divisor is 1.0 in lane 0 (the splatted lane) to show the follow-on simplification.
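As a rough illustration of why the scalarized form is equivalent, the following C++ sketch models the two shapes of the computation on plain arrays. The names `V8`, `splatOfVectorDiv`, and `splatOfScalarDiv` are hypothetical and exist only for this example. It models the arithmetic, not the DAG combine itself or the code the backend emits.

```cpp
#include <array>
#include <cstddef>

using V8 = std::array<float, 8>;

// Vector form: divide every lane, then splat lane 0 of the quotient
// (the shufflevector with a zeroinitializer mask).
inline V8 splatOfVectorDiv(const V8 &x, const V8 &c) {
  V8 q{};
  for (std::size_t i = 0; i < q.size(); ++i)
    q[i] = x[i] / c[i];
  V8 r{};
  r.fill(q[0]);
  return r;
}

// Scalarized form: a single division on lane 0, then splat the result.
inline V8 splatOfScalarDiv(const V8 &x, const V8 &c) {
  V8 r{};
  r.fill(x[0] / c[0]);
  return r;
}
```

With the divisor constant from the test, `c[0]` is 1.0, so the one remaining scalar division is `x[0] / 1.0f`, which is just `x[0]`. Only the splat of lane 0 is left.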
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D60150/new/
https://reviews.llvm.org/D60150