[PATCH] D26660: [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul

Tue Nov 15 00:03:11 PST 2016

craig.topper added inline comments.

================
Comment at: test/CodeGen/X86/vec_ss_load_fold.ll:41
+; X32_AVX1-NEXT:    vmulss LCPI0_1, %xmm0, %xmm0
+; X32_AVX1-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
+; X32_AVX1-NEXT:    vminss LCPI0_2, %xmm0, %xmm0
----------------
zvi wrote:
> This redundant blend should be documented in Bugzilla. It would be best to fix this before committing this patch.
That blend exists because there is a vzmovl, created from the inserts of 0s, that got pushed up to here and was then blocked by the min/max nodes. I can't pattern match it out.

We need some sort of demanded elements filtering that figures out vcvttss2si doesn't want the upper bits and that the min/max pass the bits straight through and thus don't want the bits either. And push that all the way back to remove the original insert elements. Or something like that.
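
A toy sketch of that kind of demanded-elements propagation, written in Python rather than as LLVM code. Everything in it is an assumption made for illustration: the `Node` class, the op names, and the `simplify` helper are hypothetical and do not correspond to any existing LLVM API. It models the scalar ops as "lane 0 is computed, lanes 1-3 are copied from the first operand", and the blend as "lane 0 from the first operand, lanes 1-3 from the second".

```python
from dataclasses import dataclass
from typing import Tuple

ALL_LANES = frozenset({0, 1, 2, 3})
LANE0 = frozenset({0})

# Hypothetical scalar ops: lane 0 = op(a[0], b[0]); lanes 1..3 copied from a.
SCALAR_OPS = {"mulss", "minss", "maxss"}


@dataclass(frozen=True)
class Node:
    op: str                           # e.g. "load", "mulss", "blend0", "cvttss2si"
    operands: Tuple["Node", ...] = ()
    name: str = ""                    # only used to tell leaves apart


def simplify(node: Node, demanded: frozenset = ALL_LANES) -> Node:
    """Rebuild `node`, given that only the lanes in `demanded` are observed."""
    if not node.operands:
        return node  # leaf: nothing to simplify
    if node.op == "cvttss2si":
        # The conversion only reads lane 0 of its source.
        (src,) = node.operands
        return Node(node.op, (simplify(src, LANE0),))
    if node.op in SCALAR_OPS:
        a, b = node.operands
        # Upper lanes pass straight through from a; b only contributes lane 0.
        b_demanded = LANE0 if 0 in demanded else frozenset()
        return Node(node.op, (simplify(a, demanded), simplify(b, b_demanded)))
    if node.op == "blend0":
        low, high = node.operands
        if demanded <= LANE0:
            return simplify(low, demanded)   # upper lanes unused: blend is dead
        if 0 not in demanded:
            return simplify(high, demanded)  # lane 0 unused: only `high` matters
        return Node(node.op, (simplify(low, LANE0),
                              simplify(high, demanded - LANE0)))
    return node  # unknown op: leave untouched


# mul -> blend -> min -> max -> cvttss2si, loosely mirroring the test above.
x, c1, c2, c3, zero = (Node("load", name=n) for n in ("x", "c1", "c2", "c3", "zero"))
mul = Node("mulss", (x, c1))
chain = Node("cvttss2si",
             (Node("maxss", (Node("minss", (Node("blend0", (mul, zero)), c2)), c3)),))
simplified = simplify(chain)
```

In this model the blend disappears from `simplified` because the conversion demands only lane 0 and the min/max nodes forward that demand unchanged to their first operand. When all four lanes are demanded, the blend is kept.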

I'll file a bug, but I don't think it should block a patch that was just trying to remove an intrinsic that clang doesn't use. I could write this same test case in clang without this intrinsic and see the same extra blend.

https://reviews.llvm.org/D26660