[PATCH] D26660: [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul

Tue Nov 15 00:13:14 PST 2016

zvi added inline comments.

================
Comment at: test/CodeGen/X86/vec_ss_load_fold.ll:41
+; X32_AVX1-NEXT:    vmulss LCPI0_1, %xmm0, %xmm0
+; X32_AVX1-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
+; X32_AVX1-NEXT:    vminss LCPI0_2, %xmm0, %xmm0
----------------
craig.topper wrote:
> zvi wrote:
> > This redundant blend should be documented in Bugzilla. It would be best to fix this before committing this patch.
> That blend exists because a vzmovl created from the inserts of 0s got pushed up to here and was then blocked by the min/max nodes. I can't pattern match it out.
> 
> We need some sort of demanded elements filtering that figures out vcvttss2si doesn't want the upper bits and that the min/max pass the bits straight through and thus don't want the bits either. And push that all the way back to remove the original insert elements. Or something like that.
> 
> I'll file a bug, but I don't think it should block a patch that was just trying to remove an intrinsic that clang doesn't use. I could write this same test case in clang without this intrinsic and see the same extra blend.
Thanks
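
For reference, the demanded-elements idea quoted above could be sketched roughly as follows. This is a toy model in Python, not LLVM code: the `Node` class, the lane masks, `operand_demand`, and `find_redundant_zeroing` are all hypothetical names made up for illustration, and the per-op rules are simplified assumptions about how the scalar SSE ops treat the upper lanes.

```python
from dataclasses import dataclass, field

LANE0 = 0b0001        # lowest 32-bit lane of a 4 x float vector
UPPER = 0b1110        # lanes 1..3
ALL = LANE0 | UPPER


@dataclass(eq=False)  # identity-based hashing so nodes can be dict keys
class Node:
    name: str
    op: str           # hypothetical op names, loosely modelled on the asm above
    operands: list = field(default_factory=list)


def operand_demand(node, index, demanded):
    """Lanes of operand `index` that `node` needs, given the lanes demanded of `node`."""
    if node.op == "cvttss2si":
        return LANE0                      # only converts the low element
    if node.op in ("mulss", "minss", "maxss"):
        if index == 0:
            return demanded               # upper lanes pass straight through
        return LANE0 if demanded & LANE0 else 0
    if node.op == "vzmovl":
        return demanded & LANE0           # upper lanes are overwritten with zero
    return ALL                            # unknown op: assume everything is used


def find_redundant_zeroing(root):
    """Return names of vzmovl nodes whose zeroed upper lanes nobody demands."""
    demanded_of = {}
    worklist = [(root, ALL)]
    while worklist:
        node, demanded = worklist.pop()
        demanded_of[node] = demanded_of.get(node, 0) | demanded
        for i, operand in enumerate(node.operands):
            worklist.append((operand, operand_demand(node, i, demanded)))
    return sorted(n.name for n, d in demanded_of.items()
                  if n.op == "vzmovl" and not (d & UPPER))


def build(root_op):
    """mul -> zero upper lanes -> min -> max -> root, as in the test above."""
    x = Node("x", "load")
    mul = Node("mul", "mulss", [x, Node("c1", "load")])
    zero = Node("zero_upper", "vzmovl", [mul])
    mn = Node("min", "minss", [zero, Node("c2", "load")])
    mx = Node("max", "maxss", [mn, Node("c3", "load")])
    return Node("root", root_op, [mx])


print(find_redundant_zeroing(build("cvttss2si")))
print(find_redundant_zeroing(build("store")))
```

With a `cvttss2si` root the sketch reports `zero_upper` as removable, since only lane 0 is demanded all the way down. With a full-width `store` root it reports nothing, because the zeroed upper lanes are actually observed.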

https://reviews.llvm.org/D26660