[PATCH] D32596: [DAGCombine] Transform (fmul X, -2.0) --> (fneg (fadd X, X)).

Tue May 2 15:21:21 PDT 2017

spatel added inline comments.

================
Comment at: test/CodeGen/X86/fmul-combines.ll:21-22
+; CHECK-LABEL: fmulneg2_v4f32:
+; CHECK: addps %xmm0, %xmm0
+; CHECK: xorps
+; CHECK-NEXT: retq
----------------
What this check doesn't show, but you probably know or can infer: x86 has no 'fneg' instruction for SSE/AVX (they ran out of transistors?).

So we load the 128-bit sign-bit mask from memory:
  xorps	LCPI1_0(%rip), %xmm0
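
To illustrate what that xor does per lane, here is a minimal scalar sketch in C++. The helper name `fneg_via_xor` is made up for this example and is not anything in LLVM; it assumes 32-bit IEEE-754 floats, where the sign is the top bit.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: negate a float by flipping only its sign bit.
// This mirrors, for a single lane, what xorps with a sign-bit mask does.
float fneg_via_xor(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the float's bits
    bits ^= 0x80000000u;                   // flip the sign bit (assumes IEEE-754 binary32)
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

The vector version needs the mask replicated across all four lanes, which is why it comes from a constant-pool load.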

It's also true that the mul version would load a '2.0' constant, but the add+xor version adds an extra op, and I don't think that's good for any x86 target.
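
For reference, the two forms being compared can be written as scalar C++. This is only a sketch: the names `mul_form`, `add_neg_form`, and `same_bits` are invented for illustration, and it assumes default IEEE-754 semantics (no fast-math). Under those assumptions the results should match bit-for-bit for non-NaN inputs, so the question is purely op count and latency, not correctness.

```cpp
#include <cstdint>
#include <cstring>

// One multiply by a constant that has to be loaded.
float mul_form(float x) { return x * -2.0f; }

// One add plus one negate (on SSE/AVX, an xor with a loaded mask).
float add_neg_form(float x) { return -(x + x); }

// Hypothetical helper: bitwise comparison, so +0.0 and -0.0 are distinguished.
bool same_bits(float a, float b) {
    std::uint32_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}
```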

There's one other reason this may not be good: there are actually CPUs (hello, Jaguar!) that have faster FP multiplies than FP adds.

https://reviews.llvm.org/D32596