[PATCH] D56011: [x86] lower extracted fadd/fsub to horizontal vector math

Fri Jan 4 07:50:46 PST 2019

spatel marked an inline comment as done.
spatel added inline comments.

================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:839-842
+    // TODO: The cost of "2" for FP ops is apparently due to the fact that P4
+    // era chips were running integer units twice as fast as FP units. But these
+    // costs should be relative to other FP costs above here, so they should be
+    // "1". Alternatively, other FP costs should be scaled up by a factor of 2.
----------------
Looking harder at Agner's numbers: I had forgotten that P4 actually had 2-cycle throughput for addsd/subsd. So the cost of "2" is correct, assuming the baseline for SSE2 is a P4 (Willamette, Prescott).

But the baseline chip for SSE1 below here is a P3, and that had 1-cycle throughput for addss/subss according to Agner.

I'll fix the comments.
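
For illustration only, the two baselines can be sketched as a small lookup table. This is a minimal sketch with hypothetical names (Baseline, CostEntry, lookupCost); it is not the actual table code in X86TargetTransformInfo.cpp, and the only numbers it uses are the throughputs quoted above.

```cpp
#include <cassert>

// Hypothetical types for illustration; not LLVM's cost-table API.
enum class Baseline { SSE1_P3, SSE2_P4 };
enum class Op { FAdd, FSub };
enum class Ty { F32, F64 };

struct CostEntry {
  Baseline Level;
  Op Opcode;
  Ty Type;
  int Cost; // throughput in cycles on the assumed baseline chip
};

// Each cost is relative to the baseline chip assumed for that feature level.
constexpr CostEntry Table[] = {
    {Baseline::SSE2_P4, Op::FAdd, Ty::F64, 2}, // addsd: 2 cycles on P4
    {Baseline::SSE2_P4, Op::FSub, Ty::F64, 2}, // subsd: 2 cycles on P4
    {Baseline::SSE1_P3, Op::FAdd, Ty::F32, 1}, // addss: 1 cycle on P3
    {Baseline::SSE1_P3, Op::FSub, Ty::F32, 1}, // subss: 1 cycle on P3
};

// Returns the table cost, or -1 if there is no entry for the combination.
int lookupCost(Baseline L, Op O, Ty T) {
  for (const CostEntry &E : Table)
    if (E.Level == L && E.Opcode == O && E.Type == T)
      return E.Cost;
  return -1;
}

// Sanity check: the same operation gets a different cost per baseline.
void checkBaselines() {
  assert(lookupCost(Baseline::SSE2_P4, Op::FAdd, Ty::F64) == 2);
  assert(lookupCost(Baseline::SSE1_P3, Op::FAdd, Ty::F32) == 1);
}
```

The point of the sketch is that "2" and "1" are both consistent once each table is read against its own baseline chip.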

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56011/new/

https://reviews.llvm.org/D56011