[PATCH] D56011: [x86] lower extracted fadd/fsub to horizontal vector math
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 4 07:50:46 PST 2019
spatel marked an inline comment as done.
spatel added inline comments.
================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:839-842
+ // TODO: The cost of "2" for FP ops is apparently due to the fact that P4
+ // era chips were running integer units twice as fast as FP units. But these
+ // costs should be relative to other FP costs above here, so they should be
+ // "1". Alternatively, other FP costs should be scaled up by a factor of 2.
----------------
Looking harder at Agner's numbers: I had forgotten that the P4 actually had 2-cycle throughput for addsd/subsd. So the cost of "2" is correct, assuming the baseline for SSE2 is a P4 (Willamette, Prescott).
But the baseline chip for SSE1 below here is a P3, and that had 1-cycle throughput for addss/subss according to Agner.
I'll fix the comments.
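
To illustrate the point about per-ISA baselines, here is a minimal, self-contained sketch of how tiered cost tables could encode it. This is not the code in X86TargetTransformInfo.cpp: the types `Op`, `Ty`, `Level`, `CostEntry` and the functions `lookup`, `scalarFPCost`, and `checkBaselines` are hypothetical names made up for this example. The table values simply mirror the throughput figures quoted above (2 for scalar double ops on the assumed P4 baseline, 1 for scalar float ops on the assumed P3 baseline), and the "most specific table first, then fall through" lookup order is an assumption of the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical, simplified stand-ins; these are NOT the LLVM types.
enum class Op { FAdd, FSub };
enum class Ty { F32, F64 };
enum class Level { SSE1, SSE2 };

struct CostEntry {
  Op op;
  Ty ty;
  int cost;
};

// Assumed SSE2 baseline: P4, 2-cycle throughput for addsd/subsd.
constexpr std::array<CostEntry, 2> SSE2Costs{{
    {Op::FAdd, Ty::F64, 2},
    {Op::FSub, Ty::F64, 2},
}};

// Assumed SSE1 baseline: P3, 1-cycle throughput for addss/subss.
constexpr std::array<CostEntry, 2> SSE1Costs{{
    {Op::FAdd, Ty::F32, 1},
    {Op::FSub, Ty::F32, 1},
}};

// Linear search of one table; returns nullopt when there is no entry.
template <std::size_t N>
constexpr std::optional<int> lookup(const std::array<CostEntry, N> &table,
                                    Op op, Ty ty) {
  for (const CostEntry &e : table)
    if (e.op == op && e.ty == ty)
      return e.cost;
  return std::nullopt;
}

// Check the most specific table first, then fall through to the older ISA.
constexpr std::optional<int> scalarFPCost(Level level, Op op, Ty ty) {
  if (level == Level::SSE2)
    if (auto c = lookup(SSE2Costs, op, ty))
      return c;
  return lookup(SSE1Costs, op, ty);
}

// The two baseline claims from the comment, stated as checks.
inline void checkBaselines() {
  assert(scalarFPCost(Level::SSE2, Op::FAdd, Ty::F64) == 2);
  assert(scalarFPCost(Level::SSE1, Op::FAdd, Ty::F32) == 1);
}
```

With separate tables, each cost is relative to the baseline chip assumed for that ISA level, which is why a "2" in one table and a "1" in the other are not directly comparable without saying which chip each table models.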
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D56011/new/
https://reviews.llvm.org/D56011