[PATCH] D25722: Improved cost model for FDIV and FSQRT

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 20 11:00:57 PDT 2016


mkuper added inline comments.


================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:269
+    { ISD::FDIV,  MVT::v4f32, 14 }, // IACA value for SandyBridge arch
+    { ISD::FDIV,  MVT::v8f32, 41 }, // IACA value for SandyBridge arch
+    { ISD::FDIV,  MVT::f64,   21 }, // IACA value for SandyBridge arch
----------------
A YMM fdiv being 3 times as expensive as an XMM fdiv seems slightly odd.

I'd expect a factor of "2, maybe a bit more", and Agner seems to agree - e.g. for Sandybridge he gives a range of 10-14 cycles for XMM DIVPS, and 20-28 cycles for YMM VDIVPS. Were your IACA checks accounting for additional instructions? Or is this an inconsistency between IACA and Agner's tests?

(Note that these numbers are supposed to represent reciprocal throughput, but Agner's latency data also shows a factor of ~2.)
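
For reference, the arithmetic behind this comment can be checked with a short standalone sketch. The helper `widthRatio`, the constant names, and `printRatios` are hypothetical and not part of LLVM; the inputs are only the numbers quoted above (14 and 41 from the patch, and Agner's 10-14 and 20-28 cycle ranges).

```cpp
#include <cstdio>

// Hypothetical helper (not an LLVM API): how many times more expensive the
// 256-bit (YMM) operation is than the 128-bit (XMM) one.
constexpr double widthRatio(double YmmCost, double XmmCost) {
  return YmmCost / XmmCost;
}

// Costs proposed in the patch: v8f32 = 41, v4f32 = 14.
constexpr double PatchRatio = widthRatio(41, 14);

// Agner's Sandybridge ranges quoted above: 20-28 (YMM) vs. 10-14 (XMM).
constexpr double AgnerLowRatio = widthRatio(20, 10);
constexpr double AgnerHighRatio = widthRatio(28, 14);

// Prints the three ratios side by side for comparison.
void printRatios() {
  std::printf("patch (IACA): %.2f\n", PatchRatio);
  std::printf("Agner, low end: %.2f\n", AgnerLowRatio);
  std::printf("Agner, high end: %.2f\n", AgnerHighRatio);
}
```

Calling `printRatios()` from a `main` shows the patch's ratio at roughly 2.9, while both ends of Agner's range give exactly 2, which is the gap being asked about.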



https://reviews.llvm.org/D25722




