[PATCH] Use rsqrt (X86) to speed up reciprocal square root calcs (PR20900)
Sanjay Patel
spatel at rotateright.com
Thu Oct 9 20:11:56 PDT 2014
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:14355
@@ +14354,3 @@
+ // TODO: Add support for AVX (v8f32) and AVX512 (v16f32).
+ // TODO: Is it ever worthwhile to use an estimate for f64?
+ if (Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) {
----------------
hfinkel wrote:
> Why wouldn't it be?
A double-precision rsqrt estimate with refinement on x86 prior to FMA requires at least 16 instructions: convert to single, rsqrtss, convert back to double, then refine with 3 Newton-Raphson steps (at least 13 instructions). I don't think Intel/AMD ever intended for that, or they would've added an 'rsqrtsd' instruction (similar to PPC's double-precision frsqrte). AFAICT, no x86 compiler tries to generate that sequence. Now that FMA has been introduced, it might be more feasible, but the HW implementations that have FMA also have really fast sqrt/div units, so it's again not worth it. Should I add this background to the code comment?
http://reviews.llvm.org/D5658