[PATCH] Use rsqrt (X86) to speed up reciprocal square root calcs (PR20900)

Thu Oct 9 20:18:49 PDT 2014

----- Original Message -----
> From: "Sanjay Patel" <spatel at rotateright.com>
> To: spatel at rotateright.com, nrotem at apple.com, hfinkel at anl.gov
> Cc: steven at uplinklabs.net, llvm-dev at redking.me.uk, llvm-commits at cs.uiuc.edu
> Sent: Thursday, October 9, 2014 10:11:56 PM
> Subject: Re: [PATCH] Use rsqrt (X86) to speed up reciprocal square root calcs (PR20900)
> 
> ================
> Comment at: lib/Target/X86/X86ISelLowering.cpp:14355
> @@ +14354,3 @@
> +  // TODO: Add support for AVX (v8f32) and AVX512 (v16f32).
> +  // TODO: Is it ever worthwhile to use an estimate for f64?
> +  if (Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) {
> ----------------
> hfinkel wrote:
> > Why wouldn't it be?
> A double-precision rsqrt estimate with refinement on x86 prior to FMA
> requires at least 16 instructions: convert to single, rsqrtss,
> convert back to double, refine (3 steps = at least 13 insts). I
> don't think Intel/AMD ever intended for that, or they would've added
> 'rsqrtsd' (similar to PPC's double-precision frsqrte). AFAICT, no
> x86 compiler tries to generate that sequence. Now that FMA has been
> introduced, it might be more feasible, but the HW implementations
> that have FMA also have really fast sqrt/div units, so it's again
> not worth it. Add this background to the code comment?

Yes, please. But in light of that, I'd probably not make it a "TODO", just say, "It is likely not profitable to do this for f64 because...".
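
To make the sequence under discussion concrete, here is a rough sketch of the f64 path described above: narrow to single, take the rsqrtss estimate, widen back to double, then apply three Newton-Raphson steps. It is only an illustration, not code from the patch. The function names (rsqrt_estimate_f32, rsqrt_f64_via_estimate) are made up for this example, the non-SSE fallback is a stand-in that merely mimics a roughly 12-bit estimate, and inputs outside the float range are not handled.

```cpp
#include <cmath>
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_RSQRTSS 1
#endif

// Hypothetical helper: single-precision estimate of 1/sqrt(x).
// On x86 with SSE this is a single rsqrtss, which is only good to
// about 12 bits.
static float rsqrt_estimate_f32(float x) {
#ifdef SKETCH_HAVE_RSQRTSS
  return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
  // Assumption for non-x86 builds: perturb the exact value so the
  // refinement below still has something to correct.
  return (1.0f / std::sqrt(x)) * (1.0f + 1.0f / 4096.0f);
#endif
}

// Hypothetical helper: double-precision 1/sqrt(x) built from the
// single-precision estimate. Each Newton-Raphson step
//   y' = y * (1.5 - 0.5 * x * y * y)
// roughly doubles the number of correct bits (about 12 -> 24 -> 48 ->
// full double), which is why three steps are needed for f64.
static double rsqrt_f64_via_estimate(double x) {
  // convert to single, estimate, convert back to double
  double y = static_cast<double>(rsqrt_estimate_f32(static_cast<float>(x)));
  for (int step = 0; step < 3; ++step) {
    y = y * (1.5 - 0.5 * x * y * y);
  }
  return y;
}
```

Without FMA, every refinement step is several multiplies plus a subtract, all dependent on one another, on top of the two conversions. That is the cost being weighed against a plain sqrtsd followed by divsd.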

 -Hal

> 
> http://reviews.llvm.org/D5658
> 
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory