[PATCH] Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).

Fri Sep 26 13:37:50 PDT 2014

================
Comment at: include/llvm/Target/TargetLowering.h:2632
@@ +2631,3 @@
+  /// returned by one of the above methods.
+  virtual unsigned getNRSteps(EVT VT) const {
+    return 0;
----------------
hfinkel wrote:
> spatel wrote:
> > hfinkel wrote:
> > > The number of iterations necessary for the reciprocal estimate and for the reciprocal sqrt estimate might be different. Please provide a way to differentiate (and I'd want to make really sure the target actually overrides this). Maybe:
> > > 
> > >   virtual unsigned getNRSteps(EVT VT, bool SqrtEst) const {
> > >     llvm_unreachable("Target must provide the number of iterations");
> > >   }
> > > 
> > Sure - I'll make unique functions to return iteration counts for sqrte and rcpe. 
> > 
> > We may need one more refinement here regarding the rcpe(rsqrt(x)) transformation of a regular sqrt(x)...my guess is that's not a win on any recent X86 (and probably not PPC either?). But that change can come later if needed.
> Regarding PPC, you might be right about some of them -- it is certainly a win on the embedded cores where the sqrt instruction is not fully pipelined. We'll need to do some measurements.
It's coming back to me now (I used to be at IBM and Apple)...
I think the deciding factor is not whether the sqrt instruction is pipelined, but whether it exists at all. E.g., the 7400/7450 had fre/frsqrte but lacked fsqrt. In that case, the choice is between a long sequence of dependent ops using the estimates and a call to libm sqrt(). If fsqrt exists, it should probably be used unless there's some truly horrible HW implementation out there.
Certainly, this should be measured on as many targets as possible to see whether that holds.
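
For reference, here is a rough standalone sketch (not code from the patch) of the "long sequence of dependent ops" in question: an estimate followed by a configurable number of Newton-Raphson steps, kept separate for the reciprocal and the reciprocal sqrt. The estimateRecip/estimateRsqrt helpers are hypothetical stand-ins for the hardware estimate instructions, and their 1% error is an arbitrary assumption chosen only to make the convergence visible.

```cpp
#include <cmath>

// Hypothetical stand-ins for hardware estimate instructions. They compute
// the exact value and then skew it by 1% to mimic a low-precision estimate.
// Real estimate instructions have target-specific precision.
float estimateRecip(float X) { return (1.0f / X) * 1.01f; }
float estimateRsqrt(float X) { return (1.0f / std::sqrt(X)) * 1.01f; }

// Refine an estimate of 1/X with Newton-Raphson: E' = E * (2 - X * E).
// Each step depends on the previous one, so the chain is fully serial.
float refineRecip(float X, unsigned Steps) {
  float E = estimateRecip(X);
  for (unsigned I = 0; I < Steps; ++I)
    E = E * (2.0f - X * E);
  return E;
}

// Refine an estimate of 1/sqrt(X): E' = E * (1.5 - 0.5 * X * E * E).
float refineRsqrt(float X, unsigned Steps) {
  float E = estimateRsqrt(X);
  for (unsigned I = 0; I < Steps; ++I)
    E = E * (1.5f - 0.5f * X * E * E);
  return E;
}

// sqrt(X) rebuilt from the refined reciprocal sqrt: sqrt(X) = X * (1/sqrt(X)).
// Valid for X > 0 only; zero, infinity and NaN would need extra handling.
float sqrtViaEstimate(float X, unsigned Steps) {
  return X * refineRsqrt(X, Steps);
}
```

Each step roughly squares the relative error, so the number of steps needed depends on how precise the starting estimate is. That is the reason the two step counts may have to differ per target and per instruction.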

http://reviews.llvm.org/D5484