<div dir="ltr"><div>Hi Manman,<br><br></div>Thanks for the notification. Yes, several bots were failing because of this check-in. I think r221731 should fix it.<br><div class="gmail_extra"><br></div><div class="gmail_extra">Sanjay<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 11, 2014 at 5:03 PM, Manman Ren <span dir="ltr"><<a href="mailto:mren@apple.com" target="_blank">mren@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Sanjay,<br>

<br>

The public bot is failing: <a href="http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/988/consoleFull#18488811128254eaf0-7326-4999-85b0-388101f2d404" target="_blank">http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/988/consoleFull#18488811128254eaf0-7326-4999-85b0-388101f2d404</a><br>

/Users/buildslave/jenkins/sharedspace/incremental@2/llvm/test/CodeGen/X86/sqrt-fastmath.ll:73:15: error: CHECK-NEXT: is not on the line after the previous match<br>

<br>

; CHECK-NEXT: movss<br>

              ^<br>

<stdin>:54:2: note: 'next' match was here<br>

 movss .LCPI3_0(%rip), %xmm0<br>

 ^<br>

<stdin>:51:8: note: previous match ended here<br>

 sqrtss %xmm0, %xmm1<br>

       ^<br>

<stdin>:52:1: note: non-matching line after previous match is here<br>

 rcpss %xmm1, %xmm2<br>

<br>

Could you check if it is due to your change?<br>

<br>

Thanks,<br>

Manman<br>

<div class="HOEnZb"><div class="h5"><br>

> On Nov 11, 2014, at 12:51 PM, Sanjay Patel <<a href="mailto:spatel@rotateright.com">spatel@rotateright.com</a>> wrote:<br>

><br>

> Author: spatel<br>

> Date: Tue Nov 11 14:51:00 2014<br>

> New Revision: 221706<br>

><br>

> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=221706&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=221706&view=rev</a><br>

> Log:<br>

> Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385).<br>

><br>

> This is a first step for generating SSE rcp instructions for reciprocal<br>

> calcs when fast-math allows it. This is very similar to the rsqrt optimization<br>

> enabled in D5658 ( <a href="http://reviews.llvm.org/rL220570" target="_blank">http://reviews.llvm.org/rL220570</a> ).<br>

><br>

> For now, be conservative and only enable this for AMD btver2 where performance<br>

> improves significantly both in terms of latency and throughput.<br>

><br>

> We may never enable this codegen for Intel Core* chips because the divider circuits<br>

> are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21<br>

> cycle critical path for the rcp + mul + sub + mul + add estimate.<br>

><br>

> Follow-on patches may allow configuration of the number of Newton-Raphson refinement<br>

> steps, add AVX512 support, and enable the optimization for more chips.<br>
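
For readers unfamiliar with the "rcp + mul + sub + mul + add" sequence mentioned above, the refinement can be sketched in scalar C++. This is only an illustration under stated assumptions, not the code LLVM generates: `refine_reciprocal` and `rough_reciprocal` are hypothetical names, and the rough estimate simply perturbs the exact reciprocal by a relative error of 2^-12 to stand in for the hardware rcpss result.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for rcpss: the exact reciprocal perturbed by a
// relative error of 2^-12, the minimum architected accuracy cited in the
// commit. Real hardware returns a different (but similarly rough) value.
float rough_reciprocal(float a) {
  assert(a != 0.0f);
  return (1.0f / a) * (1.0f + 1.0f / 4096.0f);
}

// Hypothetical helper: apply `steps` Newton-Raphson iterations to an
// estimate `est` of 1/a. Each iteration computes est + est * (1 - a * est),
// which is the mul + sub + mul + add sequence from the commit message.
float refine_reciprocal(float a, float est, unsigned steps) {
  for (unsigned i = 0; i < steps; ++i) {
    float t = a * est;  // mul
    t = 1.0f - t;       // sub
    t = est * t;        // mul
    est = est + t;      // add
  }
  return est;
}
```

Each iteration roughly squares the relative error, so a single step takes an estimate that is good to about 12 bits to roughly the 24 bits a float can hold. Calling `refine_reciprocal(3.0f, rough_reciprocal(3.0f), 1)` corresponds to the one refinement step the commit uses, and the `steps` parameter mirrors the configurable step count mentioned as a possible follow-on.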

><br>

> More background here: <a href="http://llvm.org/bugs/show_bug.cgi?id=21385" target="_blank">http://llvm.org/bugs/show_bug.cgi?id=21385</a><br>

><br>

> Differential Revision: <a href="http://reviews.llvm.org/D6175" target="_blank">http://reviews.llvm.org/D6175</a><br>

><br>

><br>

> Added:<br>

>    llvm/trunk/test/CodeGen/X86/recip-fastmath.ll<br>

> Modified:<br>

>    llvm/trunk/lib/Target/X86/X86.td<br>

>    llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>

>    llvm/trunk/lib/Target/X86/X86ISelLowering.h<br>

>    llvm/trunk/lib/Target/X86/X86Subtarget.h<br>

><br>

> Modified: llvm/trunk/lib/Target/X86/X86.td<br>

> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86.td?rev=221706&r1=221705&r2=221706&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86.td?rev=221706&r1=221705&r2=221706&view=diff</a><br>

> ==============================================================================<br>

> --- llvm/trunk/lib/Target/X86/X86.td (original)<br>

> +++ llvm/trunk/lib/Target/X86/X86.td Tue Nov 11 14:51:00 2014<br>

> @@ -184,6 +184,8 @@ def FeatureSlowIncDec : SubtargetFeature<br>

>                                    "INC and DEC instructions are slower than ADD and SUB">;<br>

> def FeatureUseSqrtEst : SubtargetFeature<"use-sqrt-est", "UseSqrtEst", "true",<br>

>                             "Use RSQRT* to optimize square root calculations">;<br>

> +def FeatureUseRecipEst : SubtargetFeature<"use-recip-est", "UseReciprocalEst",<br>

> +                          "true", "Use RCP* to optimize division calculations">;<br>

><br>

> //===----------------------------------------------------------------------===//<br>

> // X86 processors supported.<br>

> @@ -350,7 +352,7 @@ def : ProcessorModel<"btver2", BtVer2Mod<br>

>                       FeaturePRFCHW, FeatureAES, FeaturePCLMUL,<br>

>                       FeatureBMI, FeatureF16C, FeatureMOVBE,<br>

>                       FeatureLZCNT, FeaturePOPCNT, FeatureSlowSHLD,<br>

> -                      FeatureUseSqrtEst]>;<br>

> +                      FeatureUseSqrtEst, FeatureUseRecipEst]>;<br>

><br>

> // Bulldozer<br>

> def : Proc<"bdver1",          [FeatureXOP, FeatureFMA4, FeatureCMPXCHG16B,<br>

><br>

> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>

> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=221706&r1=221705&r2=221706&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=221706&r1=221705&r2=221706&view=diff</a><br>

> ==============================================================================<br>

> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)<br>

> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Nov 11 14:51:00 2014<br>

> @@ -14514,6 +14514,37 @@ SDValue X86TargetLowering::getRsqrtEstim<br>

>   return SDValue();<br>

> }<br>

><br>

> +/// The minimum architected relative accuracy is 2^-12. We need one<br>

> +/// Newton-Raphson step to have a good float result (24 bits of precision).<br>

> +SDValue X86TargetLowering::getRecipEstimate(SDValue Op,<br>

> +                                            DAGCombinerInfo &DCI,<br>

> +                                            unsigned &RefinementSteps) const {<br>

> +  // FIXME: We should use instruction latency models to calculate the cost of<br>

> +  // each potential sequence, but this is very hard to do reliably because<br>

> +  // at least Intel's Core* chips have variable timing based on the number of<br>

> +  // significant digits in the divisor.<br>

> +  if (!Subtarget->useReciprocalEst())<br>

> +    return SDValue();<br>

> +<br>

> +  EVT VT = Op.getValueType();<br>

> +<br>

> +  // SSE1 has rcpss and rcpps. AVX adds a 256-bit variant for rcpps.<br>

> +  // TODO: Add support for AVX512 (v16f32).<br>

> +  // It is likely not profitable to do this for f64 because a double-precision<br>

> +  // reciprocal estimate with refinement on x86 prior to FMA requires<br>

> +  // 15 instructions: convert to single, rcpss, convert back to double, refine<br>

> +  // (3 steps = 12 insts). If an 'rcpsd' variant was added to the ISA<br>

> +  // along with FMA, this could be a throughput win.<br>

> +  if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||<br>

> +      (Subtarget->hasAVX() && VT == MVT::v8f32)) {<br>

> +    // TODO: Expose this as a user-configurable parameter to allow for<br>

> +    // speed vs. accuracy flexibility.<br>

> +    RefinementSteps = 1;<br>

> +    return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);<br>

> +  }<br>

> +  return SDValue();<br>

> +}<br>

> +<br>

> static bool isAllOnes(SDValue V) {<br>

>   ConstantSDNode *C = dyn_cast<ConstantSDNode>(V);<br>

>   return C && C->isAllOnesValue();<br>

><br>

> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h<br>

> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=221706&r1=221705&r2=221706&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=221706&r1=221705&r2=221706&view=diff</a><br>

> ==============================================================================<br>

> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)<br>

> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Tue Nov 11 14:51:00 2014<br>

> @@ -1031,6 +1031,10 @@ namespace llvm {<br>

>     SDValue getRsqrtEstimate(SDValue Operand, DAGCombinerInfo &DCI,<br>

>                              unsigned &RefinementSteps,<br>

>                              bool &UseOneConstNR) const override;<br>

> +<br>

> +    /// Use rcp* to speed up fdiv calculations.<br>

> +    SDValue getRecipEstimate(SDValue Operand, DAGCombinerInfo &DCI,<br>

> +                             unsigned &RefinementSteps) const override;<br>

>   };<br>

><br>

>   namespace X86 {<br>

><br>

> Modified: llvm/trunk/lib/Target/X86/X86Subtarget.h<br>

> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.h?rev=221706&r1=221705&r2=221706&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86Subtarget.h?rev=221706&r1=221705&r2=221706&view=diff</a><br>

> ==============================================================================<br>

> --- llvm/trunk/lib/Target/X86/X86Subtarget.h (original)<br>

> +++ llvm/trunk/lib/Target/X86/X86Subtarget.h Tue Nov 11 14:51:00 2014<br>

> @@ -197,6 +197,11 @@ protected:<br>

>   /// substantially higher than normal FP ops like FADD and FMUL.<br>

>   bool UseSqrtEst;<br>

><br>

> +  /// Use the RCP* instructions to optimize FP division calculations.<br>

> +  /// For this to be profitable, the cost of FDIV must be<br>

> +  /// substantially higher than normal FP ops like FADD and FMUL.<br>

> +  bool UseReciprocalEst;<br>

> +<br>

>   /// Processor has AVX-512 PreFetch Instructions<br>

>   bool HasPFI;<br>

><br>

> @@ -375,6 +380,7 @@ public:<br>

>   bool slowLEA() const { return SlowLEA; }<br>

>   bool slowIncDec() const { return SlowIncDec; }<br>

>   bool useSqrtEst() const { return UseSqrtEst; }<br>

> +  bool useReciprocalEst() const { return UseReciprocalEst; }<br>

>   bool hasCDI() const { return HasCDI; }<br>

>   bool hasPFI() const { return HasPFI; }<br>

>   bool hasERI() const { return HasERI; }<br>

><br>

> Added: llvm/trunk/test/CodeGen/X86/recip-fastmath.ll<br>

> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/recip-fastmath.ll?rev=221706&view=auto" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/recip-fastmath.ll?rev=221706&view=auto</a><br>

> ==============================================================================<br>

> --- llvm/trunk/test/CodeGen/X86/recip-fastmath.ll (added)<br>

> +++ llvm/trunk/test/CodeGen/X86/recip-fastmath.ll Tue Nov 11 14:51:00 2014<br>

> @@ -0,0 +1,72 @@<br>

> +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=core2 | FileCheck %s<br>

> +; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=btver2 | FileCheck %s --check-prefix=BTVER2<br>

> +<br>

> +; If the target's divss/divps instructions are substantially<br>

> +; slower than rcpss/rcpps with a Newton-Raphson refinement,<br>

> +; we should generate the estimate sequence.<br>

> +<br>

> +; See PR21385 ( <a href="http://llvm.org/bugs/show_bug.cgi?id=21385" target="_blank">http://llvm.org/bugs/show_bug.cgi?id=21385</a> )<br>

> +; for details about the accuracy, speed, and implementation<br>

> +; differences of x86 reciprocal estimates.<br>

> +<br>

> +define float @reciprocal_estimate(float %x) #0 {<br>

> +  %div = fdiv fast float 1.0, %x<br>

> +  ret float %div<br>

> +<br>

> +; CHECK-LABEL: reciprocal_estimate:<br>

> +; CHECK: movss<br>

> +; CHECK-NEXT: divss<br>

> +; CHECK-NEXT: movaps<br>

> +; CHECK-NEXT: retq<br>

> +<br>

> +; BTVER2-LABEL: reciprocal_estimate:<br>

> +; BTVER2: vrcpss<br>

> +; BTVER2-NEXT: vmulss<br>

> +; BTVER2-NEXT: vsubss<br>

> +; BTVER2-NEXT: vmulss<br>

> +; BTVER2-NEXT: vaddss<br>

> +; BTVER2-NEXT: retq<br>

> +}<br>

> +<br>

> +define <4 x float> @reciprocal_estimate_v4f32(<4 x float> %x) #0 {<br>

> +  %div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x<br>

> +  ret <4 x float> %div<br>

> +<br>

> +; CHECK-LABEL: reciprocal_estimate_v4f32:<br>

> +; CHECK: movaps<br>

> +; CHECK-NEXT: divps<br>

> +; CHECK-NEXT: movaps<br>

> +; CHECK-NEXT: retq<br>

> +<br>

> +; BTVER2-LABEL: reciprocal_estimate_v4f32:<br>

> +; BTVER2: vrcpps<br>

> +; BTVER2-NEXT: vmulps<br>

> +; BTVER2-NEXT: vsubps<br>

> +; BTVER2-NEXT: vmulps<br>

> +; BTVER2-NEXT: vaddps<br>

> +; BTVER2-NEXT: retq<br>

> +}<br>

> +<br>

> +define <8 x float> @reciprocal_estimate_v8f32(<8 x float> %x) #0 {<br>

> +  %div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x<br>

> +  ret <8 x float> %div<br>

> +<br>

> +; CHECK-LABEL: reciprocal_estimate_v8f32:<br>

> +; CHECK: movaps<br>

> +; CHECK: movaps<br>

> +; CHECK-NEXT: divps<br>

> +; CHECK-NEXT: divps<br>

> +; CHECK-NEXT: movaps<br>

> +; CHECK-NEXT: movaps<br>

> +; CHECK-NEXT: retq<br>

> +<br>

> +; BTVER2-LABEL: reciprocal_estimate_v8f32:<br>

> +; BTVER2: vrcpps<br>

> +; BTVER2-NEXT: vmulps<br>

> +; BTVER2-NEXT: vsubps<br>

> +; BTVER2-NEXT: vmulps<br>

> +; BTVER2-NEXT: vaddps<br>

> +; BTVER2-NEXT: retq<br>

> +}<br>

> +<br>

> +attributes #0 = { "unsafe-fp-math"="true" }<br>

><br>

><br>


<br>

</div></div></blockquote></div><br></div></div>