[llvm] r268539 - [AArch64] Use the reciprocal estimation machinery

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 20 11:50:46 PDT 2016


Hi Evandro -

Yes, you're absolutely correct that doing this at the DAG level allows more
folding. In fact, the codegen for x86 was recently refined some more with:
https://reviews.llvm.org/D21127

And there was another heuristic-driven DAG sqrt patch for x86 after that.

I think everyone agrees in principle that we need more machine-level
folding, but it seems to take more effort to do the work down there, so
it's not happening very much?

I may not be understanding what's going on with this patch or this thread.
Is the revert request because we want to fix/nuke TargetOptions or
something else?

Ie, it's going to take multiple steps to reach the ideal solution. I'm
assuming we need to get the recip options attached to IR somehow (FMF,
metadata) which then needs to get passed to the DAG so those folds can
continue working like they do today. Redoing/enhancing the DAG folding in
MI would be the final step, and that requires running FMF plumbing to MI
because that doesn't exist yet AFAIK.



On Tue, Sep 20, 2016 at 12:11 PM, Evandro Menezes <e.menezes at samsung.com>
wrote:

> Hi, Sanjay.
>
> My concern is that, at the moment, when the series is expanded by the DAG
> combiner, it benefits from other optimizations through the rest of its
> lifetime down to MIs.  I'm afraid that otherwise some of these
> optimizations would have to be duplicated at that level.
>
> More specifically, the series expands in a polynomial whose sums and
> multiplications are typically fused, etc along with other surrounding
> operations.
>
> Sigh...
>
> --
> Evandro Menezes
>
> On 09/20/16 12:24, Sanjay Patel wrote:
>
>> I knew reciprocal estimates were going to come back to haunt me. :)
>>
>> Trying to page some of these patches back into my brain...at some point
>> last year, it became clear that sticking a Recip struct into TargetOptions
>> (like the FPOpFusion enum that preceded it) was a horrible solution because
>> it completely failed with LTO.
>>
>> This led to:
>> https://reviews.llvm.org/D14707 <https://reviews.llvm.org/D14707>
>>
>> ...which may be controversial itself:
>> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-
>> 20160104/323154.html
>>
>> I don't have a good understanding of the design trade-offs, but here's my
>> ideal world recip functionality: allow complete specification of any recip
>> op at the instruction level (because some customers want to control this
>> via source code pragmas).
>>
>> So that's IR-level FastMathFlags, but it needs to also include a field
>> for the refinement steps for a rsqrt or recip. And then that info needs to
>> propagate from IR to the DAG and eventually MachineInst...because that's
>> where we will actually use it.
>>
>> If we also add an AllowFMA field to FastMathFlags, then we can get rid of
>> both of those warts (AllowFPOpFusion, Reciprocals) in TargetOptions?
>>
>>
>>
>> On Tue, Sep 20, 2016 at 10:04 AM, Eric Christopher <echristo at gmail.com
>> <mailto:echristo at gmail.com>> wrote:
>>
>>
>>
>>     On Tue, Sep 20, 2016 at 8:48 AM Evandro Menezes
>>     <e.menezes at samsung.com <mailto:e.menezes at samsung.com>> wrote:
>>
>>         Eric,
>>
>>         Will do.  Since it's a few months old, it'll take me a couple
>>         of hours
>>         or so to revert it.
>>
>>
>>     No worries at all. I hear rumor that Hal is on vacation so maybe
>>     we can come up with a solution by the time he can come back and
>>     deal with the power version :)
>>
>>         I'd appreciate any further information to point me in the
>>         right direction.
>>
>>
>>     Of course. The rough idea is that the TargetMachine holds things
>>     that only cover tuning/code generation for an entire module -
>>     since in a merged module you've got functions coming in with
>>     possibly different options then you need to come up with a
>>     function based way to do the tuning.
>>
>>     One thought is using TT, we'll probably want to pull the
>>     reciprocal implementation off of TargetOptions since it can vary
>>     per subtarget and instead call through TTI during optimization to
>>     figure out what we can do and TTI is subtarget dependent.
>>
>>     Thoughts? Sanjay?
>>
>>     -eric
>>
>>         Thank you,
>>
>>         --
>>         Evandro Menezes
>>
>>         On 09/20/16 10:32, Eric Christopher wrote:
>>         >
>>         >
>>         > On Tue, Sep 20, 2016 at 5:15 AM Evandro Menezes
>>         <e.menezes at samsung.com <mailto:e.menezes at samsung.com>
>>         > <mailto:e.menezes at samsung.com
>>
>>         <mailto:e.menezes at samsung.com>>> wrote:
>>         >
>>         >     Hi, Eric.
>>         >
>>         >     All the other targets which use the Newton series
>>         approximations
>>         >     include
>>         >     a Subtarget in TargetMatchine.  This change merely
>>         follows the pattern
>>         >     already found in the PowerPC and X86 ports.
>>         >
>>         >
>>         > This patch is definitely beyond the level of post-commit
>>         review for
>>         > areas you've typically touched and I'm asking you to revert
>>         it. In
>>         > addition, I'll ask Hal to revert the patch in r241985 while
>>         we come up
>>         > with a better implementation.
>>         >
>>         >     Thus, if anything, the discussion should be broader than
>>         AArch64.
>>         >     Until
>>         >     that's settled, it's preferable to keep this patch until
>>         an universal
>>         >     solution is found, IMO.
>>         >
>>         >
>>         > This discussion has already happened when I did the work in
>>         the first
>>         > place to deal with function based subtargets and LTO. In
>>         addition, the
>>         > work that you've done has had a flaw explicitly pointed out
>>         - you'll
>>         > have different values that won't get reset between functions.
>>         >
>>         > All subtarget dependent machinery is (unless the target
>>         really doesn't
>>         > care like most of the gpus) based on the function and you
>>         can't depend
>>         > upon a single subtarget for any TargetMachine. This means
>>         that, in
>>         > general, you also can't depend upon it for initializing the pass
>>         > structure or overall backend machinery.
>>         >
>>         > As I mentioned before, during LTO if you compile two
>>         functions with
>>         > different cpu options and link them together you should get
>>         different
>>         > code and tuning, in this case you've now broken it for the
>>         reciprocal
>>         > code.
>>         >
>>         > -eric
>>         >
>>         >     Thank you,
>>         >
>>         >     --
>>         >     Evandro Menezes
>>         >
>>         >     On 09/19/16 19:49, Eric Christopher wrote:
>>         >     > Hi Evandro,
>>         >     >
>>         >     > I'm going to ask you to revert this patch please.
>>         There are a
>>         >     few reasons:
>>         >     >
>>         >     > a) I don't see a review thread here? This changes the
>>         port quite
>>         >     a bit.
>>         >     >
>>         >     >
>>         >     >      ///
>>         >     >      AArch64TargetMachine::AArch64TargetMachine(const
>>         Target &T,
>>         >     const
>>         >     >     Triple &TT,
>>         >     >     @@ -149,7 +173,8 @@
>>         AArch64TargetMachine::AArch64TargetMachi
>>         >     >          : LLVMTargetMachine(T, computeDataLayout(TT,
>>         LittleEndian),
>>         >     >     TT, CPU, FS,
>>         >     > Options, RM, CM, OL),
>>         >     >            TLOF(createTLOF(getTargetTriple())),
>>         >     >     -      isLittle(LittleEndian) {
>>         >     >     +      Subtarget(TT, CPU, FS, *this, LittleEndian) {
>>         >     >     +  initReciprocals(*this, Subtarget);
>>         >     >        initAsmInfo();
>>         >     >      }
>>         >     >
>>         >     >     @@ -189,7 +214,7 @@
>>         AArch64TargetMachine::getSubtargetImpl(c
>>         >     >          // function that reside in TargetOptions.
>>         >     >          resetTargetOptions(F);
>>         >     >          I =
>>         llvm::make_unique<AArch64Subtarget>(TargetTriple, CPU,
>>         >     >     FS, *this,
>>         >     >     - isLittle);
>>         >     >     + Subtarget.isLittleEndian());
>>         >     >      #ifndef LLVM_BUILD_GLOBAL_ISEL
>>         >     >         GISelAccessor *GISel = new GISelAccessor();
>>         >     >      #else
>>         >     >
>>         >     >     Modified:
>>         llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h
>>         >     >     URL:
>>         >     >
>>         >
>>         http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AA
>> rch64/AArch64TargetMachine.h?rev=268539&r1=268538&r2=268539&view=diff
>>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/A
>> Arch64/AArch64TargetMachine.h?rev=268539&r1=268538&r2=268539&view=diff>
>>         >     >
>>         >             =============================
>> =================================================
>>         >     >     ---
>>         llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h
>>         >     (original)
>>         >     >     +++
>>         llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h Wed May
>>         >     >     4 15:18:27 2016
>>         >     >     @@ -46,7 +46,7 @@ public:
>>         >     >        }
>>         >     >
>>         >     >      private:
>>         >     >     -  bool isLittle;
>>         >     >     +  AArch64Subtarget Subtarget;
>>         >     >      };
>>         >     >
>>         >     >
>>         >     > These changes aren't correct. There is no need and
>>         it's currently
>>         >     > invalid for the AArch64 port to depend on a single
>>         global subtarget
>>         >     > for all configuration. This is going to break anything
>>         that
>>         >     depends on
>>         >     > changing subtargets on a function by function basis
>>         (including
>>         >     > different -mcpu flags in different translation units
>>         or different
>>         >     > options).
>>         >     >
>>         >     > You're going to need to do all of these changes in a
>>         different
>>         >     way and
>>         >     > I haven't had time to look to see what's going on.
>>         >     >
>>         >     > Thanks!
>>         >     >
>>         >     > -eric
>>         >     >
>>         >     >
>>         >     >      // AArch64leTargetMachine - AArch64 little endian
>>         target
>>         >     machine.
>>         >     >
>>         >     >     Added:
>>         llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll
>>         >     >     URL:
>>         >     >
>>         >
>>         http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>> AArch64/recp-fastmath.ll?rev=268539&view=auto
>>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen
>> /AArch64/recp-fastmath.ll?rev=268539&view=auto>
>>         >     >
>>         >             =============================
>> =================================================
>>         >     >     ---
>>         llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll (added)
>>         >     >     +++
>>         llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll Wed May  4
>>         >     >     15:18:27 2016
>>         >     >     @@ -0,0 +1,79 @@
>>         >     >     +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
>>         >     -recip=!div,!vec-div
>>         >     >     | FileCheck %s --check-prefix=FAULT
>>         >     >     +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
>>         -recip=div,vec-div
>>         >     >      | FileCheck %s
>>         >     >     +
>>         >     >     +define float @frecp(float %x) #0 {
>>         >     >     +  %div = fdiv fast float 1.0, %x
>>         >     >     +  ret float %div
>>         >     >     +
>>         >     >     +; FAULT-LABEL: frecp:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fmov
>>         >     >     +; FAULT-NEXT: fdiv
>>         >     >     +
>>         >     >     +; CHECK-LABEL: frecp:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: frecpe
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <2 x float> @f2recp(<2 x float> %x) #0 {
>>         >     >     +  %div = fdiv fast <2 x float> <float 1.0, float
>>         1.0>, %x
>>         >     >     +  ret <2 x float> %div
>>         >     >     +
>>         >     >     +; FAULT-LABEL: f2recp:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fmov
>>         >     >     +; FAULT-NEXT: fdiv
>>         >     >     +
>>         >     >     +; CHECK-LABEL: f2recp:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frecpe
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <4 x float> @f4recp(<4 x float> %x) #0 {
>>         >     >     +  %div = fdiv fast <4 x float> <float 1.0, float
>>         1.0, float
>>         >     1.0,
>>         >     >     float 1.0>, %x
>>         >     >     +  ret <4 x float> %div
>>         >     >     +
>>         >     >     +; FAULT-LABEL: f4recp:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fmov
>>         >     >     +; FAULT-NEXT: fdiv
>>         >     >     +
>>         >     >     +; CHECK-LABEL: f4recp:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frecpe
>>         >     >     +}
>>         >     >     +
>>         >     >     +define double @drecp(double %x) #0 {
>>         >     >     +  %div = fdiv fast double 1.0, %x
>>         >     >     +  ret double %div
>>         >     >     +
>>         >     >     +; FAULT-LABEL: drecp:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fmov
>>         >     >     +; FAULT-NEXT: fdiv
>>         >     >     +
>>         >     >     +; CHECK-LABEL: drecp:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: frecpe
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <2 x double> @d2recp(<2 x double> %x) #0 {
>>         >     >     +  %div = fdiv fast <2 x double> <double 1.0,
>>         double 1.0>, %x
>>         >     >     +  ret <2 x double> %div
>>         >     >     +
>>         >     >     +; FAULT-LABEL: d2recp:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fmov
>>         >     >     +; FAULT-NEXT: fdiv
>>         >     >     +
>>         >     >     +; CHECK-LABEL: d2recp:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frecpe
>>         >     >     +}
>>         >     >     +
>>         >     >     +attributes #0 = { nounwind "unsafe-fp-math"="true" }
>>         >     >
>>         >     >     Added:
>>         llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll
>>         >     >     URL:
>>         >     >
>>         >
>>         http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>> AArch64/sqrt-fastmath.ll?rev=268539&view=auto
>>         <http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen
>> /AArch64/sqrt-fastmath.ll?rev=268539&view=auto>
>>         >     >
>>         >             =============================
>> =================================================
>>         >     >     ---
>>         llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll (added)
>>         >     >     +++
>>         llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll Wed May  4
>>         >     >     15:18:27 2016
>>         >     >     @@ -0,0 +1,158 @@
>>         >     >     +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
>>         >     >     -recip=!sqrt,!vec-sqrt | FileCheck %s
>>         --check-prefix=FAULT
>>         >     >     +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
>>         >     >     -recip=sqrt,vec-sqrt   | FileCheck %s
>>         >     >     +
>>         >     >     +declare float @llvm.sqrt.f32(float) #1
>>         >     >     +declare double @llvm.sqrt.f64(double) #1
>>         >     >     +declare <2 x float> @llvm.sqrt.v2f32(<2 x float>) #1
>>         >     >     +declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #1
>>         >     >     +declare <2 x double> @llvm.sqrt.v2f64(<2 x
>>         double>) #1
>>         >     >     +
>>         >     >     +define float @fsqrt(float %a) #0 {
>>         >     >     +  %1 = tail call fast float @llvm.sqrt.f32(float %a)
>>         >     >     +  ret float %1
>>         >     >     +
>>         >     >     +; FAULT-LABEL: fsqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: fsqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <2 x float> @f2sqrt(<2 x float> %a) #0 {
>>         >     >     +  %1 = tail call fast <2 x float>
>>         @llvm.sqrt.v2f32(<2 x
>>         >     float> %a) #2
>>         >     >     +  ret <2 x float> %1
>>         >     >     +
>>         >     >     +; FAULT-LABEL: f2sqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: f2sqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: mov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <4 x float> @f4sqrt(<4 x float> %a) #0 {
>>         >     >     +  %1 = tail call fast <4 x float>
>>         @llvm.sqrt.v4f32(<4 x
>>         >     float> %a) #2
>>         >     >     +  ret <4 x float> %1
>>         >     >     +
>>         >     >     +; FAULT-LABEL: f4sqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: f4sqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: mov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define double @dsqrt(double %a) #0 {
>>         >     >     +  %1 = tail call fast double
>>         @llvm.sqrt.f64(double %a)
>>         >     >     +  ret double %1
>>         >     >     +
>>         >     >     +; FAULT-LABEL: dsqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: dsqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <2 x double> @d2sqrt(<2 x double> %a) #0 {
>>         >     >     +  %1 = tail call fast <2 x double>
>>         @llvm.sqrt.v2f64(<2 x
>>         >     double>
>>         >     >     %a) #2
>>         >     >     +  ret <2 x double> %1
>>         >     >     +
>>         >     >     +; FAULT-LABEL: d2sqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: d2sqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: mov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define float @frsqrt(float %a) #0 {
>>         >     >     +  %1 = tail call fast float @llvm.sqrt.f32(float %a)
>>         >     >     +  %2 = fdiv fast float 1.000000e+00, %1
>>         >     >     +  ret float %2
>>         >     >     +
>>         >     >     +; FAULT-LABEL: frsqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: frsqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <2 x float> @f2rsqrt(<2 x float> %a) #0 {
>>         >     >     +  %1 = tail call fast <2 x float>
>>         @llvm.sqrt.v2f32(<2 x
>>         >     float> %a) #2
>>         >     >     +  %2 = fdiv fast <2 x float> <float 1.000000e+00,
>>         float
>>         >     >     1.000000e+00>, %1
>>         >     >     +  ret <2 x float> %2
>>         >     >     +
>>         >     >     +; FAULT-LABEL: f2rsqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: f2rsqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <4 x float> @f4rsqrt(<4 x float> %a) #0 {
>>         >     >     +  %1 = tail call fast <4 x float>
>>         @llvm.sqrt.v4f32(<4 x
>>         >     float> %a) #2
>>         >     >     +  %2 = fdiv fast <4 x float> <float 1.000000e+00,
>>         float
>>         >     >     1.000000e+00, float 1.000000e+00, float
>>         1.000000e+00>, %1
>>         >     >     +  ret <4 x float> %2
>>         >     >     +
>>         >     >     +; FAULT-LABEL: f4rsqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: f4rsqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define double @drsqrt(double %a) #0 {
>>         >     >     +  %1 = tail call fast double
>>         @llvm.sqrt.f64(double %a)
>>         >     >     +  %2 = fdiv fast double 1.000000e+00, %1
>>         >     >     +  ret double %2
>>         >     >     +
>>         >     >     +; FAULT-LABEL: drsqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: drsqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +define <2 x double> @d2rsqrt(<2 x double> %a) #0 {
>>         >     >     +  %1 = tail call fast <2 x double>
>>         @llvm.sqrt.v2f64(<2 x
>>         >     double>
>>         >     >     %a) #2
>>         >     >     +  %2 = fdiv fast <2 x double> <double
>>         1.000000e+00, double
>>         >     >     1.000000e+00>, %1
>>         >     >     +  ret <2 x double> %2
>>         >     >     +
>>         >     >     +; FAULT-LABEL: d2rsqrt:
>>         >     >     +; FAULT-NEXT: BB#0
>>         >     >     +; FAULT-NEXT: fsqrt
>>         >     >     +
>>         >     >     +; CHECK-LABEL: d2rsqrt:
>>         >     >     +; CHECK-NEXT: BB#0
>>         >     >     +; CHECK-NEXT: fmov
>>         >     >     +; CHECK-NEXT: frsqrte
>>         >     >     +}
>>         >     >     +
>>         >     >     +attributes #0 = { nounwind "unsafe-fp-math"="true" }
>>         >     >
>>         >     >
>>         >     >     _______________________________________________
>>         >     >     llvm-commits mailing list
>>         >     > llvm-commits at lists.llvm.org
>>         <mailto:llvm-commits at lists.llvm.org>
>>         <mailto:llvm-commits at lists.llvm.org
>>         <mailto:llvm-commits at lists.llvm.org>>
>>         >     <mailto:llvm-commits at lists.llvm.org
>>         <mailto:llvm-commits at lists.llvm.org>
>>         >     <mailto:llvm-commits at lists.llvm.org
>>         <mailto:llvm-commits at lists.llvm.org>>>
>>         >     >
>>         http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
>>         >     >
>>         >
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160920/728275f7/attachment.html>


More information about the llvm-commits mailing list