[llvm] r268539 - [AArch64] Use the reciprocal estimation machinery

Tue Sep 20 09:04:51 PDT 2016

On Tue, Sep 20, 2016 at 8:48 AM Evandro Menezes <e.menezes at samsung.com>
wrote:

> Eric,
>
> Will do.  Since it's a few months old, it'll take me a couple of hours
> or so to revert it.
>
>
No worries at all. I hear rumor that Hal is on vacation so maybe we can
come up with a solution by the time he can come back and deal with the
power version :)

> I'd appreciate any further information to point me in the right direction.
>
>
Of course. The rough idea is that the TargetMachine holds things that only
cover tuning/code generation for an entire module - since in a merged
module you've got functions coming in with possibly different options then
you need to come up with a function based way to do the tuning.

One thought is using TT, we'll probably want to pull the reciprocal
implementation off of TargetOptions since it can vary per subtarget and
instead call through TTI during optimization to figure out what we can do
and TTI is subtarget dependent.

Thoughts? Sanjay?

-eric

> Thank you,
>
> --
> Evandro Menezes
>
> On 09/20/16 10:32, Eric Christopher wrote:
> >
> >
> > On Tue, Sep 20, 2016 at 5:15 AM Evandro Menezes <e.menezes at samsung.com
> > <mailto:e.menezes at samsung.com>> wrote:
> >
> >     Hi, Eric.
> >
> >     All the other targets which use the Newton series approximations
> >     include
> >     a Subtarget in TargetMatchine.  This change merely follows the
> pattern
> >     already found in the PowerPC and X86 ports.
> >
> >
> > This patch is definitely beyond the level of post-commit review for
> > areas you've typically touched and I'm asking you to revert it. In
> > addition, I'll ask Hal to revert the patch in r241985 while we come up
> > with a better implementation.
> >
> >     Thus, if anything, the discussion should be broader than AArch64.
> >     Until
> >     that's settled, it's preferable to keep this patch until an universal
> >     solution is found, IMO.
> >
> >
> > This discussion has already happened when I did the work in the first
> > place to deal with function based subtargets and LTO. In addition, the
> > work that you've done has had a flaw explicitly pointed out - you'll
> > have different values that won't get reset between functions.
> >
> > All subtarget dependent machinery is (unless the target really doesn't
> > care like most of the gpus) based on the function and you can't depend
> > upon a single subtarget for any TargetMachine. This means that, in
> > general, you also can't depend upon it for initializing the pass
> > structure or overall backend machinery.
> >
> > As I mentioned before, during LTO if you compile two functions with
> > different cpu options and link them together you should get different
> > code and tuning, in this case you've now broken it for the reciprocal
> > code.
> >
> > -eric
> >
> >     Thank you,
> >
> >     --
> >     Evandro Menezes
> >
> >     On 09/19/16 19:49, Eric Christopher wrote:
> >     > Hi Evandro,
> >     >
> >     > I'm going to ask you to revert this patch please. There are a
> >     few reasons:
> >     >
> >     > a) I don't see a review thread here? This changes the port quite
> >     a bit.
> >     >
> >     >
> >     >      ///
> >     >      AArch64TargetMachine::AArch64TargetMachine(const Target &T,
> >     const
> >     >     Triple &TT,
> >     >     @@ -149,7 +173,8 @@ AArch64TargetMachine::AArch64TargetMachi
> >     >          : LLVMTargetMachine(T, computeDataLayout(TT,
> LittleEndian),
> >     >     TT, CPU, FS,
> >     >                              Options, RM, CM, OL),
> >     >            TLOF(createTLOF(getTargetTriple())),
> >     >     -      isLittle(LittleEndian) {
> >     >     +      Subtarget(TT, CPU, FS, *this, LittleEndian) {
> >     >     +  initReciprocals(*this, Subtarget);
> >     >        initAsmInfo();
> >     >      }
> >     >
> >     >     @@ -189,7 +214,7 @@ AArch64TargetMachine::getSubtargetImpl(c
> >     >          // function that reside in TargetOptions.
> >     >          resetTargetOptions(F);
> >     >          I = llvm::make_unique<AArch64Subtarget>(TargetTriple, CPU,
> >     >     FS, *this,
> >     >     - isLittle);
> >     >     + Subtarget.isLittleEndian());
> >     >      #ifndef LLVM_BUILD_GLOBAL_ISEL
> >     >         GISelAccessor *GISel = new GISelAccessor();
> >     >      #else
> >     >
> >     >     Modified: llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h
> >     >     URL:
> >     >
> >
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h?rev=268539&r1=268538&r2=268539&view=diff
> >     >
> >
> ==============================================================================
> >     >     --- llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h
> >     (original)
> >     >     +++ llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h Wed
> May
> >     >     4 15:18:27 2016
> >     >     @@ -46,7 +46,7 @@ public:
> >     >        }
> >     >
> >     >      private:
> >     >     -  bool isLittle;
> >     >     +  AArch64Subtarget Subtarget;
> >     >      };
> >     >
> >     >
> >     > These changes aren't correct. There is no need and it's currently
> >     > invalid for the AArch64 port to depend on a single global subtarget
> >     > for all configuration. This is going to break anything that
> >     depends on
> >     > changing subtargets on a function by function basis (including
> >     > different -mcpu flags in different translation units or different
> >     > options).
> >     >
> >     > You're going to need to do all of these changes in a different
> >     way and
> >     > I haven't had time to look to see what's going on.
> >     >
> >     > Thanks!
> >     >
> >     > -eric
> >     >
> >     >
> >     >      // AArch64leTargetMachine - AArch64 little endian target
> >     machine.
> >     >
> >     >     Added: llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll
> >     >     URL:
> >     >
> >
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll?rev=268539&view=auto
> >     >
> >
> ==============================================================================
> >     >     --- llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll (added)
> >     >     +++ llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll Wed May  4
> >     >     15:18:27 2016
> >     >     @@ -0,0 +1,79 @@
> >     >     +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
> >     -recip=!div,!vec-div
> >     >     | FileCheck %s --check-prefix=FAULT
> >     >     +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
> -recip=div,vec-div
> >     >      | FileCheck %s
> >     >     +
> >     >     +define float @frecp(float %x) #0 {
> >     >     +  %div = fdiv fast float 1.0, %x
> >     >     +  ret float %div
> >     >     +
> >     >     +; FAULT-LABEL: frecp:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fmov
> >     >     +; FAULT-NEXT: fdiv
> >     >     +
> >     >     +; CHECK-LABEL: frecp:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: frecpe
> >     >     +; CHECK-NEXT: fmov
> >     >     +}
> >     >     +
> >     >     +define <2 x float> @f2recp(<2 x float> %x) #0 {
> >     >     +  %div = fdiv fast <2 x float> <float 1.0, float 1.0>, %x
> >     >     +  ret <2 x float> %div
> >     >     +
> >     >     +; FAULT-LABEL: f2recp:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fmov
> >     >     +; FAULT-NEXT: fdiv
> >     >     +
> >     >     +; CHECK-LABEL: f2recp:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frecpe
> >     >     +}
> >     >     +
> >     >     +define <4 x float> @f4recp(<4 x float> %x) #0 {
> >     >     +  %div = fdiv fast <4 x float> <float 1.0, float 1.0, float
> >     1.0,
> >     >     float 1.0>, %x
> >     >     +  ret <4 x float> %div
> >     >     +
> >     >     +; FAULT-LABEL: f4recp:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fmov
> >     >     +; FAULT-NEXT: fdiv
> >     >     +
> >     >     +; CHECK-LABEL: f4recp:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frecpe
> >     >     +}
> >     >     +
> >     >     +define double @drecp(double %x) #0 {
> >     >     +  %div = fdiv fast double 1.0, %x
> >     >     +  ret double %div
> >     >     +
> >     >     +; FAULT-LABEL: drecp:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fmov
> >     >     +; FAULT-NEXT: fdiv
> >     >     +
> >     >     +; CHECK-LABEL: drecp:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: frecpe
> >     >     +; CHECK-NEXT: fmov
> >     >     +}
> >     >     +
> >     >     +define <2 x double> @d2recp(<2 x double> %x) #0 {
> >     >     +  %div = fdiv fast <2 x double> <double 1.0, double 1.0>, %x
> >     >     +  ret <2 x double> %div
> >     >     +
> >     >     +; FAULT-LABEL: d2recp:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fmov
> >     >     +; FAULT-NEXT: fdiv
> >     >     +
> >     >     +; CHECK-LABEL: d2recp:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frecpe
> >     >     +}
> >     >     +
> >     >     +attributes #0 = { nounwind "unsafe-fp-math"="true" }
> >     >
> >     >     Added: llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll
> >     >     URL:
> >     >
> >
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll?rev=268539&view=auto
> >     >
> >
> ==============================================================================
> >     >     --- llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll (added)
> >     >     +++ llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll Wed May  4
> >     >     15:18:27 2016
> >     >     @@ -0,0 +1,158 @@
> >     >     +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
> >     >     -recip=!sqrt,!vec-sqrt | FileCheck %s --check-prefix=FAULT
> >     >     +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
> >     >     -recip=sqrt,vec-sqrt   | FileCheck %s
> >     >     +
> >     >     +declare float @llvm.sqrt.f32(float) #1
> >     >     +declare double @llvm.sqrt.f64(double) #1
> >     >     +declare <2 x float> @llvm.sqrt.v2f32(<2 x float>) #1
> >     >     +declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #1
> >     >     +declare <2 x double> @llvm.sqrt.v2f64(<2 x double>) #1
> >     >     +
> >     >     +define float @fsqrt(float %a) #0 {
> >     >     +  %1 = tail call fast float @llvm.sqrt.f32(float %a)
> >     >     +  ret float %1
> >     >     +
> >     >     +; FAULT-LABEL: fsqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: fsqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define <2 x float> @f2sqrt(<2 x float> %a) #0 {
> >     >     +  %1 = tail call fast <2 x float> @llvm.sqrt.v2f32(<2 x
> >     float> %a) #2
> >     >     +  ret <2 x float> %1
> >     >     +
> >     >     +; FAULT-LABEL: f2sqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: f2sqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: mov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define <4 x float> @f4sqrt(<4 x float> %a) #0 {
> >     >     +  %1 = tail call fast <4 x float> @llvm.sqrt.v4f32(<4 x
> >     float> %a) #2
> >     >     +  ret <4 x float> %1
> >     >     +
> >     >     +; FAULT-LABEL: f4sqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: f4sqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: mov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define double @dsqrt(double %a) #0 {
> >     >     +  %1 = tail call fast double @llvm.sqrt.f64(double %a)
> >     >     +  ret double %1
> >     >     +
> >     >     +; FAULT-LABEL: dsqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: dsqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define <2 x double> @d2sqrt(<2 x double> %a) #0 {
> >     >     +  %1 = tail call fast <2 x double> @llvm.sqrt.v2f64(<2 x
> >     double>
> >     >     %a) #2
> >     >     +  ret <2 x double> %1
> >     >     +
> >     >     +; FAULT-LABEL: d2sqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: d2sqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: mov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define float @frsqrt(float %a) #0 {
> >     >     +  %1 = tail call fast float @llvm.sqrt.f32(float %a)
> >     >     +  %2 = fdiv fast float 1.000000e+00, %1
> >     >     +  ret float %2
> >     >     +
> >     >     +; FAULT-LABEL: frsqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: frsqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define <2 x float> @f2rsqrt(<2 x float> %a) #0 {
> >     >     +  %1 = tail call fast <2 x float> @llvm.sqrt.v2f32(<2 x
> >     float> %a) #2
> >     >     +  %2 = fdiv fast <2 x float> <float 1.000000e+00, float
> >     >     1.000000e+00>, %1
> >     >     +  ret <2 x float> %2
> >     >     +
> >     >     +; FAULT-LABEL: f2rsqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: f2rsqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define <4 x float> @f4rsqrt(<4 x float> %a) #0 {
> >     >     +  %1 = tail call fast <4 x float> @llvm.sqrt.v4f32(<4 x
> >     float> %a) #2
> >     >     +  %2 = fdiv fast <4 x float> <float 1.000000e+00, float
> >     >     1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %1
> >     >     +  ret <4 x float> %2
> >     >     +
> >     >     +; FAULT-LABEL: f4rsqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: f4rsqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define double @drsqrt(double %a) #0 {
> >     >     +  %1 = tail call fast double @llvm.sqrt.f64(double %a)
> >     >     +  %2 = fdiv fast double 1.000000e+00, %1
> >     >     +  ret double %2
> >     >     +
> >     >     +; FAULT-LABEL: drsqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: drsqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +define <2 x double> @d2rsqrt(<2 x double> %a) #0 {
> >     >     +  %1 = tail call fast <2 x double> @llvm.sqrt.v2f64(<2 x
> >     double>
> >     >     %a) #2
> >     >     +  %2 = fdiv fast <2 x double> <double 1.000000e+00, double
> >     >     1.000000e+00>, %1
> >     >     +  ret <2 x double> %2
> >     >     +
> >     >     +; FAULT-LABEL: d2rsqrt:
> >     >     +; FAULT-NEXT: BB#0
> >     >     +; FAULT-NEXT: fsqrt
> >     >     +
> >     >     +; CHECK-LABEL: d2rsqrt:
> >     >     +; CHECK-NEXT: BB#0
> >     >     +; CHECK-NEXT: fmov
> >     >     +; CHECK-NEXT: frsqrte
> >     >     +}
> >     >     +
> >     >     +attributes #0 = { nounwind "unsafe-fp-math"="true" }
> >     >
> >     >
> >     >     _______________________________________________
> >     >     llvm-commits mailing list
> >     > llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
> >     <mailto:llvm-commits at lists.llvm.org
> >     <mailto:llvm-commits at lists.llvm.org>>
> >     > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >     >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160920/6535c735/attachment-0001.html>