[llvm] r268539 - [AArch64] Use the reciprocal estimation machinery
Eric Christopher via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 20 12:43:11 PDT 2016
On Tue, Sep 20, 2016 at 11:50 AM Sanjay Patel <spatel at rotateright.com>
wrote:
> Hi Evandro -
>
> Yes, you're absolutely correct that doing this at the DAG level allows
> more folding. In fact, the codegen for x86 was recently refined some more
> with:
> https://reviews.llvm.org/D21127
>
> And there was another heuristic-driven DAG sqrt patch for x86 after that.
>
> I think everyone agrees in principle that we need more machine-level
> folding, but it seems to take more effort to do the work down there, so
> it's not happening very much?
>
> I may not be understanding what's going on with this patch or this thread.
> Is the revert request because we want to fix/nuke TargetOptions or
> something else?
>
>
The revert request is for a couple of reasons:
a) Yes, TargetOptions should go away as much as possible,
b) The non-x86 implementations of this break LTO.
> Ie, it's going to take multiple steps to reach the ideal solution. I'm
> assuming we need to get the recip options attached to IR somehow (FMF,
> metadata) which then needs to get passed to the DAG so those folds can
> continue working like they do today. Redoing/enhancing the DAG folding in
> MI would be the final step, and that requires running FMF plumbing to MI
> because that doesn't exist yet AFAIK.
>
>
Can you at least just start attaching it to the function as an option for
now?
-eric
>
>
> On Tue, Sep 20, 2016 at 12:11 PM, Evandro Menezes <e.menezes at samsung.com>
> wrote:
>
> Hi, Sanjay.
>
> My concern is that, at the moment, when the series is expanded by the DAG
> combiner, it benefits from other optimizations through the rest of its
> lifetime down to MIs. I'm afraid that otherwise some of these
> optimizations would have to be duplicated at that level.
>
> More specifically, the series expands in a polynomial whose sums and
> multiplications are typically fused, etc along with other surrounding
> operations.
>
> Sigh...
>
> --
> Evandro Menezes
>
> On 09/20/16 12:24, Sanjay Patel wrote:
>
> I knew reciprocal estimates were going to come back to haunt me. :)
>
> Trying to page some of these patches back into my brain...at some point
> last year, it became clear that sticking a Recip struct into TargetOptions
> (like the FPOpFusion enum that preceded it) was a horrible solution because
> it completely failed with LTO.
>
> This led to:
> https://reviews.llvm.org/D14707 <https://reviews.llvm.org/D14707>
>
> ...which may be controversial itself:
>
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160104/323154.html
>
> I don't have a good understanding of the design trade-offs, but here's my
> ideal world recip functionality: allow complete specification of any recip
> op at the instruction level (because some customers want to control this
> via source code pragmas).
>
> So that's IR-level FastMathFlags, but it needs to also include a field for
> the refinement steps for a rsqrt or recip. And then that info needs to
> propagate from IR to the DAG and eventually MachineInst...because that's
> where we will actually use it.
>
> If we also add an AllowFMA field to FastMathFlags, then we can get rid of
> both of those warts (AllowFPOpFusion, Reciprocals) in TargetOptions?
>
>
>
> On Tue, Sep 20, 2016 at 10:04 AM, Eric Christopher <echristo at gmail.com
> <mailto:echristo at gmail.com>> wrote:
>
>
>
> On Tue, Sep 20, 2016 at 8:48 AM Evandro Menezes
> <e.menezes at samsung.com <mailto:e.menezes at samsung.com>> wrote:
>
> Eric,
>
> Will do. Since it's a few months old, it'll take me a couple
> of hours
> or so to revert it.
>
>
> No worries at all. I hear rumor that Hal is on vacation so maybe
> we can come up with a solution by the time he can come back and
> deal with the power version :)
>
> I'd appreciate any further information to point me in the
> right direction.
>
>
> Of course. The rough idea is that the TargetMachine holds things
> that only cover tuning/code generation for an entire module -
> since in a merged module you've got functions coming in with
> possibly different options then you need to come up with a
> function based way to do the tuning.
>
> One thought is using TT, we'll probably want to pull the
> reciprocal implementation off of TargetOptions since it can vary
> per subtarget and instead call through TTI during optimization to
> figure out what we can do and TTI is subtarget dependent.
>
> Thoughts? Sanjay?
>
> -eric
>
> Thank you,
>
> --
> Evandro Menezes
>
> On 09/20/16 10:32, Eric Christopher wrote:
> >
> >
> > On Tue, Sep 20, 2016 at 5:15 AM Evandro Menezes
> <e.menezes at samsung.com <mailto:e.menezes at samsung.com>
> > <mailto:e.menezes at samsung.com
>
> <mailto:e.menezes at samsung.com>>> wrote:
> >
> > Hi, Eric.
> >
> > All the other targets which use the Newton series
> approximations
> > include
> > a Subtarget in TargetMatchine. This change merely
> follows the pattern
> > already found in the PowerPC and X86 ports.
> >
> >
> > This patch is definitely beyond the level of post-commit
> review for
> > areas you've typically touched and I'm asking you to revert
> it. In
> > addition, I'll ask Hal to revert the patch in r241985 while
> we come up
> > with a better implementation.
> >
> > Thus, if anything, the discussion should be broader than
> AArch64.
> > Until
> > that's settled, it's preferable to keep this patch until
> an universal
> > solution is found, IMO.
> >
> >
> > This discussion has already happened when I did the work in
> the first
> > place to deal with function based subtargets and LTO. In
> addition, the
> > work that you've done has had a flaw explicitly pointed out
> - you'll
> > have different values that won't get reset between functions.
> >
> > All subtarget dependent machinery is (unless the target
> really doesn't
> > care like most of the gpus) based on the function and you
> can't depend
> > upon a single subtarget for any TargetMachine. This means
> that, in
> > general, you also can't depend upon it for initializing the pass
> > structure or overall backend machinery.
> >
> > As I mentioned before, during LTO if you compile two
> functions with
> > different cpu options and link them together you should get
> different
> > code and tuning, in this case you've now broken it for the
> reciprocal
> > code.
> >
> > -eric
> >
> > Thank you,
> >
> > --
> > Evandro Menezes
> >
> > On 09/19/16 19:49, Eric Christopher wrote:
> > > Hi Evandro,
> > >
> > > I'm going to ask you to revert this patch please.
> There are a
> > few reasons:
> > >
> > > a) I don't see a review thread here? This changes the
> port quite
> > a bit.
> > >
> > >
> > > ///
> > > AArch64TargetMachine::AArch64TargetMachine(const
> Target &T,
> > const
> > > Triple &TT,
> > > @@ -149,7 +173,8 @@
> AArch64TargetMachine::AArch64TargetMachi
> > > : LLVMTargetMachine(T, computeDataLayout(TT,
> LittleEndian),
> > > TT, CPU, FS,
> > > Options, RM, CM, OL),
> > > TLOF(createTLOF(getTargetTriple())),
> > > - isLittle(LittleEndian) {
> > > + Subtarget(TT, CPU, FS, *this, LittleEndian) {
> > > + initReciprocals(*this, Subtarget);
> > > initAsmInfo();
> > > }
> > >
> > > @@ -189,7 +214,7 @@
> AArch64TargetMachine::getSubtargetImpl(c
> > > // function that reside in TargetOptions.
> > > resetTargetOptions(F);
> > > I =
> llvm::make_unique<AArch64Subtarget>(TargetTriple, CPU,
> > > FS, *this,
> > > - isLittle);
> > > + Subtarget.isLittleEndian());
> > > #ifndef LLVM_BUILD_GLOBAL_ISEL
> > > GISelAccessor *GISel = new GISelAccessor();
> > > #else
> > >
> > > Modified:
> llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h
> > > URL:
> > >
> >
>
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h?rev=268539&r1=268538&r2=268539&view=diff
> <
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h?rev=268539&r1=268538&r2=268539&view=diff
> >
> > >
> >
> ==============================================================================
> > > ---
> llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h
> > (original)
> > > +++
> llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.h Wed May
> > > 4 15:18:27 2016
> > > @@ -46,7 +46,7 @@ public:
> > > }
> > >
> > > private:
> > > - bool isLittle;
> > > + AArch64Subtarget Subtarget;
> > > };
> > >
> > >
> > > These changes aren't correct. There is no need and
> it's currently
> > > invalid for the AArch64 port to depend on a single
> global subtarget
> > > for all configuration. This is going to break anything
> that
> > depends on
> > > changing subtargets on a function by function basis
> (including
> > > different -mcpu flags in different translation units
> or different
> > > options).
> > >
> > > You're going to need to do all of these changes in a
> different
> > way and
> > > I haven't had time to look to see what's going on.
> > >
> > > Thanks!
> > >
> > > -eric
> > >
> > >
> > > // AArch64leTargetMachine - AArch64 little endian
> target
> > machine.
> > >
> > > Added:
> llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll
> > > URL:
> > >
> >
>
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll?rev=268539&view=auto
> <
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll?rev=268539&view=auto
> >
> > >
> >
> ==============================================================================
> > > ---
> llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll (added)
> > > +++
> llvm/trunk/test/CodeGen/AArch64/recp-fastmath.ll Wed May 4
> > > 15:18:27 2016
> > > @@ -0,0 +1,79 @@
> > > +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
> > -recip=!div,!vec-div
> > > | FileCheck %s --check-prefix=FAULT
> > > +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
> -recip=div,vec-div
> > > | FileCheck %s
> > > +
> > > +define float @frecp(float %x) #0 {
> > > + %div = fdiv fast float 1.0, %x
> > > + ret float %div
> > > +
> > > +; FAULT-LABEL: frecp:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fmov
> > > +; FAULT-NEXT: fdiv
> > > +
> > > +; CHECK-LABEL: frecp:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: frecpe
> > > +; CHECK-NEXT: fmov
> > > +}
> > > +
> > > +define <2 x float> @f2recp(<2 x float> %x) #0 {
> > > + %div = fdiv fast <2 x float> <float 1.0, float
> 1.0>, %x
> > > + ret <2 x float> %div
> > > +
> > > +; FAULT-LABEL: f2recp:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fmov
> > > +; FAULT-NEXT: fdiv
> > > +
> > > +; CHECK-LABEL: f2recp:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frecpe
> > > +}
> > > +
> > > +define <4 x float> @f4recp(<4 x float> %x) #0 {
> > > + %div = fdiv fast <4 x float> <float 1.0, float
> 1.0, float
> > 1.0,
> > > float 1.0>, %x
> > > + ret <4 x float> %div
> > > +
> > > +; FAULT-LABEL: f4recp:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fmov
> > > +; FAULT-NEXT: fdiv
> > > +
> > > +; CHECK-LABEL: f4recp:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frecpe
> > > +}
> > > +
> > > +define double @drecp(double %x) #0 {
> > > + %div = fdiv fast double 1.0, %x
> > > + ret double %div
> > > +
> > > +; FAULT-LABEL: drecp:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fmov
> > > +; FAULT-NEXT: fdiv
> > > +
> > > +; CHECK-LABEL: drecp:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: frecpe
> > > +; CHECK-NEXT: fmov
> > > +}
> > > +
> > > +define <2 x double> @d2recp(<2 x double> %x) #0 {
> > > + %div = fdiv fast <2 x double> <double 1.0,
> double 1.0>, %x
> > > + ret <2 x double> %div
> > > +
> > > +; FAULT-LABEL: d2recp:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fmov
> > > +; FAULT-NEXT: fdiv
> > > +
> > > +; CHECK-LABEL: d2recp:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frecpe
> > > +}
> > > +
> > > +attributes #0 = { nounwind "unsafe-fp-math"="true" }
> > >
> > > Added:
> llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll
> > > URL:
> > >
> >
>
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll?rev=268539&view=auto
> <
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll?rev=268539&view=auto
> >
> > >
> >
> ==============================================================================
> > > ---
> llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll (added)
> > > +++
> llvm/trunk/test/CodeGen/AArch64/sqrt-fastmath.ll Wed May 4
> > > 15:18:27 2016
> > > @@ -0,0 +1,158 @@
> > > +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
> > > -recip=!sqrt,!vec-sqrt | FileCheck %s
> --check-prefix=FAULT
> > > +; RUN: llc < %s -mtriple=aarch64 -mattr=neon
> > > -recip=sqrt,vec-sqrt | FileCheck %s
> > > +
> > > +declare float @llvm.sqrt.f32(float) #1
> > > +declare double @llvm.sqrt.f64(double) #1
> > > +declare <2 x float> @llvm.sqrt.v2f32(<2 x float>) #1
> > > +declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #1
> > > +declare <2 x double> @llvm.sqrt.v2f64(<2 x
> double>) #1
> > > +
> > > +define float @fsqrt(float %a) #0 {
> > > + %1 = tail call fast float @llvm.sqrt.f32(float %a)
> > > + ret float %1
> > > +
> > > +; FAULT-LABEL: fsqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: fsqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define <2 x float> @f2sqrt(<2 x float> %a) #0 {
> > > + %1 = tail call fast <2 x float>
> @llvm.sqrt.v2f32(<2 x
> > float> %a) #2
> > > + ret <2 x float> %1
> > > +
> > > +; FAULT-LABEL: f2sqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: f2sqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: mov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define <4 x float> @f4sqrt(<4 x float> %a) #0 {
> > > + %1 = tail call fast <4 x float>
> @llvm.sqrt.v4f32(<4 x
> > float> %a) #2
> > > + ret <4 x float> %1
> > > +
> > > +; FAULT-LABEL: f4sqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: f4sqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: mov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define double @dsqrt(double %a) #0 {
> > > + %1 = tail call fast double
> @llvm.sqrt.f64(double %a)
> > > + ret double %1
> > > +
> > > +; FAULT-LABEL: dsqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: dsqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define <2 x double> @d2sqrt(<2 x double> %a) #0 {
> > > + %1 = tail call fast <2 x double>
> @llvm.sqrt.v2f64(<2 x
> > double>
> > > %a) #2
> > > + ret <2 x double> %1
> > > +
> > > +; FAULT-LABEL: d2sqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: d2sqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: mov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define float @frsqrt(float %a) #0 {
> > > + %1 = tail call fast float @llvm.sqrt.f32(float %a)
> > > + %2 = fdiv fast float 1.000000e+00, %1
> > > + ret float %2
> > > +
> > > +; FAULT-LABEL: frsqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: frsqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define <2 x float> @f2rsqrt(<2 x float> %a) #0 {
> > > + %1 = tail call fast <2 x float>
> @llvm.sqrt.v2f32(<2 x
> > float> %a) #2
> > > + %2 = fdiv fast <2 x float> <float 1.000000e+00,
> float
> > > 1.000000e+00>, %1
> > > + ret <2 x float> %2
> > > +
> > > +; FAULT-LABEL: f2rsqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: f2rsqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define <4 x float> @f4rsqrt(<4 x float> %a) #0 {
> > > + %1 = tail call fast <4 x float>
> @llvm.sqrt.v4f32(<4 x
> > float> %a) #2
> > > + %2 = fdiv fast <4 x float> <float 1.000000e+00,
> float
> > > 1.000000e+00, float 1.000000e+00, float
> 1.000000e+00>, %1
> > > + ret <4 x float> %2
> > > +
> > > +; FAULT-LABEL: f4rsqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: f4rsqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define double @drsqrt(double %a) #0 {
> > > + %1 = tail call fast double
> @llvm.sqrt.f64(double %a)
> > > + %2 = fdiv fast double 1.000000e+00, %1
> > > + ret double %2
> > > +
> > > +; FAULT-LABEL: drsqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: drsqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +define <2 x double> @d2rsqrt(<2 x double> %a) #0 {
> > > + %1 = tail call fast <2 x double>
> @llvm.sqrt.v2f64(<2 x
> > double>
> > > %a) #2
> > > + %2 = fdiv fast <2 x double> <double
> 1.000000e+00, double
> > > 1.000000e+00>, %1
> > > + ret <2 x double> %2
> > > +
> > > +; FAULT-LABEL: d2rsqrt:
> > > +; FAULT-NEXT: BB#0
> > > +; FAULT-NEXT: fsqrt
> > > +
> > > +; CHECK-LABEL: d2rsqrt:
> > > +; CHECK-NEXT: BB#0
> > > +; CHECK-NEXT: fmov
> > > +; CHECK-NEXT: frsqrte
> > > +}
> > > +
> > > +attributes #0 = { nounwind "unsafe-fp-math"="true" }
> > >
> > >
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at lists.llvm.org
> <mailto:llvm-commits at lists.llvm.org>
> <mailto:llvm-commits at lists.llvm.org
> <mailto:llvm-commits at lists.llvm.org>>
> > <mailto:llvm-commits at lists.llvm.org
> <mailto:llvm-commits at lists.llvm.org>
> > <mailto:llvm-commits at lists.llvm.org
> <mailto:llvm-commits at lists.llvm.org>>>
> > >
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits>
> > >
> >
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160920/aace01d9/attachment.html>
More information about the llvm-commits
mailing list