[llvm-commits] [RFC/PATCH] PPCDoubleDouble compile-time arithmetic

Mon Oct 29 10:26:38 PDT 2012

----- Original Message -----
> From: "Duncan Sands" <baldrick at free.fr>
> To: llvm-commits at cs.uiuc.edu
> Sent: Monday, October 29, 2012 12:04:33 PM
> Subject: Re: [llvm-commits] [RFC/PATCH] PPCDoubleDouble compile-time arithmetic
> 
> Hi Ulrich,
> 
> On 29/10/12 17:11, Ulrich Weigand wrote:
> > Chris Lattner <clattner at apple.com> wrote on 28.10.2012 05:03:31:
> >
> >> Given that the PowerPC format expands out into operations on two
> >> doubles, how reasonable would it be for clang to generate pre-
> >> expanded IR that exposed this lowering to the optimizers?
> >>
> >> This wouldn't help you with constant parsing, but would simplify
> >> the IR and optimizer and almost certainly give you better code
> >> quality for this type.
> >
> > First of all, I'd tend to agree with Hal and Bill that expanding
> > PowerPC double-double in the front end might be an interesting
> > optimization for the longer term, but this is really an independent
> > issue of what I'm addressing with the current patch set.  As you
> > note as well, we'd still need APFloat support for things like
> > constant parsing ...
> 
> it could be built on top of APFloat instead of being part of APFloat.
> 
> > So I'd certainly propose we should commit something along the lines
> > of my patch soon, since without this long double is pretty much
> > unusable.  Anything further can then still be done later on.  Do
> > you agree, or would you object to the patch at this stage?
> >
> >
> > Now, thinking further about what we could do in the future: it
> > seems to me that to really "simplify the IR" would mean to
> > completely remove "ppc_fp128" as a primitive type on the IR level.
> 
> I'm pretty sure this is what Chris has in mind (based on previous
> discussions).

Yes, this is also my understanding.

Nevertheless, I think that, regardless of the later direction, this is the correct incremental step. It does, after all, make APFloat cleaner, removing a bunch of non-working code and replacing it with (simpler) working code. Unless someone has a specific objection, let's commit this.

> 
> > As long as it is still there, we'd still have to deal with it.
> > Is that what you had in mind?
> >
> > Now, in order to get rid of ppc_fp128 completely, I think there are
> > a couple of issues that need to be considered.  I'm not sure I
> > understand enough LLVM infrastructure at this point to come up
> > with an exhaustive list, but here are some points that come to mind
> > immediately:
> >
> > - What about other front-ends than clang?  They'd all have to be
> >    changed to likewise eliminate generation of ppc_fp128 ...
> 
> Correct.  However if LLVM gains some utility libraries for
> manipulating "floating point number pairs" like PPC long double, this
> shouldn't be too bad.  There are a bunch of classical algorithms for
> taking a pair of floating point numbers (of arbitrary precision) and
> having the pair quack like a floating point number of twice the
> precision.  It would be neat to have a completely generic (generic in
> the size of the underlying floating point type) implementation of
> this, and use it for PPC long double (as far as I know PPC long
> doubles are an instance of this technique).

I agree this would be nice. We could then add it as a clang extension as well, and I think a lot of people would really like that. Nevertheless, there are a number of special cases in the algorithms, and the code will take time to develop.
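
As a rough illustration of the kind of classical algorithm Duncan refers to, here is a minimal C++ sketch of addition on float pairs, generic in the underlying type. The names FloatPair, twoSum, fastTwoSum and addPairs are made up for this example. This is the simple textbook ("sloppy") pair addition, not the algorithm used by __gcc_qadd or by the patch, and it ignores the special cases (infinities, NaNs, overflow). It also assumes strict IEEE evaluation (no -ffast-math, no excess precision).

```cpp
#include <cmath>

// Hypothetical helper type: a value held as the unevaluated sum hi + lo.
template <typename T>
struct FloatPair {
  T hi;
  T lo;
};

// Knuth's TwoSum: s is the rounded sum, e the rounding error,
// so that a + b == s + e exactly (barring overflow).
template <typename T>
FloatPair<T> twoSum(T a, T b) {
  T s = a + b;
  T bb = s - a;
  T e = (a - (s - bb)) + (b - bb);
  return {s, e};
}

// Dekker's FastTwoSum: same result as twoSum, but only valid
// when |a| >= |b| (or a == 0).
template <typename T>
FloatPair<T> fastTwoSum(T a, T b) {
  T s = a + b;
  T e = b - (s - a);
  return {s, e};
}

// Simple pair addition: add the high parts exactly, fold the low
// parts into the error term, then renormalise into (hi, lo).
template <typename T>
FloatPair<T> addPairs(FloatPair<T> x, FloatPair<T> y) {
  FloatPair<T> s = twoSum(x.hi, y.hi);
  T lo = s.lo + (x.lo + y.lo);
  return fastTwoSum(s.hi, lo);
}
```

For example, with T = double, adding the pair {1.0, 2^-60} to itself yields {2.0, 2^-59}, a sum that a single double cannot hold exactly.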

> 
> > - How to represent ppc_fp128 values used as function arguments
> >    or return values?  It seems the back-end still needs to handle
> >    them differently; for example, passing a long double is *nearly*
> >    the same as passing two doubles, except that a long double may
> >    never be split such that one half is passed in register and
> >    the other in memory.  *Returning* a long double is even more
> >    special, since it is returned in a float register pair, unlike
> >    any other type ...
> 
> I think this is an issue for the front-end.  If both doubles should
> go on the stack, then both should get the onstack attribute (not yet
> implemented), if both should go in registers they both get inreg.  As
> for returning them, it sounds analogous to returning
> { double, double }, which is what x86-64 does to return a complex
> number IIRC (i.e. in a pair of floating point registers).
> 
> >
> > - What should expansion of arithmetic operations look like?
> >    Currently, these are done by calling library routines like
> >    __gcc_qadd.  We could expand those calls (as calls) in the
> >    front end.  But that would actually *reduce* the opportunities
> >    for the optimizers to work on long double: currently, they
> >    see "add" nodes throughout optimization (and thus can act on
> >    things like operands becoming known constant).  If they saw
> >    only function calls, this might be more difficult.
> 
> We could teach the optimizers the semantics of these library calls.

And this is where things start to get messy ;)
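
To make "teaching the optimizers the semantics of these library calls" a bit more concrete, here is a hypothetical C++ sketch of a folding hook that recognises a helper by name and folds it when both operands are known constants. Only the name __gcc_qadd comes from this thread. DoublePair, Operand, foldAdd and tryFoldHelperCall are invented for illustration, none of this is LLVM API, and foldAdd is a simple textbook pair addition standing in for whatever the runtime routine actually computes.

```cpp
#include <cstring>

// Hypothetical: a long double held as the unevaluated sum hi + lo.
struct DoublePair {
  double hi;
  double lo;
};

// Hypothetical call operand: either a known constant or unknown.
struct Operand {
  bool isConstant;
  DoublePair value;
};

// Stand-in for the helper's semantics (assumption: NOT the libgcc code).
static DoublePair foldAdd(DoublePair x, DoublePair y) {
  double s = x.hi + y.hi;
  double bb = s - x.hi;
  double e = (x.hi - (s - bb)) + (y.hi - bb);  // rounding error of s
  double lo = e + (x.lo + y.lo);
  double hi = s + lo;
  DoublePair r = {hi, lo - (hi - s)};          // renormalise
  return r;
}

// Returns true and writes `result` only if the callee is a helper we
// know about and both operands are constants; otherwise leaves the
// call alone.
bool tryFoldHelperCall(const char *callee, const Operand &lhs,
                       const Operand &rhs, DoublePair &result) {
  if (std::strcmp(callee, "__gcc_qadd") != 0)
    return false;
  if (!lhs.isConstant || !rhs.isConstant)
    return false;
  result = foldAdd(lhs.value, rhs.value);
  return true;
}
```

The messy part is that every such helper needs its own entry, and the folded result has to match what the runtime library would have produced.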

> 
> >    On the other hand, we could expand the whole algorithm used
> >    by those helper routines inline.  This would expose the
> >    internals to the optimizers.  But those algorithms are somewhat
> >    large (and carefully tuned the way they are to attempt to
> >    contain build-up of inaccuracies ...), and it's unclear that
> >    unconditional inline expansion really leads to better
> >    performance overall, taking code growth into account.
> 
> They are currently expanded inline by the code generators, so you
> already have the code growth problem.

Unless the runtime libraries are compiled with LLVM and we're using LTO, they're not expanded.

Thanks again,
Hal

> 
> Ciao, Duncan.
> 
> >
> > I guess this would need some experimentation to find out what the
> > best way is, and what performance improvements (if any) we can
> > find.  Overall, I expect that at this point other improvements
> > to PowerPC code generation have bigger opportunities to visibly
> > help overall performance (VSX support?  full support for new-ish
> > instruction sets in general?), so I'd probably put long double
> > improvements lower on the priority list (once we actually get
> > it working at all, of course).
> >
> > Bye,
> > Ulrich
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory