[llvm-commits] [RFC/PATCH] PPCDoubleDouble compile-time arithmetic

Mon Oct 29 10:04:33 PDT 2012

Hi Ulrich,

On 29/10/12 17:11, Ulrich Weigand wrote:
> Chris Lattner <clattner at apple.com> wrote on 28.10.2012 05:03:31:
>
>> Given that the PowerPC format expands out into operations on two
>> doubles, how reasonable would it be for clang to generate pre-
>> expanded IR that exposed this lowering to the optimizers?
>>
>> This wouldn't help you with constant parsing, but would simplify the
>> IR and optimizer and almost certainly give you better code quality
>> for this type.
>
> First of all, I'd tend to agree with Hal and Bill that expanding
> PowerPC double-double in the front end might be an interesting
> optimization for the longer term, but this is really an independent
> issue of what I'm addressing with the current patch set.  As you
> note as well, we'd still need APFloat support for things like
> constant parsing ...

It could be built on top of APFloat instead of being part of APFloat.

> So I'd certainly propose we should commit something along the lines
> of my patch soon, since without this long double is pretty much
> unusable.  Anything further can then still be done later on.  Do
> you agree, or would you object to the patch at this stage?
>
>
> Now, thinking further about what we could do in the future: it
> seems to me that to really "simplify the IR" would mean to
> completely remove "ppc_fp128" as a primitive type on the IR level.

I'm pretty sure this is what Chris has in mind (based on previous
discussions).

> As long as it is still there, we'd still have to deal with it.
> Is that what you had in mind?
>
> Now, in order to get rid of ppc_fp128 completely, I think there are
> a couple of issues that need to be considered.  I'm not sure I
> understand enough LLVM infrastructure at this point to come up
> with an exhaustive list, but here are some points that come to mind
> immediately:
>
> - What about front-ends other than clang?  They'd all have to be
>    changed to likewise eliminate generation of ppc_fp128 ...

Correct.  However, if LLVM gains some utility libraries for manipulating
"floating point number pairs" like PPC long double, this shouldn't be too
bad.  There are a bunch of classical algorithms for taking a pair of
floating point numbers (of arbitrary precision) and having the pair quack
like a floating point number of twice the precision.  It would be neat to
have a completely generic implementation of this (generic in the size of
the underlying floating point type), and to use it for PPC long double (as
far as I know, PPC long doubles are an instance of this technique).
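
To make the "pair of floating point numbers" idea concrete, here is a
minimal sketch of pair addition, templated on the underlying type.  The
names FloatPair, twoSum, fastTwoSum and addPairs are made up for
illustration and are not existing LLVM APIs.  This is the simple textbook
variant built on error-free transformations; it is not claimed to match
what __gcc_qadd does, and it assumes strict IEEE arithmetic (no
-ffast-math, no excess precision) and ignores overflow, infinities and
NaNs.

```cpp
#include <cmath>

// Hypothetical helper type: a value represented as the unevaluated
// sum Hi + Lo, where Lo is small relative to Hi.
template <typename T>
struct FloatPair {
  T Hi;
  T Lo;
};

// Branch-free TwoSum: Hi is the rounded sum A + B, Lo is the rounding
// error, so Hi + Lo equals A + B exactly (barring overflow).
template <typename T>
FloatPair<T> twoSum(T A, T B) {
  T S = A + B;
  T BVirtual = S - A;
  T AVirtual = S - BVirtual;
  T BRound = B - BVirtual;
  T ARound = A - AVirtual;
  return {S, ARound + BRound};
}

// FastTwoSum: same result as twoSum, but only valid when |A| >= |B|.
template <typename T>
FloatPair<T> fastTwoSum(T A, T B) {
  T S = A + B;
  T E = B - (S - A);
  return {S, E};
}

// Simple pair addition: add the high parts exactly, fold in the low
// parts, then renormalize so that Lo is again small relative to Hi.
template <typename T>
FloatPair<T> addPairs(FloatPair<T> X, FloatPair<T> Y) {
  FloatPair<T> S = twoSum(X.Hi, Y.Hi);
  T Lo = S.Lo + (X.Lo + Y.Lo);
  return fastTwoSum(S.Hi, Lo);
}
```

For example, adding the pairs {1.0, 0.0} and {2^-80, 0.0} with T = double
keeps the small term in Lo, whereas a plain double addition would drop it.
Instantiating with T = float gives the same behaviour at lower precision,
which is the kind of genericity meant above.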

> - How to represent ppc_fp128 values used as function arguments
>    or return values?  It seems the back-end still needs to handle
>    them differently; for example, passing a long double is *nearly*
>    the same as passing two doubles, except that a long double may
>    never be split such that one half is passed in register and
>    the other in memory.  *Returning* a long double is even more
>    special, since it is returned in a float register pair, unlike
>    any other type ...

I think this is an issue for the front-end.  If both doubles should
go on the stack, then both should get the onstack attribute (not yet
implemented); if both should go in registers, they both get inreg.  As
for returning them, it sounds analogous to returning { double, double },
which is what x86-64 does to return a complex number IIRC (i.e. in a
pair of floating point registers).

>
> - What should the expansion of arithmetic operations look like?
>    Currently, these are done by calling library routines like
>    __gcc_qadd.  We could expand those calls (as calls) in the
>    front end.  But that would actually *reduce* the opportunities
>    for the optimizers to work on long double: currently, they
>    see "add" nodes throughout optimization (and thus can act on
>    things like operands becoming known constant).  If they saw
>    only function calls, this might be more difficult.

We could teach the optimizers the semantics of these library calls.

>    On the other hand, we could expand the whole algorithm used
>    by those helper routines inline.  This would expose the internals
>    to the optimizers.  But those algorithms are somewhat large
>    (and carefully tuned the way they are to attempt to contain
>    build-up of inaccuracies ...), and it's unclear that
>    unconditional inline expansion would really lead to better
>    performance overall, taking code growth into account.

They are currently expanded inline by the code generators, so you
already have the code growth problem.

Ciao, Duncan.

>
> I guess this would need some experimentation to find out what the
> best way is, and what performance improvements (if any) we can
> find.   Overall, I expect that at this point other improvements
> to PowerPC code generation have bigger opportunities to visibly
> help overall performance (VSX support?  full support for new-ish
> instruction sets in general?), so I'd probably put long double
> improvements lower on the priority list  (once we actually get
> it working at all, of course).
>
> Bye,
> Ulrich