[llvm-commits] [RFC/PATCH] PPCDoubleDouble compile-time arithmetic

Mon Oct 29 09:11:56 PDT 2012

Chris Lattner <clattner at apple.com> wrote on 28.10.2012 05:03:31:

> Given that the PowerPC format expands out into operations on two
> doubles, how reasonable would it be for clang to generate pre-
> expanded IR that exposed this lowering to the optimizers?
>
> This wouldn't help you with constant parsing, but would simplify the
> IR and optimizer and almost certainly give you better code quality
> for this type.

First of all, I'd tend to agree with Hal and Bill that expanding
PowerPC double-double in the front end might be an interesting
optimization for the longer term, but this is really independent
of what I'm addressing with the current patch set.  As you
note as well, we'd still need APFloat support for things like
constant parsing ...

So I'd certainly propose we should commit something along the lines
of my patch soon, since without this long double is pretty much
unusable.  Anything further can then still be done later on.  Do
you agree, or would you object to the patch at this stage?

Now, thinking further about what we could do in the future: it
seems to me that to really "simplify the IR" would mean to
completely remove "ppc_fp128" as a primitive type on the IR level.
As long as it is still there, we'd still have to deal with it.
Is that what you had in mind?

Now, in order to get rid of ppc_fp128 completely, I think there are
a couple of issues that need to be considered.  I'm not sure I
understand enough LLVM infrastructure at this point to come up
with an exhaustive list, but here are some points that come to mind
immediately:

- What about front ends other than clang?  They'd all have to be
  changed to likewise eliminate generation of ppc_fp128 ...

- How to represent ppc_fp128 values used as function arguments
  or return values?  It seems the back-end still needs to handle
  them differently; for example, passing a long double is *nearly*
  the same as passing two doubles, except that a long double may
  never be split such that one half is passed in a register and
  the other in memory.  *Returning* a long double is even more
  special, since it is returned in a float register pair, unlike
  any other type ...

- What should the expansion of arithmetic operations look like?
  Currently, these are done by calling library routines like
  __gcc_qadd.  We could expand those calls (as calls) in the
  front end.  But that would actually *reduce* the opportunities
  for the optimizers to work on long double: currently, they
  see "add" nodes throughout optimization (and thus can act on
  things like operands becoming known constant).  If they saw
  only function calls, this might be more difficult.

  On the other hand, we could expand the whole algorithm used
  by those helper routines inline.  This would expose the internals
  to the optimizers.  But those algorithms are somewhat large
  (and carefully tuned the way they are to attempt to contain
  build-up of inaccuracies ...), and it's unclear that
  unconditional inline expansion would really lead to better
  performance overall, taking code growth into account.
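
To give a rough idea of what inline expansion would expose to the
optimizers, here is a minimal sketch in C of a double-double addition
built on the textbook TwoSum error-free transformation.  This is NOT
the code used by __gcc_qadd; it is a simplified variant with a weaker
error bound and no handling of infinities, NaNs, or overflow.  The
names dd_t, two_sum and dd_add are hypothetical, and the sketch
assumes strict IEEE double evaluation in round-to-nearest mode (no
-ffast-math, no extended-precision intermediates).

```c
/* Hypothetical type: a double-double value is the unevaluated sum
   hi + lo of two IEEE doubles, with |lo| much smaller than |hi|. */
typedef struct { double hi, lo; } dd_t;

/* Textbook TwoSum: returns s = fl(a + b) in .hi and the rounding
   error of that addition in .lo, so that hi + lo == a + b exactly
   (assuming round-to-nearest and no overflow). */
static dd_t two_sum(double a, double b)
{
    dd_t r;
    double bb;
    r.hi = a + b;
    bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);
    return r;
}

/* Simplified double-double add: add the high parts exactly, fold in
   the low parts, then renormalize so that hi again carries the
   leading bits.  Assumption: special values are not handled. */
static dd_t dd_add(dd_t x, dd_t y)
{
    dd_t s = two_sum(x.hi, y.hi);
    double lo = s.lo + (x.lo + y.lo);
    dd_t r;
    r.hi = s.hi + lo;
    r.lo = lo - (r.hi - s.hi);
    return r;
}
```

Even this simplified form costs roughly a dozen dependent
floating-point operations per addition, which illustrates the code
growth concern.  On the other hand, once it is inline, an operand
whose lo part is known to be zero lets some of those operations
fold away.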

I guess this would need some experimentation to find out what the
best way is, and what performance improvements (if any) we can
find.  Overall, I expect that at this point other improvements
to PowerPC code generation have bigger opportunities to visibly
help overall performance (VSX support?  full support for new-ish
instruction sets in general?), so I'd probably put long double
improvements lower on the priority list (once we actually get
it working at all, of course).

Bye,
Ulrich