[llvm] r236031 - transform fadd chains to increase parallelism

Sanjay Patel spatel at rotateright.com
Thu Apr 30 08:41:25 PDT 2015


I still don't get it. Let me state my ignorance and bias: I know nothing
about GPUs, and when someone says "in-order", I have a ~10-year flashback
to Power6 and Cell development.

So to me, an in-order machine is still superscalar and pipelined. You have
to expose ILP or you die a high-frequency death. And so when you have a
10-cycle-latency fadd (!), we still want to do this:

fadd f4, f1, f0   ; f4 = f1 + f0
fadd f4, f4, f2   ; depends on the previous f4
fadd f4, f4, f3   ; depends again: one serial chain of three fadds

->

fadd f4, f1, f0   ; independent
fadd f5, f2, f3   ; independent, can overlap with the first
fadd f4, f4, f5   ; only this one waits: two fadds on the critical path

because the first sequence is 30 cycles (3 x 10, fully serial) and the
second is 20-21 cycles (the two independent fadds overlap, so only two
fadds remain on the critical path).
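
For illustration only (a hedged sketch, not part of the patch, with made-up
function names): the same chains at the source level. Reassociating FP adds
changes rounding, so a compiler can only do this under fast-math, but the
critical-path arithmetic is the same as above.

/* Serial form: evaluated left to right, this is the first sequence
 * above -- three dependent fadds on the critical path. */
float sum_serial(float a, float b, float c, float d) {
  return ((a + b) + c) + d;
}

/* Reassociated form: the second sequence above. (a + b) and (c + d)
 * are independent, so a pipelined machine, even an in-order one, can
 * overlap them; the critical path is only two fadd latencies. */
float sum_reassoc(float a, float b, float c, float d) {
  return (a + b) + (c + d);
}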

It makes sense to me that a non-pipelined, non-superscalar machine would
see no benefit from the transform, but in-order alone doesn't seem like the
right differentiator if we're going to limit this transform by target in
the DAG.


On Wed, Apr 29, 2015 at 9:42 PM, Owen Anderson <resistor at mac.com> wrote:

>
> On Apr 29, 2015, at 8:32 PM, Sanjay Patel <spatel at rotateright.com> wrote:
>
> I'm not seeing how in-order vs. OOO is a factor?
>
>
> The transformation increases register pressure in order to expose ILP.  On
> an in-order machine, that is a purely negative tradeoff, since there’s no
> advantage to exposing ILP.
>
> —Owen
>