[llvm] r236031 - transform fadd chains to increase parallelism

Sanjay Patel spatel at rotateright.com
Thu Apr 30 14:34:43 PDT 2015


Thanks all. Between Owen's GPU description and Mehdi's test cases, I can
see how this patch went off the rails.

I'm back to wondering if we can still do this as a DAG combine with the
help of a target hook:

TLI.getReassociationLimit(Opcode, EVT)

For some operation on some data type, does it make sense to attempt to
extract some ILP? By default, we'd make this 0. For a machine that has no
exposed superscalar / pipelining ILP opportunities, it would always return
0. If non-zero, the number would be a value that's based on the number of
registers and/or issue width and/or pipe stages for the given operation.
Something like the 'vectorization factor' or 'interleave factor' used by
the vectorizers?

unsigned CombineCount = 0;
while (CombineCount < TLI.getReassociationLimit(Opcode, EVT)) {
  if (!tryTheCombine(Opcode, EVT))
    break;
  CombineCount++;
}


On Thu, Apr 30, 2015 at 1:25 PM, Eric Christopher <echristo at gmail.com>
wrote:

>
>
> On Thu, Apr 30, 2015 at 12:24 PM Mehdi Amini <mehdi.amini at apple.com>
> wrote:
>
>> On Apr 30, 2015, at 12:04 PM, Owen Anderson <resistor at mac.com> wrote:
>>
>>
>> On Apr 30, 2015, at 8:41 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>>
>> So to me, an in-order machine is still superscalar and pipelined. You
>> have to expose ILP or you die a high-frequency death.
>>
>>
>> Many (most?) GPUs hide latencies via massive hyper threading rather than
>> exploiting per-thread ILP.  The hardware presents a model where every
>> instruction has unit latency, because the real latency is entirely hidden
>> by hyper threading.  Using more registers eats up the finite pool of
>> storage in the chip, limiting the number of threads that can run
>> concurrently, and ultimately reducing the hardware’s ability to hyper
>> thread, killing performance.
>>
>> This isn’t just a concern for GPUs, though.  Even superscalar CPUs are
>> not necessarily uniformly superscalar.  I’m aware of plenty of lower power
>> designs that can multi-issue integer instructions but not floating point,
>> for instance.
>>
>>
>> How would OOO change anything with respect to this transformation?
>>
>>
> Basically, I'm using the simplifying assumption that OoO is "really large
> multiple issue".
>
> -eric
>
>
>> -- Mehdi
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>