[llvm] r236031 - transform fadd chains to increase parallelism

Mehdi Amini mehdi.amini at apple.com
Thu Apr 30 12:21:10 PDT 2015


> On Apr 30, 2015, at 12:04 PM, Owen Anderson <resistor at mac.com> wrote:
> 
> 
>> On Apr 30, 2015, at 8:41 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>> 
>> So to me, an in-order machine is still superscalar and pipelined. You have to expose ILP or you die a high-frequency death.
> 
> Many (most?) GPUs hide latencies via massive hyper threading rather than exploiting per-thread ILP.  The hardware presents a model where every instruction has unit latency, because the real latency is entirely hidden by hyper threading.  Using more registers eats up the finite pool of storage in the chip, limiting the number of threads that can run concurrently, and ultimately reducing the hardware’s ability to hyper thread, killing performance.
> 
> This isn’t just a concern for GPUs, though.  Even superscalar CPUs are not necessarily uniformly superscalar.  I’m aware of plenty of lower-power designs that can multi-issue integer instructions but not floating-point ones, for instance.

How would out-of-order (OOO) execution change anything with respect to this transformation?
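
For readers following the thread, here is a minimal sketch of the kind of rewrite under discussion, written as source-level C++ rather than as the actual compiler change. The function names `sum_chain` and `sum_tree` are hypothetical and only illustrate the dependency shapes. Floating-point addition is not associative, so a compiler can generally make this rewrite only when it is allowed to relax strict floating-point semantics.

```cpp
// Serial chain: every add consumes the result of the previous one,
// so the critical path is three dependent fadds.
float sum_chain(float a, float b, float c, float d) {
    return ((a + b) + c) + d;
}

// Reassociated form: (a + b) and (c + d) do not depend on each other,
// so hardware that can overlap them sees a critical path of two fadds.
// The cost is one more intermediate value live at the same time.
float sum_tree(float a, float b, float c, float d) {
    return (a + b) + (c + d);
}
```

Both functions perform three additions; only the dependency structure differs. The extra live temporary in `sum_tree` is the register cost raised in the quoted message, and the shorter critical path pays off only if the target can actually overlap the two independent adds.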

-- 
Mehdi


