[llvm] r236031 - transform fadd chains to increase parallelism

Owen Anderson resistor at mac.com
Thu Apr 30 12:04:57 PDT 2015


> On Apr 30, 2015, at 8:41 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> 
> So to me, an in-order machine is still superscalar and pipelined. You have to expose ILP or you die a high-frequency death.

Many (most?) GPUs hide latency via massive hardware multithreading rather than by exploiting per-thread ILP.  The hardware presents a model where every instruction has unit latency, because the real latency is entirely hidden by the threading.  Using more registers eats into the finite pool of register storage on the chip, which limits the number of threads that can run concurrently, reduces the hardware’s ability to hide latency, and ultimately kills performance.
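
To make the tradeoff concrete, here is a rough sketch in LLVM IR (my own illustration, not taken from the actual patch): reassociating a serial fadd chain into a tree exposes two independent adds, but it also keeps two partial sums live at once instead of one, which is where the extra register pressure comes from on longer chains.

  ; Serial chain: each fadd depends on the previous result,
  ; so the adds must execute one after another.
  define float @chain(float %a, float %b, float %c, float %d) {
    %t0 = fadd fast float %a, %b
    %t1 = fadd fast float %t0, %c
    %t2 = fadd fast float %t1, %d
    ret float %t2
  }

  ; Reassociated tree: %t0 and %t1 are independent and can issue in
  ; parallel, but both partial sums must be held live at the same time.
  define float @tree(float %a, float %b, float %c, float %d) {
    %t0 = fadd fast float %a, %b
    %t1 = fadd fast float %c, %d
    %t2 = fadd fast float %t0, %t1
    ret float %t2
  }

On a machine that hides latency with threads rather than per-thread ILP, the second form buys nothing, and the extra live value is pure cost.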

This isn’t just a concern for GPUs, though.  Even superscalar CPUs are not necessarily uniformly superscalar.  I’m aware of plenty of lower-power designs that can multi-issue integer instructions but not floating-point ones, for instance.

—Owen