[PATCH] Refactor and enhance FMA combine

Owen Anderson resistor at mac.com
Mon Mar 23 21:31:36 PDT 2015


> On Mar 23, 2015, at 1:48 PM, Mehdi AMINI <mehdi.amini at apple.com> wrote:
> 
>> In principle you're right, that might not be *always* beneficial. But in general, it should be, because even when "high precision" operations are twice as expensive as "low precision" ones, the transformation does not make things worse. Right now this is only enabled for PPC, for which low and high precision operations have the same cost. Tell me if this is not acceptable.
> Well, you can imagine having more than twice the throughput in f16 as in f32 on some targets, and you can also imagine that two f16 operations consume less power than one f32 operation.
> I'd rather have Owen's opinion on this.

It’s pretty standard for GPUs to have higher throughput on narrower datatypes.  For instance, if double precision has half the throughput of single precision, then the proposed optimization turns a three-cycle sequence into a four-cycle sequence.
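A minimal sketch of the cycle arithmetic above. It assumes (our reading, not spelled out in the message) that a single-precision operation costs 1 cycle, a double-precision operation costs 2 cycles, the sequence before the combine issues one f32 and one f64 operation, and the combined sequence costs as much as two f64 operations. `CYCLES_PER_OP` and `sequence_cycles` are hypothetical names used only for illustration.

```python
# Hypothetical throughput model: issue cost of one operation, in cycles, by precision.
# Assumption: double precision has half the throughput of single precision.
CYCLES_PER_OP = {"f32": 1, "f64": 2}


def sequence_cycles(ops):
    """Sum the issue cycles for a list of operation precisions."""
    return sum(CYCLES_PER_OP[p] for p in ops)


# Assumed shapes (illustrative only): before the combine, one operation runs
# in f32 and one in f64; afterwards, the sequence costs two f64 operations.
before = ["f32", "f64"]
after = ["f64", "f64"]

print(sequence_cycles(before), "->", sequence_cycles(after))  # 3 -> 4
```

Changing the `f64` entry models other throughput ratios; with equal costs (as on PPC, per the quoted text) the two sequences come out the same.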

—Owen


More information about the llvm-commits mailing list