[PATCH] Refactor and enhance FMA combine

Wed Apr 1 11:23:30 PDT 2015

If the canonicalization should be fixed elsewhere, I guess we agree that
this patch could be applied?

Thanks for your help,

Olivier

2015-03-26 12:36 GMT-04:00 Mehdi Amini <mehdi.amini at apple.com>:

>
> On Mar 25, 2015, at 9:41 PM, Owen Anderson <resistor at mac.com> wrote:
>
>
> On Mar 24, 2015, at 1:39 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>
> On Mar 23, 2015, at 9:31 PM, Owen Anderson <resistor at mac.com> wrote:
>
>
> On Mar 23, 2015, at 1:48 PM, Mehdi AMINI <mehdi.amini at apple.com> wrote:
>
> In principle you're right, that might not *always* be beneficial. But in
> general it should be, because even when "high precision" operations are
> twice as expensive as "low precision" ones, the transformation does not
> make things worse. Right now this is only enabled for PPC, for which low
> and high precision operations have the same cost. Tell me if this is not
> acceptable.
>
> Well you can imagine having more than twice the throughput in f16 than f32
> on some targets, and you can also imagine that 2 x f16 operations consume
> less power than one f32.
> I'd rather have Owen's opinion on this.
>
>
> It’s pretty standard for GPUs to have higher throughput on narrower
> datatypes.  For instance, if double precision is half the throughput of
> single precision, then the proposed optimization turns a three cycle
> sequence into a four cycle sequence.
>
>
> Not exactly, I believe the proposed optimization turns two “low” operations
> and one “high” operation into two “high” operations.
> Note that it seems to me that this optimization can also apply when the two
> low operations are f16 and the high one is double precision. In pseudo IR code:
>
> %mul = fmul half %u, %v
> %fma = fma half %x, %y, %mul
> %fmaext = fpextend half %fma to double
> %fadd = fadd double %fmaext, %z
>
> becomes:
>
> %xext = fpextend half %x to double
> %yext = fpextend half %y to double
> %uext = fpextend half %u to double
> %vext = fpextend half %v to double
>
> %fma1 = fma double %uext, %vext, %z
> %fma = fma double %xext, %yext, %fma1
>
> (assuming that both half and double are legal on the target)
>
>
> In that case, this looks more reasonable.  The profitability would depend
> on the throughput ratio of the processor in question, but 2x seems like a
> pretty common design point.
>
>
> NVIDIA GT200 has a 1:8 fp64:fp32 ratio, and the brand new GM200 has a 1:32
> ratio :)
> But I agree that it is probably not the common case and I’m OK with the
> added comment, so that if anyone needs to fix it, it should be easy to
> spot.
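
The break-even point discussed above can be checked with a small cost model. This is only a sketch under stated assumptions: each "low" operation costs `low`, each "high" operation costs `high`, and the fpextend instructions are treated as free, which a real target may not satisfy. The function names are hypothetical and are not part of the patch.

```python
def cost_before(low, high):
    # Original sequence: fmul (low) + fma (low) + fadd (high).
    # Assumption: the single fpextend is free.
    return 2 * low + high


def cost_after(low, high):
    # Transformed sequence: two fma at high precision.
    # Assumption: the four fpextend instructions are free.
    return 2 * high


def is_profitable(low, high):
    # "Profitable" here means the transformation does not make things worse.
    return cost_after(low, high) <= cost_before(low, high)


if __name__ == "__main__":
    # Relative cost of a "high" operation, with a "low" operation costing 1.
    for ratio in (1, 2, 8, 32):
        print(ratio, cost_before(1, ratio), cost_after(1, ratio),
              is_profitable(1, ratio))
```

Under this model the transformation breaks even when a high precision operation costs exactly twice a low precision one, and loses beyond that, which is consistent with the 2x design point mentioned above.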
>
> What remains in this revision is the canonicalization, which I think
> should be done in a dedicated combine rather than here.
>
> —
> Mehdi
>
>