[PATCH] Refactor and enhance FMA combine

Mon Mar 23 13:56:50 PDT 2015

In http://reviews.llvm.org/D8050#145554, @ohsallen wrote:

> I'll commit this patch this week if no one is particularly against it ;-)

No, that's not how it works. You need to wait for an explicit okay. (but please do ping once per week if you're missing reviewer feedback).

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:7060
@@ +7059,3 @@
+      // fold (fadd (fpext (fma x, y, (fmul u, v))), z)
+      //   -> (fma (fpext x), (fpext y), (fma (fpext u), (fpext v), z))
+      if (N0.getOpcode() == ISD::FP_EXTEND) {
----------------
ohsallen wrote:
> joker.eph wrote:
> > This turns two "low precision" and one "high precision" operations into two "high precision" operation. 
> > I'm not saying it is necessarily bad, but I'm not convinced it is always beneficial.
> > The other FMA combines don't have the same behavior.
> In principle you're right, that might not be *always* beneficial. But in general, it should be, because even when "high precision" operations are twice more expensive than "low precision" one, the transformation does not worsen things. Right now this is only enabled for PPC, for which low and high precision operations have the same cost. Tell me if this is not acceptable.
I agree, I think this is okay as-is. However, please note this explicitly in a comment above the transform.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:7100
@@ +7099,3 @@
+      // fold (fadd x, (fpext (fma y, z, (fmul u, v)))
+      //   -> (fma (fpext y), (fpext z), (fma (fpext u), (fpext v), x))
+      if (N1.getOpcode() == ISD::FP_EXTEND) {
----------------
joker.eph wrote:
> The only change between this one and the one two level above is that the operand of the outer fadd are reversed. Is there an elegant way of not duplicating the code? A lambda maybe?
There is indeed a lot of very similar code here. If we could share this better in between patterns that would be much better.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:7214
@@ +7213,3 @@
+    // fold (fsub (fpext (fneg (fmul, x, y))), z)
+    //   -> (fneg (fma (fpext x), (fpext y), z))
+    if (N0.getOpcode() == ISD::FP_EXTEND) {
----------------
joker.eph wrote:
> ohsallen wrote:
> > joker.eph wrote:
> > > I come back with the canonicalization problem. Wouldn't it be more canonical to have:
> > >   fneg (fadd (fpext (fmul, x, y)), z)
> > > 
> > > In which case this combine would be useless.
> > This code is triggered when TLI.isFPExtFree() returns true. The FPEXT nodes are only generated for correctness, and are expected to be removed later by a MC pass. Thus the code generated here is more canonical in this context.
> I'm not sure you understood my comment. I'm talking about the input pattern, not the output pattern.
Sure, but don't miss the point. If the form being matched in non canonical, but still fires, then the correct fix is to add a canonicalization in DAGCombine, and not add logic for it here.

Assuming transforming from one to the other is always correct (even in the face of NaNs, etc.), then I agree that the fneg(fadd(...)) forms should be preferred.

http://reviews.llvm.org/D8050

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/