[PATCH] D152281: [Transforms][LICM] Add the ability to undo unprofitable reassociation

Tue Jul 25 09:27:03 PDT 2023

nikic added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LICM.cpp:2708
+      return false;
+    if (!match(VOp, m_FAdd(m_BinOp(Op), m_BinOp(OpNext)))) {
+      Op = VOp;
----------------
pawosm01 wrote:
> nikic wrote:
> > Do we need to require that the fadd is reassoc as well?
> Yes, this match specifically requires that the expression is of the form that is described in the comment written above this function: `((A1 * B1) + (A2 * B2) + ...)`. I wrote this function to solve the problem at hand, to which I know that my solution provides noticeable performance benefit. Trying to make it more general would require a whole lot more research and performance impact analysis that would go far beyond what I wanted to fix. I'm offering this solution which I based on the other hoist... functions in this file, I believe if one finds a reason for extending this (to handle their own performance issue at hand), they could do that basing on what I did propose.
> 
Let me rephrase this: Isn't this missing a reassoc check on the fadd? We check the root fmul above, and we check the inner fmuls below, but we don't seem to check anything about the fadd.

================
Comment at: llvm/lib/Transforms/Scalar/LICM.cpp:2742
+    IRBuilder<> Builder(Preheader->getTerminator());
+    U->set(Builder.CreateFMulFMF(U->get(), Factor, &I, "factor.op.fmul"));
+  }
----------------
pawosm01 wrote:
> nikic wrote:
> > This copies the FMF flags from the top-level fmul, but is that correct? We have multiple fadds and fmuls involved in the transform, why is it safe to use the flags from that one in particular?
> > 
> > (The test coverage for this is not great, because you just use "fast" everywhere.)
> Yes I assume it's correct. What is happening here is that the fmul instruction from which the flag is taken is effectively being copied to every subexpression in the parenthesis, it is reasonable to copy the fast flag too. As checking for associations being allowed is implicitly checking the fast flag, nothing harmful is happening here.
> 
The transform only requires the reassoc flag on each instruction, while this is also copying all other FMF flags. It's not obvious to me that this is correct.

One could probably make an argument along the lines of "if the final fmul is nnan, then that implies that the fadd chain result is not nan. As the result would be nan if any operand were nan, we can propagate the nnan flag up the chain". This sounds plausible. But I'm not sure the same logic holds up for ninf and other FMF flags.

================
Comment at: llvm/lib/Transforms/Scalar/LICM.cpp:2732
+    if ((L.isLoopInvariant(Ops[0])) || (L.isLoopInvariant(Ops[1])))
+      candidates++;
+  }
----------------
huntergr wrote:
> pawosm01 wrote:
> > huntergr wrote:
> > > I don't think we need a count here; just a boolean 'CandidateFound' would suffice.
> > It isn't as easy as it seems. As it turns, this transformation shouldn't be applied for every situation where at least one candidate is found.
> > 
> > There are two problems here:
> > 
> > 1. a simple expression like inv1 * var1 * inv2 will be found as a candidate and transformed into var1 * (inv1 * inv2) where (inv1 * inv2) part will be hoisted. But this expression itself can be a part of a longer expression, e.g. inv1 * var1 * inv2 + inv1 * var2 * inv2 + inv1 * var3 * inv2 which before the introduction of my change could easily be transformed into (var1 + var2 + var3) * (inv1 * inv2) with (inv1 * inv2) hoisted. The transformation I'm proposing should rather focus on the expressions described in the comment block placed before this function which usually  require undoing the actions of some previous pass (namely, the Reasociate pass), and leave simpler cases alone and make the existing test cases supporting them just pass without any awfully messy change. Hence, I've added `AnyAdds` variable to make sure the intended expression patterns are being found. The previous oversimlified condition eliminated such cases by a sheer coincidence.
> > 
> > 2. with the expression long enough and small enough trip count there is a possibility that the transformation leading to the hoisting can cause performance degradation. If most of the operations in a particularly long expression involves non-invariants, there will be more multiplications introduced than operations hoisted. By changing the type of the `Candidates` counter to a signed integer and decrementing it whenever there is no invariant in the operation (hence no candidate for hoisting), the applicability of this transformation can be limited to the situations where the numbers of non-invariant operations and hoisting candidates are at least equal. A crude estimate and naïve one, I can admit that, but anything better would overcomplicate the logic here, e.g. by introducing instruction cost estimation and trip count estimation, the cost of doing it can easily outgrow any benefits.
> > 
> Yes, I see that if you have multiplies with no loop-invariant operands in the chain of adds then you need to add more multiplies inside the loop, which seems to defeat the purpose of the optimization. So perhaps we should avoid performing this transformation if there is such a multiply in the chain.
> 
> It might be possible to perform it with only 1 such multiply since you'll be reordering the operations and may reduce the latency, but that would require benchmarking to confirm and is dependent on the workload and target, but I think the safe side is to bail out.
I didn't quite get your argument in 1. for why it's a bad idea to transform inv1 * var1 * inv2 into var1 * (inv1 * inv2). Why is there a problem if it's part of a longer expression? Wouldn't we still get `(var1 + var2 + var3) * (inv1 * inv2)` as the end result if `inv1 * inv2` is hoisted out of each part and then CSEd?

As far as I can tell, the transform should be profitable regardless of whether there is an fadd or not.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152281/new/

https://reviews.llvm.org/D152281