[PATCH] D152281: [Transforms][LICM] Add the ability to undo unprofitable reassociation

Thu Jul 20 01:39:16 PDT 2023

huntergr added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LICM.cpp:2704-2706
+  for (BinaryOperator *Op = nullptr, *OpNext = nullptr,
+                      *VOp = dyn_cast<BinaryOperator>(VariantOp);
+       VOp; VOp = OpNext) {
----------------
It may be worthwhile considering an upper limit on the number of multiplies you support -- depending on the trip count of the loop, you may actually be introducing extra work for no gain (especially for a target with wide vectors that may complete everything in one vector iteration).

================
Comment at: llvm/lib/Transforms/Scalar/LICM.cpp:2732
+    if ((L.isLoopInvariant(Ops[0])) || (L.isLoopInvariant(Ops[1])))
+      candidates++;
+  }
----------------
pawosm01 wrote:
> huntergr wrote:
> > I don't think we need a count here; just a boolean 'CandidateFound' would suffice.
> It isn't as easy as it seems. As it turns, this transformation shouldn't be applied for every situation where at least one candidate is found.
> 
> There are two problems here:
> 
> 1. a simple expression like inv1 * var1 * inv2 will be found as a candidate and transformed into var1 * (inv1 * inv2) where (inv1 * inv2) part will be hoisted. But this expression itself can be a part of a longer expression, e.g. inv1 * var1 * inv2 + inv1 * var2 * inv2 + inv1 * var3 * inv2 which before the introduction of my change could easily be transformed into (var1 + var2 + var3) * (inv1 * inv2) with (inv1 * inv2) hoisted. The transformation I'm proposing should rather focus on the expressions described in the comment block placed before this function which usually  require undoing the actions of some previous pass (namely, the Reasociate pass), and leave simpler cases alone and make the existing test cases supporting them just pass without any awfully messy change. Hence, I've added `AnyAdds` variable to make sure the intended expression patterns are being found. The previous oversimlified condition eliminated such cases by a sheer coincidence.
> 
> 2. with the expression long enough and small enough trip count there is a possibility that the transformation leading to the hoisting can cause performance degradation. If most of the operations in a particularly long expression involves non-invariants, there will be more multiplications introduced than operations hoisted. By changing the type of the `Candidates` counter to a signed integer and decrementing it whenever there is no invariant in the operation (hence no candidate for hoisting), the applicability of this transformation can be limited to the situations where the numbers of non-invariant operations and hoisting candidates are at least equal. A crude estimate and naïve one, I can admit that, but anything better would overcomplicate the logic here, e.g. by introducing instruction cost estimation and trip count estimation, the cost of doing it can easily outgrow any benefits.
> 
Yes, I see that if you have multiplies with no loop-invariant operands in the chain of adds then you need to add more multiplies inside the loop, which seems to defeat the purpose of the optimization. So perhaps we should avoid performing this transformation if there is such a multiply in the chain.

It might be possible to perform it with only 1 such multiply since you'll be reordering the operations and may reduce the latency, but that would require benchmarking to confirm and is dependent on the workload and target, but I think the safe side is to bail out.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152281/new/

https://reviews.llvm.org/D152281