[PATCH] D125987: [SLP] Account for cost of removing FMA opportunities

Wed Jun 15 06:37:59 PDT 2022

wjschmidt added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:9668

+void SLPVectorizerPass::adjustForFMAs(InstructionCost &Cost,
+                                      ArrayRef<Value *> &VL) {
----------------
ABataev wrote:
> wjschmidt wrote:
> > ABataev wrote:
> > > wjschmidt wrote:
> > > > wjschmidt wrote:
> > > > > ABataev wrote:
> > > > > > wjschmidt wrote:
> > > > > > > ABataev wrote:
> > > > > > > > wjschmidt wrote:
> > > > > > > > > wjschmidt wrote:
> > > > > > > > > > RKSimon wrote:
> > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > Can this be hidden in the TTI completely anyhow? To avoid the extra calls of the adjustForFMAs completely in SLP.
> > > > > > > > > > > +1 - the only problem if that this will probably end up having to be added on a target-by-target basis in getArithmeticInstrCost - hopefully we can provide a helper wrapper so its only a one-liner in each target.
> > > > > > > > > > Hi -- sorry, I'll be out of town for the next week, so can't look at this right away.  In the meantime, can you please give me a little bit more idea exactly what you're looking for?  I've expressed my concerns about making direct changes to getArithmeticInstrCost for this in the comments for this revision -- can you please address those?  It's not at all clear to me what you're looking for other than "hide it from SLP somehow."
> > > > > > > > > I took a run at prototyping this yesterday for X86 TTI.  I put a shim in X86TTIImpl::getArithmeticInstrCost() to look for cases of FAdd and FSub that we expect to be folded into FMAs, and reduce their cost appropriately.  This was sufficient to prevent the horizontal reduction from happening, but as I expected, this doesn't help us with the tryToVectorizeList() case, where the cost modeling is only looking at the multiplies and not the add/sub fed by them.  (I.e., the second variant in the test case isn't fixed.)  I don't see any way to avoid explicit FMA checking in the SLP vectorizer to solve this, as explained in my summary for this revision.
> > > > > > > > > 
> > > > > > > > > Given this result, I'd like to proceed with the patch provided here.  It would be pleasant if we could hide the issue entirely from SLP, but it doesn't seem practical.  Thanks!
> > > > > > > > Could you provide more details why this does not help with tryToVectorizeList?
> > > > > > > Because tryToVectorizeList doesn't see the adds at all.  The root of the tree is the multiplies.  Each of those multiplies happens to feed an add that's outside the vectorizable tree.  There's never any call to getArithmeticInstrCost for the adds, so modifying the cost there is of no help.
> > > > > > Can we look through the users of these multiplies?
> > > > > That's exactly what this patch is doing.
> > > > To be clear, the current patch does fix the tryToVectorizeList case.  It just doesn't hide that aspect of things in TTI.
> > > I mean, can we do it in TTI?
> > The problem is that you have to pick one way or the other to look at things if you do it in TTI.  You either have to say, the cost of an add is free if it's fed by a multiply, or the cost of a multiply is free if it feeds into an add.  If you do both of these things, you end up undercounting the cost.  I don't see a practical way to do this in TTI that solves both problems in SLP.  One is looking at things from a multiply point of view and the other from the add point of view.
> Yes, agree. Can we try to have free muls if they are used by adds?Do you foresee any problems with this approach?
I implemented a prototype that gave us free adds if they use multiplies.  I didn't see any regressions from this.  I haven't tried it the other way around (free multiplies if used only by a single add each).  If we did that, my guess is we probably still wouldn't see regressions.  However, we also wouldn't see much benefit, because now the other case (the horizontal reduction case) will no longer be fixed by the TTI solution.  Either way we end up having to have code in SLP that special-cases FMAs somewhere.

My feeling is that it's sensible for TTI to provide the interface that calculates the savings of an FMA, but the burden needs to be on the users of TTI (SLP for now, but perhaps others) to determine when to use it.  I really do understand why it would be preferable to hide everything in the TTI layer, so nobody ever has to worry about it again, but because users can come at the problem from two directions, it's hard for a simple interface like getArithmeticInstrCost to hide the complexity.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125987/new/

https://reviews.llvm.org/D125987