[PATCH] D92178: [InstructionCost] Change LoopVectorizationCostModel::getInstructionCost to return InstructionCost

Thu Dec 3 05:23:32 PST 2020

ctetreau added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5441
     VectorizationCostTy C = expectedCost(ElementCount::getFixed(i));
-    std::pair<unsigned, unsigned> VectorCost = {C.first, i};
+    std::pair<InstructionCost, unsigned> VectorCost = {C.first, i};
     LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << i
----------------
david-arm wrote:
> ctetreau wrote:
> > david-arm wrote:
> > > ctetreau wrote:
> > > > I think you can get rid of these pairs if you just check validity before assigning it.
> > > OK. I guess the problem is that we could start off with an invalid cost (ScalarCost) and get another Invalid cost in the loop too. I think we could get rid of the pairs, but it just makes the main loop more complicated that's all. What I could do is add this in the main loop:
> > > 
> > >   if (!C.first.isValid())
> > >     continue
> > >   if (!ScalarCost.isValid())
> > >     MinCost = C.first;
> > >   // Both costs are now valid.
> > >   // ... use MinCost and C somehow ...
> > > 
> > > The other thing I was trying to do is move from using a float to represent the MinCost in the loop, to using an InstructionCost (integer based) instead. If I get rid of the pairs then it makes more sense to revert to using a float for MinCost I think, since then division (and hence a fractional cost) is involved.
> > > 
> > > I can try adding control flow to the loop like above?
> > So I messed around with your patch, and came up with this loop body:
> > 
> > ```
> > for (unsigned i = 2; i <= MaxVF.getFixedValue(); i *= 2) {
> >     // Notice that the vector loop needs to be executed less times, so
> >     // we need to divide the cost of the vector loops by the width of
> >     // the vector elements.
> >     VectorizationCostTy C = expectedCost(ElementCount::getFixed(i));
> > 
> >     Optional<float> VectorCost =
> >         C.first.getValue().map([i](InstructionCost::CostType Cost) {
> >           return static_cast<float>(Cost) / i;
> >         });
> > 
> >     // removed debug output noise
> > 
> >     if (auto MinCostVal = MinCost.getValue())
> >       if (VectorCost && *VectorCost < *MinCostVal) {
> >         Width = i;
> >         // requires ctor that takes an Optional, and ctors that convert from number like things
> >         MinCost = VectorCost;
> >       }
> >   }
> > ```
> > 
> > I haven't ran the tests, but this compiles and I think it should work.
> Hi @ctetreau, your suggestions for adding a conversion constructor/getter in general seem sensible to me, however I just have one question. In that example above it looks like you're rounding a float (VectorCost) to an integer, which is a change in behaviour I think? Currently the examples I'm thinking of are where we compare a cost of 11 for a width of 2, with a cost of 21 for a width of 4, i.e.
>   float(11)/2 = 5.5
>   float(21)/4 = 5.25
> Previously we'd choose a width of 4 because 5.25 is less than 5.5. I think with your proposal we'd keep a width of 2 because 5.5 gets rounded down to 5. Unless we make MinCost use floats too, in which case the need for conversion goes away I think?
Yeah, looks like I messed up. If you have MinCost be an Optional<float>, it seems like it should be fine. The main idea is that this whole function should be working with Optional<float> instead of InstructionCost. We can convert all InstructionCost objects to Optional<float> right away, do floating point math with them, and then convert back to unsigned for the return.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92178/new/

https://reviews.llvm.org/D92178