[PATCH] D80870: [LV] Make sure smallest/widest type sizes are powers-of-2.

Wed Jun 3 06:33:13 PDT 2020

bjope added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5011

   unsigned MaxVF = UserVF ? UserVF : computeFeasibleMaxVF(TC);
   assert((UserVF || isPowerOf2_32(MaxVF)) && "MaxVF must be a power of 2");
----------------
fhahn wrote:
> bjope wrote:
> > Ayal wrote:
> > > The original motivation to assert that MaxVF is a power of 2, in D80491, was for concluding that any chosen VF will divide a constant trip count.
> > > Perhaps a better fix than D80491 would be (have been ..., as noted in the summary) to round MaxVF down to a power of 2 here, rather than forcing computeFeasibleMaxVF() and/or getSmallestAndWidestTypes() to return powers of 2.
> > Rounding down MaxVF returned by computeFeasibleMaxVF (maybe at the end of that function rather than here) would probably work better for our OOT target (with 160-bit register and i40 types, so neither the Smallest/WidestType nor the TTI.getRegisterBitWidth(true) returns a power-of-2 value in that situation).
> > Perhaps a better fix than D80491 would be (have been ..., as noted in the summary) to round MaxVF down to a power of 2 here, rather than forcing computeFeasibleMaxVF() and/or getSmallestAndWidestTypes() to return powers of 2.
> 
> `computeFeasibleMaxVF` already has a comment indicating that the MaxVF should be a power-of-2 I think. I updated the patch to move the rounding down to computeFeasibleMaxVF, as other code might also implicitly rely on MaxVF being a power-of-2. 
> 
> It might be good to first ensure MaxVF to be a power of 2 in `computeFeasibleMaxVF` in the short term and then check if the other uses work as expected with non-power-of-2's subsequently, if required/desired.
> 
> I think on most in-tree targets it is unlikely to have much impact, but it would be interesting to hear if the out-of-tree target @bjope mentioned actually supports non-power-of-2 vectorization factors (e.g. vector add of `5 x i40`)? 
The VF is power-of-2 for the targets we support (but we got some registers and element sizes that aren't power-of-2).

And this patch (as it landed) seem to solve the problems we had with D80491. Thanks!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80870/new/

https://reviews.llvm.org/D80870