[PATCH] D80870: [LV] Make sure smallest/widest type sizes are powers-of-2.

Mon Jun 1 12:59:54 PDT 2020

fhahn marked an inline comment as done.
fhahn added a subscriber: bjope.
fhahn added a comment.

In D80870#2065079 <https://reviews.llvm.org/D80870#2065079>, @Ayal wrote:

> Also added a couple of unrelated comments.

I'll take a look at those in a bit, when I have more time.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5011

   unsigned MaxVF = UserVF ? UserVF : computeFeasibleMaxVF(TC);
   assert((UserVF || isPowerOf2_32(MaxVF)) && "MaxVF must be a power of 2");
----------------
bjope wrote:
> Ayal wrote:
> > The original motivation to assert that MaxVF is a power of 2, in D80491, was for concluding that any chosen VF will divide a constant trip count.
> > Perhaps a better fix than D80491 would be (have been ..., as noted in the summary) to round MaxVF down to a power of 2 here, rather than forcing computeFeasibleMaxVF() and/or getSmallestAndWidestTypes() to return powers of 2.
> Rounding down MaxVF returned by computeFeasibleMaxVF (maybe at the end of that function rather than here) would probably work better for our OOT target (with 160-bit register and i40 types, so neither the Smallest/WidestType nor the TTI.getRegisterBitWidth(true) returns a power-of-2 value in that situation).
> Perhaps a better fix than D80491 would be (have been ..., as noted in the summary) to round MaxVF down to a power of 2 here, rather than forcing computeFeasibleMaxVF() and/or getSmallestAndWidestTypes() to return powers of 2.

`computeFeasibleMaxVF` already has a comment indicating that the MaxVF should be a power-of-2 I think. I updated the patch to move the rounding down to computeFeasibleMaxVF, as other code might also implicitly rely on MaxVF being a power-of-2. 

It might be good to first ensure MaxVF to be a power of 2 in `computeFeasibleMaxVF` in the short term and then check if the other uses work as expected with non-power-of-2's subsequently, if required/desired.

I think on most in-tree targets it is unlikely to have much impact, but it would be interesting to hear if the out-of-tree target @bjope mentioned actually supports non-power-of-2 vectorization factors (e.g. vector add of `5 x i40`)? 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80870/new/

https://reviews.llvm.org/D80870