[PATCH] D90687: [LV] Ignore VF hint when unsafe

Thu Nov 5 07:53:26 PST 2020

sdesmalen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5237

+  auto MaxVF = computeFeasibleMaxVF(TC);
+  if (UserVF) {
----------------
sdesmalen wrote:
> c-rhodes wrote:
> > dmgreen wrote:
> > > Clamping sounds good.
> > > 
> > > But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers.
> > > 
> > > It's perhaps easier to show with examples. This code should produce the same thing (the same width vectors) with and without this patch, I believe, as there is nothing "unsafe" about vectorizing at a higher bitwidth than the vector registers:
> > > https://godbolt.org/z/ePdv3K
> > > 
> > > (I am trying to not use the term "legal", as it has too many meanings. There is a difference between "legal to vectorize" (as the safety constraints in LoopVectorizationLegality) and "legal vector widths" which just means that the llvm-ir vector can be lowered to a single vector register and I don't think should be very relevant here).
> > > But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers.
> > 
> > So only considering the VF computed by LAA for dependencies rather the backend register widths, I think that makes sense.
> > 
> > Whilst looking into this I discovered the example I gave in the commit message doesn't actually vectorize when only specifying `-force-vector-width=4` and no loop hint:
> > ```
> > void foo(int *a, int *b, int N) {
> >   for (int i=0; i<N; ++i) {
> >     a[i + 2] = a[i] + b[i];
> >   }
> > }
> > 
> > clang -S -emit-llvm -o - ../dependence.c -O3 -Rpass-missed=loop-vectorize -mllvm -force-vector-width=4
> > ../dependence.c:2:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
> >   for (int i=0; i<N; ++i) {
> >   ^
> > ```
> > 
> > It's a bit of a mess how UserVF is handled, it seems LAA only knows about the UserVF specified by `-force-vector-width`. With the pragma LAA is operating on VF=2 and LV on VF=4. What's also interesting is the loop metadata takes precedence over the flag since in `LoopVectorizationLegality` the vector width is initialized with the flag then populated with loop metadata, so the VF according to LAA would come from the flag and in LV the loop hint, assuming both were specified by the user that is.
> > 
> > I've updated the patch such that `computeFeasibleMaxVF` now takes `UserVF` and clamps this to VF based on `Legal->getMaxSafeRegisterWidth();` if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of `UserVF` needs refactoring. Now I know `-force-vector-width` won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?
> > 
> > 
> That's a good point @dmgreen, `MaxSafeRegisterWidth` and corresponding `getMaxSafeRegisterWidth` in LoopVectorize.cpp and LoopAccessAnalysis.cpp are actually misnomers because it isn't the maximum safe register width that is calculated, but rather the maximum safe vector bitwidth.
> 
> Without this patch, this example is vectorized with VF=8 when compiling for Neon (128bit vectors):
>   void foo(int *a, int *b, int c, int N) {
>     #pragma clang loop vectorize(enable) vectorize_width(8)
>     for (int i=0; i<N; ++i) {
>       a[i + 16] = a[i] + b[i];
>     }
>   }
> Where with this patch, it is now vectorized with VF=4. It seems like the limitation with regards to the actual physical vector register can and should be removed.
> the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?
To me they read as slightly different things, but it's good to clear their semantics up.

I interpret:
*  `-force-vector-width` as "//vectorize it with this width and this width only, and fail if not legal//".
* The `LoopHint` as "//try to vectorize with this width but if not legal, feel free to ignore the hint and pick a different width//".

At least, that's what the implementation does today and I assumed that was deliberate. Maybe someone else can clarify that?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90687/new/

https://reviews.llvm.org/D90687