[PATCH] D90687: [LV] Ignore VF hint when unsafe
Cullen Rhodes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 5 07:40:39 PST 2020
c-rhodes added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5237
+ auto MaxVF = computeFeasibleMaxVF(TC);
+ if (UserVF) {
----------------
dmgreen wrote:
> Clamping sounds good.
>
> But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers.
>
> It's perhaps easier to show with examples. This code should produce the same thing (the same width vectors) with and without this patch, I believe, as there is nothing "unsafe" about vectorizing at a higher bitwidth than the vector registers:
> https://godbolt.org/z/ePdv3K
>
> (I am trying to not use the term "legal", as it has too many meanings. There is a difference between "legal to vectorize" (as the safety constraints in LoopVectorizationLegality) and "legal vector widths" which just means that the llvm-ir vector can be lowered to a single vector register and I don't think should be very relevant here).
> But I think that there is a difference between the "maximum safe vector width" and "the maximum register bitwidth". The maximum safe distance should be a legality constraint that doesn't depend on the size of the backend registers.
So only considering the VF computed by LAA for dependencies rather the backend register widths, I think that makes sense.
Whilst looking into this I discovered the example I gave in the commit message doesn't actually vectorize when only specifying `-force-vector-width=4` and no loop hint:
```
void foo(int *a, int *b, int N) {
for (int i=0; i<N; ++i) {
a[i + 2] = a[i] + b[i];
}
}
clang -S -emit-llvm -o - ../dependence.c -O3 -Rpass-missed=loop-vectorize -mllvm -force-vector-width=4
../dependence.c:2:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
for (int i=0; i<N; ++i) {
^
```
It's a bit of a mess how UserVF is handled, it seems LAA only knows about the UserVF specified by `-force-vector-width`. With the pragma LAA is operating on VF=2 and LV on VF=4. What's also interesting is the loop metadata takes precedence over the flag since in `LoopVectorizationLegality` the vector width is initialized with the flag then populated with loop metadata, so the VF according to LAA would come from the flag and in LV the loop hint, assuming both were specified by the user that is.
I've updated the patch such that `computeFeasibleMaxVF` now takes `UserVF` and clamps this to VF based on `Legal->getMaxSafeRegisterWidth();` if it exceeds it. This simplifies the patch a fair bit since no existing tests change, although I think the handling of `UserVF` needs refactoring. Now I know `-force-vector-width` won't apply unless it's safe, the loop hint and flag feel semantically equivalent but I'm not sure if there's anything I'm missing. Any thoughts?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D90687/new/
https://reviews.llvm.org/D90687
More information about the llvm-commits
mailing list