[PATCH] D90687: [LV] Clamp VF hint when unsafe

Wed Nov 11 05:33:00 PST 2020

c-rhodes added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5345
+    LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF
+                      << " is unsafe, using maximum safe VF=" << MaxSafeVF
+                      << ".\n");
----------------
fhahn wrote:
> nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.
> nit: it seems like the message here and of the remark slightly diverged. I think it would be worth using the same 'clamping' wording as in the remark here.

Good spot, I'll fix it before merging or update this patch once I get a better idea about the other suggestion you made.

================
Comment at: llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll:3
+
+; Make sure the unsafe user specified vectorization factor is clamped.
+
----------------
fhahn wrote:
> It might also be interesting to add a test cases where the user provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture ).
> It might also be interesting to add a test cases where the user provided VF is large (say 64), the max legal width is something like 32 and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture ).

Is this loop like what you had in mind?

```void foo(int *a, int *b, int N) {
  #pragma clang loop vectorize(enable) vectorize_width(64)
  for (int i=0; i<N; ++i) {
    a[i + 32] = a[i] / b[i];
  }
}```

When compiling with:

```./bin/clang -S -emit-llvm -o - ../dependence.c -O2 -mllvm -debug-only=loop-vectorize,loop-accesses -target aarch64-linux-gnu```

The user VF of 64 is unsafe so it's clamped to 32 and the vector loop of width 32 is more expensive (cost 13) than the scalar loop (cost 10), although the vectorization is forced so the VF=32 is still chosen.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90687/new/

https://reviews.llvm.org/D90687