[PATCH] D90687: [LV] Clamp VF hint when unsafe

Tue Nov 24 11:37:17 PST 2020

fhahn added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll:3
+
+; Make sure the unsafe user specified vectorization factor is clamped.
+
----------------
c-rhodes wrote:
> c-rhodes wrote:
> > fhahn wrote:
> > > It might also be interesting to add a test case where the user-provided VF is large (say 64), the max legal width is something like 32, and the profitable width selected by the cost model is something smaller (might be easier if this is a target-specific test for a specific architecture).
> > 
> > Is this loop like what you had in mind?
> > 
> > ```
> > void foo(int *a, int *b, int N) {
> >   #pragma clang loop vectorize(enable) vectorize_width(64)
> >   for (int i=0; i<N; ++i) {
> >     a[i + 32] = a[i] / b[i];
> >   }
> > }
> > ```
> > 
> > When compiling with:
> > 
> > ```./bin/clang -S -emit-llvm -o - ../dependence.c -O2 -mllvm -debug-only=loop-vectorize,loop-accesses -target aarch64-linux-gnu```
> > 
> > The user VF of 64 is unsafe, so it's clamped to 32. The vector loop of width 32 is more expensive (cost 13) than the scalar loop (cost 10), but because vectorization is forced, VF=32 is still chosen.
> 
> I might be missing something but I don't think this test will add any value, although I'm not comfortable landing this patch until this is resolved. I suggest landing this patch as is, @fhahn unless you feel strongly about it?

Agreed, such a test doesn't really add much. What I was suggesting was one where the cost model does pick a different VF than the maximum safe one. This is the case that should be handled differently with the current version compared to the first version.
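
To make the distinction concrete, here is a toy model in C of the selection I have in mind. It is a sketch and not LLVM code: `select_vf`, its parameters and the cost table are all hypothetical, and it assumes vectorization is forced so the scalar loop is not a candidate.

```c
#include <stddef.h>

/* Toy model, not LLVM code. cost[k] is a made-up per-iteration cost
 * for VF = 1 << k (cost[0] is the scalar loop and is never selected
 * here, because vectorization is assumed to be forced). */
unsigned select_vf(unsigned user_vf, unsigned max_safe_vf,
                   const double *cost, size_t n_costs) {
  /* A user hint that is safe is honoured as-is. */
  if (user_vf != 0 && user_vf <= max_safe_vf)
    return user_vf;

  /* Otherwise fall back to the cheapest power-of-two VF that does not
   * exceed the maximum safe VF. */
  unsigned best_vf = 0;
  double best_cost = 0.0;
  size_t k = 1;
  for (unsigned vf = 2; vf <= max_safe_vf && k < n_costs; vf *= 2, ++k) {
    if (best_vf == 0 || cost[k] < best_cost) {
      best_vf = vf;
      best_cost = cost[k];
    }
  }
  /* No vector candidate at all: stay scalar. */
  return best_vf != 0 ? best_vf : 1;
}
```

With a hint of 32, a maximum safe VF of 16 and a cost table in which VF = 2 is cheapest, this returns 2 rather than 16, which is the situation the suggested test is meant to cover.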

I was thinking about a test like the one below. It requests VF = 32, but the maximum safe VF is 16, because the store to `%a` is 16 elements ahead of the load. When built with `opt -loop-vectorize -mtriple=arm64-apple-iphoneos`, the cost model should pick VF = 2 instead of the higher alternatives.

```
define void @test(i64* nocapture %a, i64* nocapture readonly %b) {
entry:
  br label %loop.header

loop.header:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  %arrayidx = getelementptr inbounds i64, i64* %a, i64 %iv
  %0 = load i64, i64* %arrayidx, align 4
  %arrayidx2 = getelementptr inbounds i64, i64* %b, i64 %iv
  %1 = load i64, i64* %arrayidx2, align 4
  %add = add nsw i64 %1, %0
  %2 = add nuw nsw i64 %iv, 16
  %arrayidx5 = getelementptr inbounds i64, i64* %a, i64 %2
  %c = icmp eq i64 %1, 120
  br i1 %c, label %then, label %latch

then:
  store i64 %add, i64* %arrayidx5, align 4
  br label %latch

latch:
  %iv.next = add nuw nsw i64 %iv, 1
  %exitcond.not = icmp eq i64 %iv.next, 1024
  br i1 %exitcond.not, label %exit, label %loop.header, !llvm.loop !0

exit:
  ret void
}

!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.vectorize.width", i64 32}
!2 = !{!"llvm.loop.vectorize.enable", i1 true}
```
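
For reference, the loop in the IR above corresponds roughly to the C below. This is only a sketch to show the dependence: the function name `clamped_vf_loop` is my own, `a` is assumed to hold at least 1040 elements, and I am not claiming clang emits exactly the IR above for it.

```c
#include <stdint.h>

/* The conditional store to a[i + 16] is 16 elements ahead of the load
 * from a[i], so the requested width of 32 exceeds the maximum safe VF. */
void clamped_vf_loop(int64_t *a, const int64_t *b) {
#pragma clang loop vectorize(enable) vectorize_width(32)
  for (int64_t i = 0; i < 1024; ++i) {
    if (b[i] == 120)
      a[i + 16] = a[i] + b[i];
  }
}
```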

================
Comment at: llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll:20
+entry:
+  %cmp12 = icmp sgt i32 %N, 0
+  br i1 %cmp12, label %for.body.preheader, label %for.cond.cleanup
----------------
nit: Is this required? LV will add a minimum iteration check anyway, so this could be dropped to make the IR more compact. Alternatively, we could use a constant trip count.

================
Comment at: llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll:24
+for.body.preheader:                               ; preds = %entry
+  %wide.trip.count = zext i32 %N to i64
+  br label %for.body
----------------
nit: Is this required? We could just change `%N` to be an `i64` to make the IR more compact.

================
Comment at: llvm/test/Transforms/LoopVectorize/unsafe-vf-remark.ll:31
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
----------------
nit: can we drop the `indvars.` prefix to make the IR slightly more readable?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90687/new/

https://reviews.llvm.org/D90687