[PATCH] D79976: [LV] Handle Fold-Tail of loops with vectorizarion factor (VF) equal to 1

Fri May 15 16:19:45 PDT 2020

anhtuyen marked 9 inline comments as done.
anhtuyen added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:444
                                    "trip.count.minus.1");
-    Value *VTCMO = Builder.CreateVectorSplat(State->VF, TCMO, "broadcast");
+    Value *VTCMO = State->VF == 1 ? TCMO
+                                  : Builder.CreateVectorSplat(State->VF, TCMO,
----------------
Ayal wrote:
> nit: using `auto VF = State->VF` will help shorten this to 2 lines.
I will update it shortly.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:820
     SmallVector<Constant *, 8> Indices;
     for (unsigned Lane = 0, VF = State.VF; Lane < VF; ++Lane)
       Indices.push_back(ConstantInt::get(STy, Part * VF + Lane));
----------------
Ayal wrote:
> nit: can move `VF = State.VF` to be reused above and below.
I will update it shortly.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:822
       Indices.push_back(ConstantInt::get(STy, Part * VF + Lane));
-    Constant *VStep = ConstantVector::get(Indices);
+    Constant *VStep = State.VF == 1 ? ConstantInt::get(STy, Part)
+                                    : ConstantVector::get(Indices);
----------------
Ayal wrote:
> nit: `ConstantInt::get(STy, Part)` was already pushed inside Indices, can retrieve it e.g. via `Indices.back()`.
I will update it shortly.

================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1-scalar.ll:3
+; RUN: opt < %s  -loop-vectorize -force-vector-interleave=4 -S | FileCheck %s
+
+define void @foo() {
----------------
Ayal wrote:
> Would be good to explain what this test is for; e.g., that fold-tail produces correct scalar code when LV is only unrolling w/o vectorizing.
I will update it shortly.

================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1-scalar.ll:15
+; CHECK-NEXT:    [[INDUCTION2:%.*]] = add i64 [[INDEX]], 2
+; CHECK-NEXT:    [[INDUCTION3:%.*]] = add i64 [[INDEX]], 3
+; CHECK:         [[INDEX_NEXT]] = add i64 [[INDEX]], 4
----------------
Ayal wrote:
> continue CHECK-NEXTing the scalar `icmp ule`'s
Good catch! I will add them.

================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1-scalar.ll:17
+; CHECK:         [[INDEX_NEXT]] = add i64 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
+; CHECK-NEXT:    br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
----------------
Ayal wrote:
> bmahjour wrote:
> > How is it that the original loop executes 15 iterations, but the vector loop iterates 16? It seems the minimum iteration count check branch at the top should branch to the scalar loop instead of vector.ph.
> (Thanks for asking, reminded to check above and below that fold-tail emits the desired scalar `icmp ule`'s, which are the focus of this patch.)
> 
> Fold-tail is responsible for rounding-up the trip count from 15 to 16, see https://reviews.llvm.org/D50480.
> Regarding minimum iteration count check branch, fold-tail is also responsible in general for branching directly to vector.ph w/o an "if (trip-count < VF*UF)", which in this case is known to be false anyhow.
Ayal @Ayal , thank you very much for your help to clarify. My guess is that Bardia @bmahjour might have been concerned whether by any chance the effective addition of the 16th iteration would affect the correctness of the generated code. Because I have neither evidence nor counterevidence to address his concern, if you can shed some light on it when you have some time, that will be great.  

Back to this patch but given the fact that the value of the trip-count is not the main focus hereof, I take the liberty of omitting its value from the checks. If that is not acceptable to any of us here, please let me know. 

================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1-vector.ll:3
+; RUN: opt < %s  -loop-vectorize -force-vector-interleave=4 -S | FileCheck %s
+
+define void @foo() !prof !12 {
----------------
Ayal wrote:
> Better have both tests in the same file, briefly explain what this tests - possibly using an informative function name rather than "foo".
I will update it shortly.

================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1-vector.ll:23
+; CHECK-NEXT:    [[VEC_IV5:%.*]] = add i64 [[INDEX]], 2
+; CHECK-NEXT:    [[VEC_IV6:%.*]] = add i64 [[INDEX]], 3
+; CHECK:         [[INDEX_NEXT]] = add i64 [[INDEX]], 4
----------------
Ayal wrote:
> continue CHECK-NEXTing the scalar icmp ule's
I will update it shortly.

================
Comment at: llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1-vector.ll:37
+; CHECK-NEXT:    [[PTR]] = getelementptr inbounds double, double* [[ADDR]], i64 1
+; CHECK-NEXT:    [[COND:%.*]] = icmp eq double* [[PTR]], undef
+; CHECK-NEXT:    br i1 [[COND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop !15
----------------
bmahjour wrote:
> Can the `undef` values be replaced with some valid values (such as an incoming parameter)?
I will update it shortly.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79976/new/

https://reviews.llvm.org/D79976