[PATCH] D22869: [LV] Generate both scalar and vector integer induction variables

Thu Jul 28 13:41:20 PDT 2016

mssimpso added a comment.

In https://reviews.llvm.org/D22869#499700, @mkuper wrote:

> Out of curiosity - what does the final generate code end up looking with and without this patch, for cases where we have both a scalar and a vector use?
>  It seems like it should be better, I'm wondering how much.


If you don't mind looking at the full code, I've pasted below what we generate for the new test case I've added to induction.ll (VF=2, UF=2, after instcombine).

Without this patch:

  vector.body:                                      ; preds = %vector.body, %vector.ph
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %vec.ind = phi <2 x i64> [ <i64 0, i64 1>, %vector.ph ], [ %vec.ind.next, %vector.body ]
    %vec.ind2 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next5, %vector.body ]
    %step.add = add <2 x i64> %vec.ind, <i64 2, i64 2>
    %step.add3 = add <2 x i32> %vec.ind2, <i32 2, i32 2>
    %3 = add <2 x i32> %broadcast.splat, %vec.ind2
    %4 = add <2 x i32> %broadcast.splat, %step.add3
    %5 = trunc <2 x i32> %3 to <2 x i16>
    %6 = trunc <2 x i32> %4 to <2 x i16>
    %7 = extractelement <2 x i64> %vec.ind, i32 0
    %8 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %7, i32 1
    %9 = extractelement <2 x i64> %vec.ind, i32 1
    %10 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %9, i32 1
    %11 = extractelement <2 x i64> %step.add, i32 0
    %12 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %11, i32 1
    %13 = extractelement <2 x i64> %step.add, i32 1
    %14 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %13, i32 1
    %15 = extractelement <2 x i16> %5, i32 0
    store i16 %15, i16* %8, align 2
    %16 = extractelement <2 x i16> %5, i32 1
    store i16 %16, i16* %10, align 2
    %17 = extractelement <2 x i16> %6, i32 0
    store i16 %17, i16* %12, align 2
    %18 = extractelement <2 x i16> %6, i32 1
    store i16 %18, i16* %14, align 2
    %index.next = add i64 %index, 4
    %vec.ind.next = add <2 x i64> %vec.ind, <i64 4, i64 4>
    %vec.ind.next5 = add <2 x i32> %vec.ind2, <i32 4, i32 4>
    %19 = icmp eq i64 %index.next, %n.vec
    br i1 %19, label %middle.block, label %vector.body, !llvm.loop !0

With this patch:

  vector.body:                                      ; preds = %vector.body, %vector.ph
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %vec.ind2 = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next5, %vector.body ]
    %3 = or i64 %index, 1
    %4 = or i64 %index, 2
    %5 = or i64 %index, 3
    %step.add3 = add <2 x i32> %vec.ind2, <i32 2, i32 2>
    %6 = add <2 x i32> %broadcast.splat, %vec.ind2
    %7 = add <2 x i32> %broadcast.splat, %step.add3
    %8 = trunc <2 x i32> %6 to <2 x i16>
    %9 = trunc <2 x i32> %7 to <2 x i16>
    %10 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %index, i32 1
    %11 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %3, i32 1
    %12 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %4, i32 1
    %13 = getelementptr inbounds %pair.i16, %pair.i16* %p, i64 %5, i32 1
    %14 = extractelement <2 x i16> %8, i32 0
    store i16 %14, i16* %10, align 2
    %15 = extractelement <2 x i16> %8, i32 1
    store i16 %15, i16* %11, align 2
    %16 = extractelement <2 x i16> %9, i32 0
    store i16 %16, i16* %12, align 2
    %17 = extractelement <2 x i16> %9, i32 1
    store i16 %17, i16* %13, align 2
    %index.next = add i64 %index, 4
    %vec.ind.next5 = add <2 x i32> %vec.ind2, <i32 4, i32 4>
    %18 = icmp eq i64 %index.next, %n.vec
    br i1 %18, label %middle.block, label %vector.body, !llvm.loop !0




================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1973
@@ -1975,1 +1972,3 @@
 
+bool InnerLoopVectorizer::shouldScalarizeIntInduction(Instruction *IV) const {
+  if (Legal->isUniformAfterVectorization(IV))
----------------
mkuper wrote:
> Why "Int"? I mean, it may be called only from widenIntInduction now, but I don't really see a reason to bake it into the name, the logic is independent of the IV type.
> 
> Regardless, I find the use of "scalarize" sort of confusing. It evokes the notion of constructing a vector, and then extracting the scalars, which is the opposite of what we're doing. Maybe something like needsScalarInduction, and then a matching name for the variable below?
> (If you find the current terminology clearer, I won't insist on this, this could be just my own bias.)
I don't feel that strongly about the name, so I'm fine with this.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1977
@@ +1976,3 @@
+  for (auto *U : IV->users())
+    if (!OrigLoop->isLoopInvariant(U))
+      if (Legal->isScalarAfterVectorization(cast<Instruction>(U)))
----------------
mkuper wrote:
> Why do we need !isLoopInvariant(U)? If the use is loop invariant, shouldn't it necessarily be a scalar?
> 
> (Assuming we do need it, why two separate ifs, and not a && ?)
isScalarAfterVectorization returns true for values or instructions not in the loop. Using !isLoopInVariant ensures we're looking at an instruction in the loop that will be scalar. With the &&, we would get a clang-format line break, so I kept it as two if's because I thought it might be easier to read. Let me know if you prefer one over the other.


https://reviews.llvm.org/D22869