[PATCH] D97861: [LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector

Fri Mar 5 04:50:05 PST 2021

david-arm marked 2 inline comments as done.
david-arm added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2265
+
+  // Add on StartIdx
+  Value *StartIdxSplat = Builder.CreateVectorSplat(
----------------
ctetreau wrote:
> ctetreau wrote:
> > It seems like this does the same thing as the original version. However, I don't think this function does what it claims to do.
> > 
> > The docs say that this computes `Val + <StartIdx, StartIdx + Step, StartIdx + 2*Step, ...>`
> > 
> > But what this function does is:
> > ```
> > StartIdxSplat = <StartIdx, StartIdx, ...>
> > InitVec = StartIdxSplat + <0, 1, ...>
> > Step = <InputStep, InputStep, ...>
> > result = Val + (InitVec * Step)
> > ```
> > 
> > If the input step is 1, this seems to do the right thing:
> > 
> > ```
> > StartIdx = 2 // some non-1 start for illustrative purposes
> > Val = <a, b, c, d>
> > InputStep = 1
> > 
> > StartIdxSplat = <2, 2, 2, 2>
> > InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
> > Step = <1, 1, 1, 1>
> > result = <a, b, c, d> + (<2, 3, 4, 5> * <1, 1, 1, 1>) = <a, b, c, d> + <2, 3, 4, 5>
> > ```
> > 
> > But if we try a larger input step:
> > 
> > ```
> > StartIdx = 2 // some non-1 start for illustrative purposes
> > Val = <a, b, c, d>
> > InputStep = 2
> > 
> > StartIdxSplat = <2, 2, 2, 2>
> > InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
> > Step = <2, 2, 2, 2>
> > result = <a, b, c, d> + (<2, 3, 4, 5> * <2, 2, 2, 2>) = <a, b, c, d> + <4, 6, 8, 10>
> > ```
> > 
> > The label on the tin says that the first element of the RHS vector should be equal to `StartIdx`, which it clearly isn't. I haven't scrutinized the floating point codepath, but I assume it has a similar issue.
> > 
> > I believe we need to multiply the step vector by the step before adding the start value.
> > 
> > I feel like we should do something about this. Either assert that the step is 1 and add a fixme, or just fix the function.
> I suppose another option is to document what the function actually does in the header
Hi @ctetreau, I had a look and I believe the function to be doing the right thing and we actually have tests that defend this behaviour. For example, see function `@non_primary_iv_loop_inv_trunc` in llvm/test/Transforms/LoopVectorize/induction-step.ll, which has CHECK lines like this:

  ; CHECK:         [[TMP3:%.*]] = trunc i64 %step to i32
  ; CHECK-NEXT:    [[DOTSPLATINSERT5:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
  ; CHECK-NEXT:    [[DOTSPLAT6:%.*]] = shufflevector <8 x i32> [[DOTSPLATINSERT5]], <8 x i32> poison, <8 x i32> zeroinitializer
  ; CHECK-NEXT:    [[TMP4:%.*]] = mul <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[DOTSPLAT6]]

In this case I've tried to update the documentation to reflect this behaviour.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97861/new/

https://reviews.llvm.org/D97861