[PATCH] D97861: [LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector

Fri Mar 5 12:26:42 PST 2021

ctetreau accepted this revision.
ctetreau added a comment.
This revision is now accepted and ready to land.

lgtm

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2265
+
+  // Add on StartIdx
+  Value *StartIdxSplat = Builder.CreateVectorSplat(
----------------
david-arm wrote:
> ctetreau wrote:
> > ctetreau wrote:
> > > It seems like this does the same thing as the original version. However, I don't think this function does what it claims to do.
> > > 
> > > The docs say that this computes `Val + <StartIdx, StartIdx + Step, StartIdx + 2*Step, ...>`
> > > 
> > > But what this function does is:
> > > ```
> > > StartIdxSplat = <StartIdx, StartIdx, ...>
> > > InitVec = StartIdxSplat + <0, 1, ...>
> > > Step = <InputStep, InputStep, ...>
> > > result = Val + (InitVec * Step)
> > > ```
> > > 
> > > If the input step is 1, this seems to do the right thing:
> > > 
> > > ```
> > > StartIdx = 2 // some non-1 start for illustrative purposes
> > > Val = <a, b, c, d>
> > > InputStep = 1
> > > 
> > > StartIdxSplat = <2, 2, 2, 2>
> > > InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
> > > Step = <1, 1, 1, 1>
> > > result = <a, b, c, d> + (<2, 3, 4, 5> * <1, 1, 1, 1>) = <a, b, c, d> + <2, 3, 4, 5>
> > > ```
> > > 
> > > But if we try a larger input step:
> > > 
> > > ```
> > > StartIdx = 2 // some non-1 start for illustrative purposes
> > > Val = <a, b, c, d>
> > > InputStep = 2
> > > 
> > > StartIdxSplat = <2, 2, 2, 2>
> > > InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
> > > Step = <2, 2, 2, 2>
> > > result = <a, b, c, d> + (<2, 3, 4, 5> * <2, 2, 2, 2>) = <a, b, c, d> + <4, 6, 8, 10>
> > > ```
> > > 
> > > The label on the tin says that the first element of the RHS vector should be equal to `StartIdx`, which it clearly isn't. I haven't scrutinized the floating point codepath, but I assume it has a similar issue.
> > > 
> > > I believe we need to multiply the step vector by the step before adding the start value.
> > > 
> > > I feel like we should do something about this. Either assert that the step is 1 and add a fixme, or just fix the function.
> > I suppose another option is to document what the function actually does in the header
> Hi @ctetreau, I had a look and I believe the function to be doing the right thing and we actually have tests that defend this behaviour. For example, see function `@non_primary_iv_loop_inv_trunc` in llvm/test/Transforms/LoopVectorize/induction-step.ll, which has CHECK lines like this:
> 
>   ; CHECK:         [[TMP3:%.*]] = trunc i64 %step to i32
>   ; CHECK-NEXT:    [[DOTSPLATINSERT5:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
>   ; CHECK-NEXT:    [[DOTSPLAT6:%.*]] = shufflevector <8 x i32> [[DOTSPLATINSERT5]], <8 x i32> poison, <8 x i32> zeroinitializer
>   ; CHECK-NEXT:    [[TMP4:%.*]] = mul <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[DOTSPLAT6]]
> 
> In this case I've tried to update the documentation to reflect this behaviour.
Yeah, I guess it checks out. Thanks for updating the doc string to be the actual computation.

start = 0, step = 2, VF = 4, Value = zeroinitializer
```
i                                     = <2, 4, 6, 8>
i_2 = <8, 8, 8, 8> + <2, 4, 6, 8>     = <10, 12, 14, 16>
i_3 = <8, 8, 8, 8> + <10, 12, 14, 16> = <18, 20, 22, 24>
```

start = 2, step = 2, VF = 4, Value = zeroinitializer
```
i                                     = <4, 6, 8, 10>
i_2 = <8, 8, 8, 8> + <4, 6, 8, 10>    = <12, 14, 16, 18>
i_3 = <8, 8, 8, 8> + <12, 14, 16, 18> = <20, 22, 24, 26>
```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97861/new/

https://reviews.llvm.org/D97861