[PATCH] D97861: [LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 5 04:50:05 PST 2021
david-arm marked 2 inline comments as done.
david-arm added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2265
+
+ // Add on StartIdx
+ Value *StartIdxSplat = Builder.CreateVectorSplat(
----------------
ctetreau wrote:
> ctetreau wrote:
> > It seems like this does the same thing as the original version. However, I don't think this function does what it claims to do.
> >
> > The docs say that this computes `Val + <StartIdx, StartIdx + Step, StartIdx + 2*Step, ...>`
> >
> > But what this function does is:
> > ```
> > StartIdxSplat = <StartIdx, StartIdx, ...>
> > InitVec = StartIdxSplat + <0, 1, ...>
> > Step = <InputStep, InputStep, ...>
> > result = Val + (InitVec * Step)
> > ```
> >
> > If the input step is 1, this seems to do the right thing:
> >
> > ```
> > StartIdx = 2 // some non-1 start for illustrative purposes
> > Val = <a, b, c, d>
> > InputStep = 1
> >
> > StartIdxSplat = <2, 2, 2, 2>
> > InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
> > Step = <1, 1, 1, 1>
> > result = <a, b, c, d> + (<2, 3, 4, 5> * <1, 1, 1, 1>) = <a, b, c, d> + <2, 3, 4, 5>
> > ```
> >
> > But if we try a larger input step:
> >
> > ```
> > StartIdx = 2 // some non-1 start for illustrative purposes
> > Val = <a, b, c, d>
> > InputStep = 2
> >
> > StartIdxSplat = <2, 2, 2, 2>
> > InitVec = <2, 2, 2, 2> + <0, 1, 2, 3> = <2, 3, 4, 5>
> > Step = <2, 2, 2, 2>
> > result = <a, b, c, d> + (<2, 3, 4, 5> * <2, 2, 2, 2>) = <a, b, c, d> + <4, 6, 8, 10>
> > ```
> >
> > The label on the tin says that the first element of the RHS vector should be equal to `StartIdx`, which it clearly isn't. I haven't scrutinized the floating point codepath, but I assume it has a similar issue.
> >
> > I believe we need to multiply the step vector by the step before adding the start value.
> >
> > I feel like we should do something about this. Either assert that the step is 1 and add a fixme, or just fix the function.
> I suppose another option is to document what the function actually does in the header
Hi @ctetreau, I had a look and I believe the function to be doing the right thing and we actually have tests that defend this behaviour. For example, see function `@non_primary_iv_loop_inv_trunc` in llvm/test/Transforms/LoopVectorize/induction-step.ll, which has CHECK lines like this:
; CHECK: [[TMP3:%.*]] = trunc i64 %step to i32
; CHECK-NEXT: [[DOTSPLATINSERT5:%.*]] = insertelement <8 x i32> poison, i32 [[TMP3]], i32 0
; CHECK-NEXT: [[DOTSPLAT6:%.*]] = shufflevector <8 x i32> [[DOTSPLATINSERT5]], <8 x i32> poison, <8 x i32> zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, [[DOTSPLAT6]]
In this case I've tried to update the documentation to reflect this behaviour.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D97861/new/
https://reviews.llvm.org/D97861
More information about the llvm-commits
mailing list