[PATCH] D107790: [RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter.

Wed Aug 18 10:17:39 PDT 2021

craig.topper added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp:221
+  Value *OtherOp;
+  if (isa<Instruction>(BO->getOperand(0)) &&
+      L->contains(cast<Instruction>(BO->getOperand(0)))) {
----------------
rogfer01 wrote:
> I'm curious, you require one of the values to be an instruction but I fail to see where you need that.
> 
> Other places where a (specific) instruction is needed seem correctly guarded via `dyn_cast`.
I took the idea here from ARMMVEGatherScatterOpt's code to push invariant code out of the loop.

If it one operand isn't an instruction then it would have to be something loop invariant which means the whole operation is invariant. In that case we would never reach a loop phi.

================
Comment at: llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp:238
+  // Make sure we have a splat.
+  Value *SplatOp = getSplatValue(OtherOp);
+  if (!SplatOp)
----------------
rogfer01 wrote:
> rogfer01 wrote:
> > One interesting difference between fixed and scalable is that fixed vectors embed a iota vector as a constant in a vector as the loop header incoming value.
> > 
> > Like this:
> > 
> > ```lang=llvm
> > vector.body:
> >   %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]                 
> >   %vec.ind = phi <32 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15, i64 16, i64 17, i64 18, i64 19, i64 20, i64 21, i64 22, i64 23, i64 24, i64 25, i64 26, i64 27, i64 28, i64 29, i64 30, i64 31>, %entry ], [ %vec.ind.next, %vector.body ]
> >   %0 = mul nuw nsw <32 x i64> %vec.ind, <i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5>
> >   %1 = getelementptr inbounds i8, i8* %B, <32 x i64> %0
> > ```
> > 
> > However with scalable vectorisation (see https://www.godbolt.org/z/Gchx863os ) the vector phi is gone and the iota vector (stepvector) coming from the header is used to compute the vector of indices.
> > 
> > ```lang=llvm
> > vector.ph:                                        ; preds = %entry
> >   %4 = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64(), !dbg !22
> >   ...
> >   br label %vector.body, !dbg !24
> > vector.body:                                      ; preds = %vector.body, %vector.ph
> >   %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ], !dbg !25
> >   %.splatinsert11 = insertelement <vscale x 2 x i64> poison, i64 %index, i32 0, !dbg !24
> >   %.splat12 = shufflevector <vscale x 2 x i64> %.splatinsert11, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer, !dbg !24
> >   %7 = add <vscale x 2 x i64> %.splat12, %4, !dbg !24
> >   %8 = mul nuw nsw <vscale x 2 x i64> %7, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 5, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer), !dbg !27
> >   %9 = getelementptr inbounds i8, i8* %B, <vscale x 2 x i64> %8, !dbg !27
> > ```
> > 
> > So at this point the algorithm needs to diverge a bit because now the `phi` won't be the base case (there won't be a vector `phi`). Instead I understand we need to determine we're splatting a scalar recurrence and combining it with a `stepvector`.
> > 
> > Not that we have to address it now. We may have to bear it in mind in the future if we plan to extend this to scalable vectors.
> On a second thought, it may happen that stepvector gets optimised in a way that the vector phi is used (similar to the fixed case) so the difference goes away (being able to carry the vector of indices through the loop seems better than synthesising it fully in every iteration).
That's interesting that the form is different. Is that because the induction variable step would need to be a splat of vscale * fixed element count that would also need to be created?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107790/new/

https://reviews.llvm.org/D107790