[PATCH] D107790: [RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter.

Wed Aug 18 04:52:48 PDT 2021

rogfer01 added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp:238
+  // Make sure we have a splat.
+  Value *SplatOp = getSplatValue(OtherOp);
+  if (!SplatOp)
----------------
rogfer01 wrote:
> One interesting difference between fixed and scalable vectors is that fixed vectors embed an iota vector as a constant: it is the incoming value, from the entry block, of the vector phi in the loop header.
> 
> Like this:
> 
> ```lang=llvm
> vector.body:
>   %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]                 
>   %vec.ind = phi <32 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15, i64 16, i64 17, i64 18, i64 19, i64 20, i64 21, i64 22, i64 23, i64 24, i64 25, i64 26, i64 27, i64 28, i64 29, i64 30, i64 31>, %entry ], [ %vec.ind.next, %vector.body ]
>   %0 = mul nuw nsw <32 x i64> %vec.ind, <i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5, i64 5>
>   %1 = getelementptr inbounds i8, i8* %B, <32 x i64> %0
> ```
> 
> However, with scalable vectorisation (see https://www.godbolt.org/z/Gchx863os), the vector phi is gone. Instead, the iota vector (`stepvector`), computed in the preheader `vector.ph`, is added to a splat of the scalar `%index` in every iteration to compute the vector of indices.
> 
> ```lang=llvm
> vector.ph:                                        ; preds = %entry
>   %4 = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64(), !dbg !22
>   ...
>   br label %vector.body, !dbg !24
> vector.body:                                      ; preds = %vector.body, %vector.ph
>   %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ], !dbg !25
>   %.splatinsert11 = insertelement <vscale x 2 x i64> poison, i64 %index, i32 0, !dbg !24
>   %.splat12 = shufflevector <vscale x 2 x i64> %.splatinsert11, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer, !dbg !24
>   %7 = add <vscale x 2 x i64> %.splat12, %4, !dbg !24
>   %8 = mul nuw nsw <vscale x 2 x i64> %7, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 5, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer), !dbg !27
>   %9 = getelementptr inbounds i8, i8* %B, <vscale x 2 x i64> %8, !dbg !27
> ```
> 
> So at this point the algorithm needs to diverge a bit, because the `phi` is no longer the base case (there is no vector `phi`). Instead, as I understand it, we need to determine that we're splatting a scalar recurrence and combining it with a `stepvector`.
> 
> Not that we have to address this now, but we should bear it in mind if we plan to extend this to scalable vectors in the future.
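A toy C++ model may help make the divergence concrete. It is a sketch under stated assumptions, not LLVM's API and not how the patch implements it: `Kind`, `Node`, `mk`, `Affine`, `StridedForm` and `matchStrided` are all hypothetical. Vectors are described symbolically as `start + stride * lane`, with `start` affine in the scalar `%index`. The only difference between the two cases is the base of the recursion: a vector phi for fixed vectors, versus a `stepvector` combined (through an `add`) with a splat of the scalar recurrence for scalable ones.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Toy IR, hypothetical: NOT LLVM's API and NOT the patch's code.
// A scalar is modelled as an affine function of the scalar induction
// variable %index: coeff * %index + konst.
struct Affine {
  int64_t coeff = 0;
  int64_t konst = 0;
};

enum class Kind { VectorPhi, StepVector, Splat, Add, Mul };

struct Node {
  Kind kind = Kind::Splat;
  Affine scalar;              // used only by Splat
  std::shared_ptr<Node> lhs;  // used only by Add / Mul
  std::shared_ptr<Node> rhs;
};
using NodeP = std::shared_ptr<Node>;

NodeP mk(Kind kind, Affine scalar = {}, NodeP lhs = nullptr, NodeP rhs = nullptr) {
  // Binary nodes need both operands; leaves need none.
  assert((kind == Kind::Add || kind == Kind::Mul) == (lhs && rhs));
  auto n = std::make_shared<Node>();
  n->kind = kind;
  n->scalar = scalar;
  n->lhs = std::move(lhs);
  n->rhs = std::move(rhs);
  return n;
}

// Lane i of a matched vector equals start + stride * i.
struct StridedForm {
  Affine start;
  int64_t stride;
};

std::optional<StridedForm> matchStrided(const NodeP &n) {
  switch (n->kind) {
  case Kind::VectorPhi:
    // Fixed-width base case: %vec.ind, whose lane i is %index + i
    // (assumes it starts at iota and steps by VF in lockstep with %index).
    return StridedForm{{1, 0}, 1};
  case Kind::StepVector:
    // Scalable base case: stepvector is <0, 1, 2, ...>.
    return StridedForm{{0, 0}, 1};
  case Kind::Splat:
    // A bare splat is uniform; it is only accepted as an operand below.
    return std::nullopt;
  case Kind::Add:
  case Kind::Mul: {
    // One operand must be a splat (the role getSplatValue plays in the patch).
    NodeP splat = n->rhs, other = n->lhs;
    if (splat->kind != Kind::Splat)
      std::swap(splat, other);
    if (splat->kind != Kind::Splat)
      return std::nullopt;
    std::optional<StridedForm> inner = matchStrided(other);
    if (!inner)
      return std::nullopt;
    const Affine s = splat->scalar;
    if (n->kind == Kind::Add) {
      // Adding a splat shifts the start; the stride is unchanged.
      inner->start.coeff += s.coeff;
      inner->start.konst += s.konst;
      return inner;
    }
    // Multiplying scales both start and stride; only constant factors keep
    // the start affine in %index.
    if (s.coeff != 0)
      return std::nullopt;
    inner->start.coeff *= s.konst;
    inner->start.konst *= s.konst;
    inner->stride *= s.konst;
    return inner;
  }
  }
  return std::nullopt;
}
```

In this model, both the fixed-width `mul %vec.ind, splat(5)` and the scalable `mul (add splat(%index), stepvector), splat(5)` come out as start `5 * %index` and stride 5. That corresponds to a strided access with a 5-byte stride starting at `%B + 5 * %index`, since the GEP is over `i8`.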
On second thought, `stepvector` may end up being optimised so that a vector phi is used (as in the fixed case), in which case the difference goes away. Carrying the vector of indices through the loop seems better than fully synthesising it in every iteration.
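The equivalence behind this can be checked numerically. The C++ sketch below uses `std::vector` in place of IR vectors and an arbitrarily chosen `vscale`. The function names are hypothetical, and the sketch says nothing about which LLVM pass, if any, would perform such a rewrite. It only shows that a vector phi started at `stepvector * 5` and advanced by `splat(VF * 5)` yields the same indices as recomputing `(splat(%index) + stepvector) * 5` in every iteration, where `%index` advances by VF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int64_t>;

// <0, 1, ..., lanes - 1>, i.e. what llvm.experimental.stepvector yields.
// For <vscale x 2 x i64>, lanes = 2 * vscale, which is only known at run time.
Vec stepVector(int64_t lanes) {
  assert(lanes > 0);
  Vec v(static_cast<std::size_t>(lanes));
  for (int64_t i = 0; i < lanes; ++i)
    v[i] = i;
  return v;
}

// Synthesised form (as in the scalable IR): every iteration computes
// (splat(%index) + stepvector) * splat(scale) from scratch.
Vec synthesised(int64_t index, int64_t lanes, int64_t scale) {
  Vec v = stepVector(lanes);
  for (int64_t &x : v)
    x = (index + x) * scale;
  return v;
}

// Carried form (hypothetical rewrite): a vector phi starts at
// stepvector * scale and is advanced by splat(lanes * scale) each iteration,
// so the loop body needs a single vector add.
std::vector<Vec> carried(int64_t iterations, int64_t lanes, int64_t scale) {
  std::vector<Vec> perIteration;
  Vec phi = stepVector(lanes);
  for (int64_t &x : phi)
    x *= scale;
  for (int64_t it = 0; it < iterations; ++it) {
    perIteration.push_back(phi);
    for (int64_t &x : phi)
      x += lanes * scale;  // the equivalent of %vec.ind.next
  }
  return perIteration;
}
```

The carried form needs one vector add per iteration instead of a splat, an add and a multiply, which is the saving alluded to above.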

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D107790