[llvm] [RISCV] Decompose LMUL > 1 reverses into LMUL * M1 vrgather.vv (PR #104574)

Fri Aug 16 10:35:36 PDT 2024

wangpc-pp wrote:

> > I was thinking that as well, I haven't fully looked into why the loop vectorizer doesn't just do that.
> 
> I don't think the vectorizer knows how to make a strided load. So it would have to be a masked.gather with vscale minus step vector?
> 
> I have code in my downstream for vp.load+vp.reverse -> vp.strided.load (might have come from BSC), but not for load+reverse.

The vectorizer doesn't generate strided loads/stores today. For example:
https://github.com/llvm/llvm-project/blob/fbef911dc3ed5ab2c857736de9e68bec4c578410/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp#L2026-L2049
But it could simply generate vp.strided.load/vp.strided.store:
https://llvm.org/docs/LangRef.html#llvm-experimental-vp-strided-load-intrinsic
The change wouldn't be large, but it may be too specific.
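
To make the equivalence concrete, here is a rough scalar model in C++ (not vectorizer code) of why a load followed by a reverse could be expressed as a single strided load: start at the last element and use a stride of minus the element size in bytes. The function names are hypothetical, the model ignores masks, and it assumes i32 elements; it is only a sketch of the semantics, not of how a VPlan recipe would be written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar model of a unit-stride load of `evl` elements followed by a reverse.
std::vector<int32_t> loadThenReverse(const int32_t *base, size_t evl) {
  std::vector<int32_t> v(base, base + evl);
  return std::vector<int32_t>(v.rbegin(), v.rend());
}

// Scalar model of a strided load: lane i reads from ptr + i * strideBytes.
// Simplification (assumption): no mask, every lane below evl is active.
std::vector<int32_t> stridedLoad(const int32_t *ptr, std::ptrdiff_t strideBytes,
                                 size_t evl) {
  assert(ptr != nullptr || evl == 0);
  std::vector<int32_t> out(evl);
  const char *p = reinterpret_cast<const char *>(ptr);
  for (size_t i = 0; i < evl; ++i)
    std::memcpy(&out[i], p + static_cast<std::ptrdiff_t>(i) * strideBytes,
                sizeof(int32_t));
  return out;
}

// Hypothetical rewrite of load+reverse: point at the last element and walk
// backwards with a negative byte stride.
std::vector<int32_t> reversedLoadAsStrided(const int32_t *base, size_t evl) {
  if (evl == 0)
    return {};
  return stridedLoad(base + (evl - 1),
                     -static_cast<std::ptrdiff_t>(sizeof(int32_t)), evl);
}
```

Calling reversedLoadAsStrided(data, n) and loadThenReverse(data, n) on the same buffer should give identical vectors, which is the property such a transform would rely on.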

https://github.com/llvm/llvm-project/pull/104574