[PATCH] D97895: [RISCV] Start fixing issues that prevent us from testing vXi64 intrinsics on RV32.

Fraser Cormack via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 8 02:25:22 PST 2021


frasercrmck added inline comments.


================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:2454
+    // point.
+    //   vmv.v.x vX, hi
+    //   vsll.vx vX, vX, /*32*/
----------------
craig.topper wrote:
> frasercrmck wrote:
> > craig.topper wrote:
> > > I was thinking maybe we just need two slide1ups using SEW=32 with VL set to 2 so that we don't slide anything but the scalars we're inserting.
> > Crafty; I like it. Doing that later along with INSERT_VECTOR_ELT would be my preferred way to go.
> I was also wondering if we could do something like this for the splat
> 
> ```
> vmv.v.x vX, hi // using SEW=64
> vsll.vx vX, vX, /*32*/ // clear the lower 32 bits like we're doing now.
> vsetvli e32 // same vl with half the lmul of the 64 bit type.
> vwaddu.wx vX, vX, lo // zero extends the lo value to 64 bits and adds it in. Since we cleared the lower 32 bits above this is equivalent to OR.
> vsetvli e64 // back to the original sew/lmul
> ```
> 
> The nice advantage it has is that it can be done in one physical register or physical register group. The current sequence requires two.
Sounds like that'd work, yep.
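
To check I'm reading that right, spelled out in full it would be something like this (just a sketch: the register choices, the LMUL, and the exact `vsetvli` operand spelling are illustrative, and I've written the add as `vwaddu.wx`):

```
# a0 = lo word, a1 = hi word, a2 = AVL for the original vXi64 type.
vsetvli  t0, a2, e64, m2, ta, mu    # original SEW=64 type (LMUL=2 chosen arbitrarily)
vmv.v.x  v8, a1                     # splat hi into every 64-bit element
addi     t1, zero, 32
vsll.vx  v8, v8, t1                 # shift left by 32: hi in the upper half,
                                    # lower 32 bits cleared
vsetvli  zero, a2, e32, m1, ta, mu  # same VL, SEW=32, half the LMUL
vwaddu.wx v8, v8, a0                # wide source + scalar: lo is zero-extended to
                                    # 64 bits and added; with the low bits already
                                    # cleared this acts as an OR
vsetvli  zero, a2, e64, m2, ta, mu  # restore the original SEW/LMUL
```

As you say, since the wide source of `vwaddu.wx` has the same EEW as the destination, everything stays in the one register group.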

Off the top of my head, could you not also extend the INSERT_VECTOR_ELT sequence with a `vrgather.vi vd, vs2, 0` to splat the first element? Perhaps fewer instructions, but you'd still need that second register due to the non-overlap constraint. I'm not sure which is better: perhaps it's situational.
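
For reference, the INSERT_VECTOR_ELT-plus-`vrgather.vi` version I have in mind is roughly this (again only a sketch; `vsetivli`, the register names and the temporary are illustrative):

```
# a0 = lo word, a1 = hi word, a2 = AVL for the vXi64 type.
vsetivli zero, 2, e32, m1, ta, mu  # VL=2 at SEW=32: only touch the two scalars
vslide1up.vx v9, v8, a1            # v9[0] = hi (prior contents of v8 don't matter;
                                   # slide1up dest can't overlap its source)
vslide1up.vx v8, v9, a0            # v8[0] = lo, v8[1] = hi -> element 0 is the i64
vsetvli  zero, a2, e64, m1, ta, mu # back to the 64-bit type
vrgather.vi v9, v8, 0              # splat element 0; vrgather's dest must not
                                   # overlap its source, hence the second register
```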


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97895/new/

https://reviews.llvm.org/D97895


