[PATCH] D97895: [RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 9 09:42:33 PST 2021


craig.topper added inline comments.


================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:2454
+    // point.
+    //   vmv.v.x vX, hi
+    //   vsll.vx vX, vX, /*32*/
----------------
frasercrmck wrote:
> craig.topper wrote:
> > frasercrmck wrote:
> > > craig.topper wrote:
> > > > frasercrmck wrote:
> > > > > craig.topper wrote:
> > > > > > I was thinking maybe we just need two slide1ups using SEW=32 with VL set to 2 so that we don't slide anything but the scalars we're inserting.
> > > > > Crafty; I like it. Doing that later along with INSERT_VECTOR_ELT would be my preferred way to go.
> > > > I was also wondering if we could do something like this for the splat
> > > > 
> > > > ```
> > > > vmv.v.x vX, hi // using SEW=64
> > > > vsll.vx vX, vX, /*32*/ // clear the lower 32 bits like we're doing now.
> > > > vsetvli e32 // same vl with half the lmul of the 64 bit type.
> > > > vwaddu.wx vX, vX, lo // widening add: zero extends the lo value to 64 bits. Since we cleared the lower 32 bits above, this is equivalent to OR.
> > > > vsetvli e64 // back to original sew/lmul
> > > > ```
> > > > 
> > > > The nice advantage it has is that it can be done in one physical register or physical register group. The current sequence requires two.
> > > Sounds like that'd work, yep.
> > > 
> > > Off the top of my head, could you not also extend the INSERT_VECTOR_ELT sequence with a `vrgather.vi vd, vs2, 0` to splat the first element? Perhaps fewer instructions, but you'd still need that second register due to the non-overlap constraint. I'm not sure which is better: perhaps it's situational.
> > 
> > I'm not following. INSERT_ELEMENT has a scalar register input. How did it get to element 0 for `vrgather.vi vd, vs2, 0`?
> > 
> Ah sorry I might have skipped a step. I was proposing extending the two slide1up sequence you proposed earlier to get it from element zero to all elements.
Ok I get it now. When I wrote my slide1up suggestion I was only thinking about this case where element 0 was the final location.
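
For reference, the arithmetic behind the splat sequence quoted above can be checked with a small scalar model. This is only a sketch: the function name `splatI64FromHalves` and the use of `std::vector` to stand in for a vector register are made up for illustration and are not part of the patch. It assumes `vmv.v.x` sign-extends the 32-bit scalar to SEW=64 on RV32, and that the widening add zero-extends `lo`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of the proposed splat sequence; each vector
// element is modelled as one uint64_t.
std::vector<uint64_t> splatI64FromHalves(uint32_t hi, uint32_t lo,
                                         std::size_t vl) {
  // vmv.v.x vX, hi (SEW=64): the 32-bit scalar is sign-extended to 64 bits
  // (assumed behaviour when XLEN < SEW).
  uint64_t splat =
      static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(hi)));
  std::vector<uint64_t> v(vl, splat);
  for (uint64_t &e : v) {
    // vsll.vx vX, vX, 32: hi moves to the upper half, and the lower 32 bits
    // (plus any sign-extension bits) are discarded.
    e <<= 32;
    // Widening unsigned add of lo: lo is zero-extended to 64 bits. The lower
    // 32 bits of e are zero, so no carry occurs and the add behaves like OR.
    e += static_cast<uint64_t>(lo);
  }
  return v;
}
```

Calling `splatI64FromHalves(0x80000001u, 0xFFFFFFFFu, 4)` in this model gives four elements equal to `0x80000001FFFFFFFF`, which shows that neither the sign extension of `hi` nor an all-ones `lo` disturbs the result.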


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97895/new/

https://reviews.llvm.org/D97895


