[PATCH] D98250: [RISCV] Optimize INSERT_VECTOR_ELT sequences

Wed Mar 10 03:19:41 PST 2021

frasercrmck added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:2223
+    SDValue One = DAG.getConstant(1, DL, XLenVT);
+    ValLo = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, MVT::i32, Val, Zero);
+    ValHi = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, MVT::i32, Val, One);
----------------
craig.topper wrote:
> Would it be simpler to just check that Val is a ConstantSDNode that fits in sign extended 32 bits and if it is just call getConstant with the lower 32 bits?
It would, thanks. I'll still extract ValLo and ValHi for the slow rv32/i64 case, but the lo/hi constant checking is unnecessarily complicated.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:2259
+    SDValue InsertI64VL = DAG.getConstant(2, DL, XLenVT);
+    // TODO: Should we be able to use UNDEF here? It appears to break the
+    // earlyclobber constraint. Just splat a zero value for now.
----------------
craig.topper wrote:
> I can believe that doesn't work. Undef sources that aren't tied don't seem to really allocate and just pick the first register.
> 
> I'm now a little worried someone might try that in the intrinsic interface.
Indeed. I went through some of the intrinsic tests to see if they were testing `undef` but didn't get any hits. I suspect we have problems there. I haven't worked with the builtin/intrinsic layers but presumably it'd be possible to do something like this in pseudo C:

```
nxvi32_t v;
v = rvv_vslide1up(v, a, m, vl);
```

and have that first operand lower to `undef`? I don't know how far we need to take this issue at this point, though I suspect it's not the last we've seen of it. Perhaps we'll just need to sanitize any problematic undef operands.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:2267
+    ValInVec = DAG.getNode(RISCVISD::VSLIDE1UP_VL, DL, I32ContainerVT, ValInVec,
+                           ValLo, I32Mask, InsertI64VL);
+  }
----------------
craig.topper wrote:
> Don't we need a bitcast here to get back to the ContainerVT to match the SDTypeProfile for the next operation?
Good catch, thanks. I've fixed that up.

On a related note, I noticed that it was perfectly content when using the original container mask, even though `SDTCisSameNumEltsAs` shouldn't have let (e.g.) nxv2i1 match against nxv4i1. The code for the TableGen constraint seems like it should be working, so maybe it's something in our patterns? I didn't have time to dig in more deeply.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98250/new/

https://reviews.llvm.org/D98250