[PATCH] D91259: [RISCV] Lower GREVI and GORCI as custom nodes

Fri Nov 13 09:54:56 PST 2020

craig.topper added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:1138
+                               const RISCVSubtarget &Subtarget) {
+  if (Op.getSimpleValueType() == Subtarget.getXLenVT()) {
+    auto LHS = matchRISCVBitmanipPat(Op.getOperand(0), IsW);
----------------
frasercrmck wrote:
> frasercrmck wrote:
> > craig.topper wrote:
> > > Could we match i32 pre-type legalization on RV64 and type legalize i32 RISCVISD::GREVI to GREVIW? I think that would remove the need to match from SIGN_EXTEND_INREG.
> > Good idea; I'll give it a go.
> This idea mostly works but is failing for two RV64 test cases: i32 `fshl` and `fshr` intrinsics with constant 16.
> 
> `fshl` is lowered to sign-extended i32 `rotl` instructions and dag-combined to a `(sext_inreg (i64 (or (shl x, 16), (srl (and x, 0xffff0000), 16))))` pattern, skipping the 32-bit or combine. That's something the sext combine was getting but not something the or combine is. Similar for `fshr`. I think we need the sext in there to be able to match this as a greviw.
> 
> Combining certain `rotl` (and `rotr`, `bswap` and `bitreverse`) to grevi would catch this and would unify everything under one node (something I was mulling about anyway as it'd provide more opportunity for grev combining) but it might also be heavy-handed.
> This idea mostly works but is failing for two RV64 test cases: i32 `fshl` and `fshr` intrinsics with constant 16.
> 
> `fshl` is lowered to sign-extended i32 `rotl` instructions and dag-combined to a `(sext_inreg (i64 (or (shl x, 16), (srl (and x, 0xffff0000), 16))))` pattern, skipping the 32-bit or combine. That's something the sext combine was getting but not something the or combine is. Similar for `fshr`. I think we need the sext in there to be able to match this as a greviw.
> 
> Combining certain `rotl` (and `rotr`, `bswap` and `bitreverse`) to grevi would catch this and would unify everything under one node (something I was mulling about anyway as it'd provide more opportunity for grev combining) but it might also be heavy-handed.

I'm not sure that we should be using GREVI for cases that can be handled with ROLW/RORW. I could see a CPU having a better performing implementation of rotate than grevi.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91259/new/

https://reviews.llvm.org/D91259