[PATCH] D79870: [RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbb asm instructions

Paolo Savini via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 25 05:15:10 PDT 2020


PaoloS added a comment.

Sorry for the late answer.
I'm catching up with this now.

I agree on the reorganization of the tests. I'm fixing that.
I notice that the tests of the 64 bit instructions on 32 bit targets are quite noisy (especially for clz, ctz and pcnt). I'll upload a revision soon so you can all see.

The immediate shifts with ones (sloi, sroi), on the other hand, have the problem that LLVM optimizes them: instead of DAG nodes resembling the straightforward operation:

(sloi)

~(~x << shamt)

it prefers to use a mask:

(x << shamt) | (~(-1 << shamt))

That means that the DAG pattern contains a constant (the right operand of the 'or') whose value depends on the shamt and which the resulting instruction (sloi/sroi) won't use.
Of course we could just drop it since it isn't used, like this:

  def : Pat<(or (shl GPR:$rs1, simm12:$shamt), mask),
            (SLOI GPR:$rs1, simm12:$shamt)>;

But that introduces an ambiguity: to be sure the operand really is the mask derived from the shamt, we need to check that the two are related.
I'm now trying to see whether a ComplexPattern can do the trick and select sloi and sroi for me while checking that the mask is correct.
I'm not sure, though, whether that is a neat enough solution for upstream.

The issue with slliu.w is quite different. As the documentation says, slliu.w is identical to slli except that it zeroes the (xlen-1):31 bits before shifting.
LLVM, however, optimizes out such a cast before getting to instruction selection, so it is not possible to distinguish it from a normal slli. Considering that the result is a single instruction (slli) in either case, I think it's better to leave it as is and let the user use the slliu.w instruction directly if needed.
A similar thing happens with ctzw.
While for clzw and pcntw LLVM doesn't optimize out the truncation, since removing it could actually affect the result, for ctz it doesn't care. I guess that since it is counting trailing zeroes only up to bit 31, once it sees that the lower 32 bits of the number are not 0 it can process the original 64 bit value normally.
Otherwise it returns 32.
That makes it impractical to tell apart from an rv64 ctz.
The outcome is that for llvm.cttz.i32 on rv64 I get ctz anyway instead of ctzw.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79870/new/

https://reviews.llvm.org/D79870




