[PATCH] D23561: [RISCV 4/10] Add basic RISCV{InstrFormats, InstrInfo, RegisterInfo, }.td

Mon Aug 22 02:57:28 PDT 2016

asb added inline comments.

================
Comment at: lib/Target/RISCV/RISCVInstrFormats.td:97
@@ +96,3 @@
+{
+  bits<13> imm13;
+  bits<5> rs1;
----------------
asb wrote:
> t.p.northover wrote:
> > This immediate doesn't appear to have a bit 0 in the spec (it fits into exactly the same fields as imm12 does).
> > 
> > So what we're really deciding here is what the immediate in the MCInst representation of "BEQ #x" should be. Valid answers are "x" and "the bits representing how x is encoded", with newer targets tending to favour the latter.
> > 
> > This is mostly because it makes it harder by design to construct an invalid MCInst when you're doing it manually. Other than that, it's just punting complexity around the various MC layer components:
> >   
> > | MC component | with "x" | with bits |
> > |--|--|--|
> > | AsmParser | nop | encodes |
> > | Disassembler | decodes | nop |
> > | AsmPrinter | nop | decodes |
> > | Fixup handling | encodes | nop |
> > 
> > Either way "imm13" is probably misleading as a name and it's probably a good idea to be consistent between formats if possible (I note FU has taken the second approach).
> I am incredibly glad you raised this issue, as it's something I've agonised back and forth over. I considered writing something up about this in the patch description, but thought it may be of limited interest. The same issue you raise applies to FUJ vs FU.
> 
> I started going in the direction of `imm13` because:
> 1) this better matches the spec which describes immediates in terms of slicing the integer logically represented 
> 2) the idea that the MCInst representation of 'BEQ #x` should be 'x' appealed to me.
> 
> The logical value is a 13-bit immediate with the restriction that the LSB must be 0, but of course this is encoded in 12 bits. Arguably imm13m0 would be more descriptive (imm13, masked bit0) but this description is perhaps a little opaque to most. As you've spotted, I ended up going a different direction with FU. After deciding `imm32` was misleading, I tried out with `imm20shl12` as a description of the value that's logically a 20-bit immediate shifted left 12 places. However I think I found that for whatever reason the code manipulating this didn't read cleanly. 
> 
> The argument that now FU has its arguments defined in terms of the bit encoding, FSB and FUJ are inconsistent with it is persuasive. I'll modify so FSB takes an imm12 and FUJ takes an imm20.
Do you have any thoughts on a clearer naming convention? Suppose I change FUJ so it takes an imm20 (i.e. the bits to be encoded), there are a few choices for naming the operand (shown below in context:
1. `def JAL : FUJ<0b1101111, (outs GPR:$rd), (ins simm21_mask1:$imm20)`. The argument for this would be that `simm21_mask1` describes the logical value rather than the physical encoding and we don't have to describe the value in terms of either encoding or decoding. As part of `def simm21_mask1 : Operand<i32>` we would add a `DecoderMethod` and `EncoderMethod` that converts to/from the imm20 bit encoding
2. `def JAL : FUJ<0b1101111, (outs GPR:$rd), (ins simm20_lsl1:$imm20)`. This matches what the Mips backend has done. The name reflects how to go from the encoded 20-bit representation to the logical value that represents.
3.  `def JAL : FUJ<0b1101111, (outs GPR:$rd), (ins simm21_asr1:$imm20)`. The name reflects how to go from the logical value (the signed 21-bit offset) to the 20-bit encoding.

It's not clear to me why it would be more logical to use option 2) vs option 3), which is what tempts me to stick with something along the lines of the current name. `simm21_mask1` describes the logical value (a signed 21-bit value with the LSB guaranteed to be zero, i.e. has been masked `& 1`.

https://reviews.llvm.org/D23561