[PATCH] D144002: [RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension

Thu Feb 16 11:27:52 PST 2023

craig.topper added inline comments.

================
Comment at: llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:2672
+    // The last operand must be constant 3 or 4 depending on the data width.
+    if (IsWordOp && Inst.getOperand(4).getImm() != 3) {
+      SMLoc Loc = Operands.back()->getStartLoc();
----------------
mtsamis wrote:
> craig.topper wrote:
> > mtsamis wrote:
> > > craig.topper wrote:
> > > > mtsamis wrote:
> > > > > craig.topper wrote:
> > > > > > mtsamis wrote:
> > > > > > > jrtc27 wrote:
> > > > > > > > philipp.tomsich wrote:
> > > > > > > > > mtsamis wrote:
> > > > > > > > > > jrtc27 wrote:
> > > > > > > > > > > mtsamis wrote:
> > > > > > > > > > > > philipp.tomsich wrote:
> > > > > > > > > > > > > mtsamis wrote:
> > > > > > > > > > > > > > craig.topper wrote:
> > > > > > > > > > > > > > > craig.topper wrote:
> > > > > > > > > > > > > > > > Why does this 3rd operand exist at all?
> > > > > > > > > > > > > > > Oops, I meant fifth operand.
> > > > > > > > > > > > > > Although I also don't see why this required, I implemented it as per the specification
> > > > > > > > > > > > > The problem really is the Specification on this one:
> > > > > > > > > > > > >    th.ldd rd1,rd2,(rs1),#imm2,4
> > > > > > > > > > > > > appears to have 4 as an operand (even though it is the only permissible value and does not get encoded explicitly).
> > > > > > > > > > > > > 
> > > > > > > > > > > > > However, looking at it from another angle the ",4" is part of the operation mnemonic (or an %_ term when expressing the pattern in GNU notation):
> > > > > > > > > > > > >    th.ldd %1,%2,(%0),%3,4
> > > > > > > > > > > > > Given that the 4 is not encoded, I am leaning with Craig at the moment and consider it as part of the mnemonic (i.e., a split off fragment of 'th.ldd'+',4').
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I would much rather have seen this specified as
> > > > > > > > > > > > >     th.ldd rd1,rd2,#offset(rs1)
> > > > > > > > > > > > > where offset would have a constraint to have its lower 4 bits cleared.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > This would be a much more meaningful assembler syntax.
> > > > > > > > > > > > > Can we have this as a pseudo (that also becomes the preferred disassembly)?
> > > > > > > > > > > > > I'll volunteer to send out a PR against the T-Head specification to add that additional assembler-pseudo under the th.ldd definition…
> > > > > > > > > > > > From what I understand, even if we have an additional assembler-pseudo for a nicer syntax we would still have to support the current one from the specification. In that case, adding the 'nice' syntax won't make the implementation better because we'll have to support the other one as well. The "4" is indeed not encoded and could be part of the assembler string as + ", 4" but from what I understood when implementing it, this doesn't work great with the current infrastructure. You can also check this discussion about implicit operands from which I got ideas how to properly implent them: [[ https://discourse.llvm.org/t/mc-layer-parsing-disassembling-implicit-operands/49816 | mc-layer-parsing-disassembling-implicit-operands ]]. The constant assembler string makes the assembler not able to properly parse the instruction and also makes it hard to get proper error messages as in this implementation.
> > > > > > > > > > > > 
> > > > > > > > > > > > Tl;dr My personal conclusion is that if we have to support the current syntax then we don't benefit from creating an alternative nicer syntax.
> > > > > > > > > > > > 
> > > > > > > > > > > `lw rd, rs1, imm` is the syntax in the unprivileged spec, but `lw rd, imm(rs1)` is the canonical assembly format. Ditto for `jalr`. You could just view it the same way.
> > > > > > > > > > But the mempair instruction are of the form `th.ldd rd1, rd2, (rs1), imm2, 4` where there is both `imm` but also a trailing `3` or `4` which has no function (as discussed above) and is not included in the instruction encoding. But it still has to be there as per the specification and the assembler/disassembler must work with it.
> > > > > > > > > > 
> > > > > > > > > > So I don't think this is the same as `lw`?
> > > > > > > > > I agree with Jessica in principle, but we have a released GNU Binutils in the wild that implements the "weird" assembler syntax.  So we should not have a hard break for backwards compatibility (i.e., will need coexistence).
> > > > > > > > I wasn't suggesting ditching the weird syntax; `lw rd, rs1, imm` is legal assembly and works in both assemblers today. Just saying we can (a) have multiple syntaxes supported (b) change the canonical one.
> > > > > > > Jessica, I think I now better understand what you've meant and it sounds like a great way to do this.
> > > > > > > 
> > > > > > > Then, would it be possible for the alias to be "th.ldd $rd1, $rd2, ($rs1), $offset, 4" where the "4" part is discarded?
> > > > > > > I'm not familiar enough with the infrastructure to know if this will work fine with the assembler/disassembler, but I can try it out.
> > > > > > > 
> > > > > > > Please correct me if I'm wrong or understood your proposal incorrectly.
> > > > > > I don't think `InstAlias` or the Instruction AsmString support a specific constant hardcoded into the string. So it would still need to be parsed as an immediate.
> > > > > Ok, I see. Then I can add the alias for the alternative syntax if we'd like to, without that affecting the current implementation.
> > > > > 
> > > > > 
> > > > Is the 3 or 4 indicating the amount the immediate is shifted by? Instead of only accepting immediates that are divisible 8 or 16. The latter is what C.LW does for example.
> > > > Is the 3 or 4 indicating the amount the immediate is shifted by?
> > > 
> > > Yes, that looks to be the case.
> > > 
> > > > Instead of only accepting immediates that are divisible 8 or 16. The latter is what C.LW does for example.
> > > 
> > > I looked up the format of C.LW and more conveniently the (shifted) constant is a single argument of the instruction with the restriction you mentioned. In MemPair the two significant bits of the constant and the shift amount are separate arguments but the shift is reduntant anyway.
> > I don't know if this instruction was defined before C.LW or just ignored the precedent.
> > 
> > But I guess it doesn't matter now. We have to implement the spec.
> Yes, I agree with that; the format for these instructions is a bit unfortunate.
> 
> So, of all things that were discussed here, is there some desired change I should do on the current implementation?
I don't think we can change the canonical syntax without binutils also supporting or we will break -fno-integrated-as.

I'm ok with just accepting the syntax binutils supports.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144002/new/

https://reviews.llvm.org/D144002