[PATCH] D115862: [LoongArch][test] (5/5) Add encoding and mnemonics tests

Mon Dec 20 21:33:26 PST 2021

xen0n added a comment.

In D115862#3203997 <https://reviews.llvm.org/D115862#3203997>, @SixWeining wrote:

> In D115862#3202374 <https://reviews.llvm.org/D115862#3202374>, @xen0n wrote:
>
>> These encoding format names are atrocious to look at and use, especially the "non-standard" ones not mentioned in the manuals yet still have to be named. The names mostly start with a number so cannot be used directly as identifiers, and the irregular formats are named after instructions themselves -- what if some instructions are added later that happen to share the same encoding, like renaming the format is okay?
>>
>> For fixing these problems once and for good, please consider adopting the unambiguous notation defined by the loongarch-opcodes project <https://github.com/loongson-community/loongarch-opcodes>. While you may or may not like the idea of renaming a few instruction mnemonics, the instruction format naming is fixed for you, and I really hope the official manuals could follow suit and just ditch the `xRIxx` names.
>
> @xen0n Thanks for your suggestion. We have read the notation defined in the repo you referenced. Yes, one name can exactly specifies one encoding. But there are a couple of issues left to adopt it in llvm. For example:
>
> - immediate operand
>
> Some immediate operands usually have [special] meanings like the `msbd` and `lsbd` of the `bstrpick.d` instruction.  This instruction will be named with `DJUk6Um6` under the [notation] you proposed and thus lost the meaning of `msbd` and `lsbd` because we have to use names like `imm1` and `imm2` in instruction definition.
>
> In current impl:
>
> [snip]
>
> [With] your proposal:
>
> [snip]
>
> - Understanding cost
>
> [In order] to understand what the name like `DJUk6Um6` means people may have to spend more time than the current impl.
>
> But we do not totally disagree this proposal and let's wait more reviewers' comments about that. Thanks.

Hi,

Thanks for your detailed reply; although I'd like to clarify a bit. Instruction formats are for describing fixed and variable bit patterns as recognized by the decoder hardware; they are very low-level details that should not be concerned with higher-level semantics (instead, opcodes and mnemonics do).

Take the `bstr*` family for example: there is nothing stopping you from creating completely unrelated insns also taking two registers and two 6-bit immediates, and with LoongArch encoding conventions (slots placed from LSB to MSB mostly sequentially) this would almost definitely just be `DJUk6Um6`. With your current approach, the new insns would be of `BSTR_D` format, yet they don't belong to `bstr*` family at all.

In addition to that, there's no reason to only "give meaning" to special formats alone; why not rename "2R" to "BINARY_ARITH", "2RI12" to "MEM" and "I26" to "LONG_BRANCH"? Again this is just exposing inconsistencies and double-standards in the current approach; some semantics just have to be looked up in the manuals and I believe this is acceptable for compiler maintainers.

Plus, with the current naming scheme, names for all "regular" formats begin with a number, thus unusable as identifiers in most programming languages. This forces people to come up with ad-hoc names every time they have to implement LoongArch machine code in a project. Indeed, from what I see, the LoongArch port of binutils, llvm and qemu all use different names from each other, presumably because the Loongson people behind each port all come from different (sub)teams; this is not good for maintenance in the long run.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115862/new/

https://reviews.llvm.org/D115862