[PATCH] D115862: [LoongArch][test] (5/5) Add encoding and mnemonics tests

Lu Weining via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 20 23:29:07 PST 2021


SixWeining added a comment.

In D115862#3204063 <https://reviews.llvm.org/D115862#3204063>, @xen0n wrote:

> In D115862#3203997 <https://reviews.llvm.org/D115862#3203997>, @SixWeining wrote:
>
>> In D115862#3202374 <https://reviews.llvm.org/D115862#3202374>, @xen0n wrote:
>>
>>> These encoding format names are atrocious to look at and use, especially the "non-standard" ones not mentioned in the manuals yet still have to be named. The names mostly start with a number so cannot be used directly as identifiers, and the irregular formats are named after instructions themselves -- what if some instructions are added later that happen to share the same encoding, like renaming the format is okay?
>>>
>>> For fixing these problems once and for good, please consider adopting the unambiguous notation defined by the loongarch-opcodes project <https://github.com/loongson-community/loongarch-opcodes>. While you may or may not like the idea of renaming a few instruction mnemonics, the instruction format naming is fixed for you, and I really hope the official manuals could follow suit and just ditch the `xRIxx` names.
>>
>> @xen0n Thanks for your suggestion. We have read the notation defined in the repo you referenced. Yes, one name can exactly specifies one encoding. But there are a couple of issues left to adopt it in llvm. For example:
>>
>> - immediate operand
>>
>> Some immediate operands usually have [special] meanings like the `msbd` and `lsbd` of the `bstrpick.d` instruction.  This instruction will be named with `DJUk6Um6` under the [notation] you proposed and thus lost the meaning of `msbd` and `lsbd` because we have to use names like `imm1` and `imm2` in instruction definition.
>>
>> In current impl:
>>
>> [snip]
>>
>> [With] your proposal:
>>
>> [snip]
>>
>> - Understanding cost
>>
>> [In order] to understand what the name like `DJUk6Um6` means people may have to spend more time than the current impl.
>>
>> But we do not totally disagree this proposal and let's wait more reviewers' comments about that. Thanks.
>
> Hi,
>
> Thanks for your detailed reply; although I'd like to clarify a bit. Instruction formats are for describing fixed and variable bit patterns as recognized by the decoder hardware; they are very low-level details that should not be concerned with higher-level semantics (instead, opcodes and mnemonics do).
>
> Take the `bstr*` family for example: there is nothing stopping you from creating completely unrelated insns also taking two registers and two 6-bit immediates, and with LoongArch encoding conventions (slots placed from LSB to MSB mostly sequentially) this would almost definitely just be `DJUk6Um6`. With your current approach, the new insns would be of `BSTR_D` format, yet they don't belong to `bstr*` family at all.
>
> In addition to that, there's no reason to only "give meaning" to special formats alone; why not rename "2R" to "BINARY_ARITH", "2RI12" to "MEM" and "I26" to "LONG_BRANCH"? Again this is just exposing inconsistencies and double-standards in the current approach; some semantics just have to be looked up in the manuals and I believe this is acceptable for compiler maintainers.
>
> Plus, with the current naming scheme, names for all "regular" formats begin with a number, thus unusable as identifiers in most programming languages. This forces people to come up with ad-hoc names every time they have to implement LoongArch machine code in a project. Indeed, from what I see, the LoongArch port of binutils, llvm and qemu all use different names from each other, presumably because the Loongson people behind each port all come from different (sub)teams; this is not good for maintenance in the long run.

Different compiler communities use different names. This is the reality not only for LoongArch but also other targets.
Yes, I agree with you that binutils, llvm, qemu using same names is good for maintenace although this has not been achieved by other targets (like RISCV).
So, are you happy with the names defined for LoongArch in binutils?

In binutils-gdb/opcodes/loongarch-opc.c

  { 0x00c00000, 0xffc00000, "bstrpick.d",   "r0:5,r5:5,u16:6,u10:6",    0,          0,  0,  0 },

We may consider using `R0_5_R5_5_U16_6_U10_6` as the instruction format name for the `bstrpick.d` instruction.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115862/new/

https://reviews.llvm.org/D115862



More information about the llvm-commits mailing list