[llvm] [TableGen] Add support for DefaultMode in per-HwMode encode/decode. (PR #83029)

Mon Feb 26 13:06:24 PST 2024

nvjle wrote:

> This comment isn't exactly related to this patch, but, I'm wondering what you are intending to use this feature for, and what the long-term strategy for EncodingByHwMode is.

We are already using this feature (that is, EncodingByHwMode) and have been for quite some time in a production downstream backend. This backend targets a very complicated, long-lived, evolving ISA and has encodings that can (and already has, and will) change from hardware generation-to-generation/version-to-version according to the needs of the computer architects in our organization. One could literally have an instruction set today with encoding A, and a year later (or less),  have largely the same instruction set but with a vastly different encoding. Hardware efficiency concerns (speed, power, etc) are a big reason for this.

For most existing common ISAs, it is almost unthinkable that the encodings would substantially change beyond a handful of instructions-- and the encodings are basically forever. But for some machines (e.g., the ones I am targeting) this would overly constrain and handcuff the architects when trying to meet extreme performance goals. 

I believe we're using this feature as it was intended. Maybe one of way of thinking about it as a way to make encodings independently configurable for a given opcode. One could, say, bake-in a prefix or postfix for every such opcode (e.g., MYINSN_MODEA, MYINSN_MODEB, ...) for every mode. That is clearly terrible-- at least when one has many such instructions. 
For example, there are many possible passes in our backend (let's just say peephole-like optimizers for discussion's sake) that refer to these opcodes. That code ideally just references one instruction MYINSN rather than having to consider all variants of what are really just one instruction-- and all of that code must be updated if the encodings change in the future. It is less error-prone, reduces code duplication and maintenance.

Although we did not introduce EncodingByHwMode, we certainly find it to be an extremely useful feature that would cause us grief if it were gone-- and that would definitely make an already complicated backend even more so. We're not the only downstream backend using it.

>From that point of view, the long-term strategy is to continue using this wonderful feature-- and time permitting-- contribute any fixes or improvements upstream. Some of these fixes have been sitting locally in at least one downstream backend. I personally am trying to reduce the impedance mismatch of "living downstream" so our merges go smoother. That-- and I think the whole community benefits. In my personal case, I've used this feature in two different backends for two different organizations-- making some of the same fixes twice.

Further, it is my plan/hope that the backend discussed above is eventually upstreamed so that LLVM will have an in-tree target using the feature. That timing of that goes all the way up to near deity-levels in the organization, so in the meantime, I hope to continue to push enhancements we make upstream (including unit tests to catch bitrot).

> 
> I had actually been considering proposing to _remove_ it entirely, because it seems pretty incompatible with how tablegen instruction specification normally works.
> 
> In particular, an instruction encoding is generally shared between instructions which use a constant value for a given field, and those which use it as an operand. E.g. take RISCV's "RVInstR", which defines rs1, rs2, and rd fields. In different subtypes, it might be populated via an instruction operand, or e.g. via "let rd = 0".
> 
> But with the EncodingByHwMode, the constant "let" stops working, because the encoding's named fields are no longer part of the tablegen record for the instruction.
> 
> Any thoughts?

https://github.com/llvm/llvm-project/pull/83029