[llvm] [TableGen][DecoderEmitter] Add option to emit type-specialized `decodeToMCInst` (PR #146593)
Rahul Joshi via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 3 20:31:14 PDT 2025
jurahul wrote:
I have spent some more time on this and refined the change a bit. Essentially, two things changed.
(1) I made the interface a little more structured:
```
// InstrDecoderOption - This class is used to provide some options to the
// TableGen DecoderEmitter backend.
class InstrDecoderOption<string ty, list<int> bws> {
  string CPPType = ty;       // C++ type for generating non-templated code.
  list<int> Bitwidths = bws; // List of bitwidths supported by the above type.

  assert !not(!empty(CPPType)), "CPP type cannot be empty";
  assert !not(!empty(Bitwidths)), "Bitwidths cannot be empty";
}

class InstrInfo {
  ...
  list<InstrDecoderOption> DecoderOptions = [];
}

def AMDGPUInstrInfo : InstrInfo {
  let guessInstructionProperties = 1;

  // Opt in to non-templated code for the instruction decoder.
  let DecoderOptions = [
    InstrDecoderOption<"uint32_t", [32]>,
    InstrDecoderOption<"uint64_t", [64]>,
    InstrDecoderOption<"DecoderUInt128", [96, 128]>,
  ];
}
```
and
(2) When this option is provided, we do not generate *any* templated code, so all functions in the emitted code are non-templated, including `decodeInstruction`. @topperc, this is similar to what you suggested, except without the 16/32 suffixes; the variants are just overloaded functions. To avoid accidental mismatches, I made the `insn` parameter a reference so that we don't inadvertently call the wrong overload (a sketch of what the emitted overloads might look like follows the size numbers below). I then measured the size of the AMDGPU disassembler code with the `size -A build/lib/libLLVMAMDGPUDisassembler.so` command and see the following delta:
```
.text   378780 -> 244684, i.e., a 35% reduction in size
.rodata 352648 -> 334960, i.e., a 5% reduction in size
```
The reduction in .text size is expected; the reduction in .rodata size is coincidental. Because the decoder index is now assigned per C++ type, its values are generally smaller for the second and third types, so their ULEB128 encodings are smaller as well (an index below 128 fits in a single byte rather than two), resulting in a net reduction in the decoder table size.
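For concreteness, here is a minimal sketch of the shape of the emitted overloads, modeled on the signatures of the current templated output; the exact parameter lists in the PR may differ. Passing `insn` by (non-const) reference means an argument of the wrong integer type cannot bind through an implicit conversion, so each call site resolves to exactly one overload:
```
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include <cstdint>
using namespace llvm;
using DecodeStatus = MCDisassembler::DecodeStatus;

struct DecoderUInt128; // Target-provided 128-bit insn type (AMDGPU).

// One overload per configured CPPType instead of one function template.
static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
                                      uint32_t &insn, uint64_t Address,
                                      const MCDisassembler *DisAsm,
                                      const MCSubtargetInfo &STI);
static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
                                      uint64_t &insn, uint64_t Address,
                                      const MCDisassembler *DisAsm,
                                      const MCSubtargetInfo &STI);
static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
                                      DecoderUInt128 &insn, uint64_t Address,
                                      const MCDisassembler *DisAsm,
                                      const MCSubtargetInfo &STI);
```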
Please let me know how this looks. I still need to add unit tests for this. Once this is in, I think there are two additional improvements that can be made:
a. Improve robustness: encode the bitwidth as the first entry in the decoder table and, in the corresponding overload of `decodeInstruction`, fail immediately if it is not one of the bitwidths supported by that variant (see the sketch after this list).
b. If it's beneficial, change the `NumToSkip` encoding per C++ type. Because the code is now specialized per type, we can support the case where, say, all 32-bit tables are small enough to use 8-bit `NumToSkip` entries while only the 64-bit tables need 16 bits. I am not sure this will pan out, but this per-size/per-type specialization opens up the possibility.
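For (a), a minimal sketch of the idea, assuming the bitwidth is stored as the first byte of each decoder table (all bitwidths listed above fit in 7 bits, so a single ULEB128 byte suffices); the names and encoding here are illustrative, not the PR's final scheme:
```
// Hypothetical prologue of the uint32_t overload of decodeInstruction: reject
// any table whose recorded bitwidth this variant does not support.
static DecodeStatus decodeInstruction(const uint8_t DecodeTable[], MCInst &MI,
                                      uint32_t &insn, uint64_t Address,
                                      const MCDisassembler *DisAsm,
                                      const MCSubtargetInfo &STI) {
  const uint8_t *Ptr = DecodeTable;
  unsigned Bitwidth = *Ptr++; // First table entry: instruction bitwidth.
  if (Bitwidth != 32)         // This overload only handles 32-bit tables.
    return MCDisassembler::Fail;
  // ... existing table-walking loop, starting at Ptr ...
  return MCDisassembler::Success;
}
```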
https://github.com/llvm/llvm-project/pull/146593