[llvm] [TableGen][DecoderEmitter] Add option to emit type-specialized `decodeToMCInst` (PR #146593)

Craig Topper via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 3 21:18:57 PDT 2025


topperc wrote:

> I have spent more time on this and refined the change a bit. Essentially, two things.
> 
> (1) I made the interface a little more structured:
> 
> ```
> // InstrDecoderOption - This class is used to provide some options to the
> // TableGen DecoderEmitter backend.
> class InstrDecoderOption<string ty, list<int> bws> {
>   string CPPType = ty;       // C++ type for generating non-templated code.
>   list<int> Bitwidths = bws; // List of bitwidths supported by the above type.
> 
>   assert !not(!empty(CPPType)), "CPP type cannot be empty";
>   assert !not(!empty(Bitwidths)), "Bitwidths cannot be empty";
> }
> 
> class InstrInfo {
> ...
>   list<InstrDecoderOption> DecoderOptions = [];
> }
> 
> def AMDGPUInstrInfo : InstrInfo {
>   let guessInstructionProperties = 1;
> 
>   // Opt-in into non-templated code for instruction decoder.
>   let DecoderOptions = [
>     InstrDecoderOption<"uint32_t", [32]>,
>     InstrDecoderOption<"uint64_t", [64]>,
>     InstrDecoderOption<"DecoderUInt128", [96, 128]>,
>   ];
> }
> ```
> 
> and
> 
> (2) When this option is provided, we do not generate _any_ templated code, so all functions in the emitted code are non-templated, including `decodeInstruction`. @topperc this is similar to what you suggested, except without the 16/32 suffix; these are just overloaded functions. To avoid accidental misuse, I made the `insn` parameter a reference so that we don't inadvertently call the wrong overload. I then measured the size of the AMDGPU disassembler code using the `size -A build/lib/libLLVMAMDGPUDisassembler.so` command. With that I see the following delta:
> 
> ```
> .text 378780 -> 244684, i.e., a 35% reduction in size
> .rodata 352648 -> 334960, i.e., a 5% reduction in size
> ```
> 
> The reduction in .text size is expected. The reduction in .rodata size is coincidental: because the decoder index is now assigned per C++ type, its value is generally smaller for the second and third C++ types, so the resulting ULEB128 encodings are smaller as well, yielding a net reduction in decoder table size.
> 
> Please let me know how this looks. I still need to add unit tests for this. Once this is in, I think there are two additional improvements that can be made:
> 
> a. Improve robustness: encode the bitwidth as the first entry in the decoder table, and in the appropriate overload of `decodeInstruction`, fail immediately if it's not one of the bitwidths supported by that variant.
> 
> b. If it's beneficial, change the `NumToSkip` encoding per C++ type. Because the code is now specialized per type, we can support the case where, say, all 32-bit tables are small enough to use 8 bits but the 64-bit tables need 16 bits. I am not sure whether this will pan out, but the per-size/C++-type specialization opens up the possibility.

The .rodata reduction is cool. I suspected that might happen.

I kind of don't like that we have to describe the C++ types to TableGen at all. It feels like something we should be able to infer from the instruction widths. Needing to go look elsewhere to describe the types seems likely to confuse someone who wants to add a new bitwidth.

@s-barannikov what do you think?

https://github.com/llvm/llvm-project/pull/146593


More information about the llvm-commits mailing list