[llvm] [TableGen][DecoderEmitter] Add option to emit type-specialized `decodeToMCInst` (PR #146593)

Tue Jul 1 16:58:50 PDT 2025

jurahul wrote:

> > Repeating the type in each instruction record might be redundant, and we would also need extra verification that, for a given size, every instruction of that size specifies the C++ type and that the types are consistent. One option is to add a new InstructionDecoderTypeAndSize class that records this information; DecoderEmitter can use it if it's present and otherwise fall back to templated code. Something like
> > ```
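> > class InstructionDecoderTypeAndSize<string CPPType, list<int> Bitwidths> {
> >   string DecoderCPPType = CPPType;
> >   list<int> DecoderBitwidths = Bitwidths;
> > }
> > 
> > class InstructionDecoderTypeAndSizes<list<InstructionDecoderTypeAndSize> Entries> {
> >   list<InstructionDecoderTypeAndSize> TypesAndSizes = Entries;
> > }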
> > ```
> > 
> > and a particular backend can define a single record of type InstructionDecoderTypeAndSizes<> which the DecoderEmitter will use. This is essentially encoding the command line option as a record.
> > ```
> > // RISCV.td
> > // Opt in to non-templated decoder code.
> > def : InstructionDecoderTypeAndSizes<[
> >                 InstructionDecoderTypeAndSize<"uint64_t", [48]>,
> >                 InstructionDecoderTypeAndSize<"uint32_t", [16,32]>]>;
> > ```
> > 
> > or more simply
> > ```
> > class InstructionDecoderTypeAndSizes<list<string> CPPTypes, list<list<int>> Bitwidths> {
> >   list<string> DecoderCPPTypes = CPPTypes;
> >   list<list<int>> DecoderBitwidths = Bitwidths;
> > }
> > 
> > def : InstructionDecoderTypeAndSizes<
> >            [ "uint32_t", "uint64_t"],
> >            [ [16,32],    [64]     ]>;
> > ```
> 
> RISCV uses a common base class for each of the 3 instruction sizes. Other targets may be similar.
> 
> ```
> class RVInst<dag outs, dag ins, string opcodestr, string argstr,
>              list<dag> pattern, InstFormat format>
>     : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
>   field bits<32> Inst;
>   // SoftFail is a field the disassembler can use to provide a way for
>   // instructions to not match without killing the whole decode process. It is
>   // mainly used for ARM, but Tablegen expects this field to exist or it fails
>   // to build the decode table.
>   field bits<32> SoftFail = 0;
>   let Size = 4;
> }
>
> class RVInst48<dag outs, dag ins, string opcodestr, string argstr,
>                list<dag> pattern, InstFormat format>
>     : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
>   field bits<48> Inst;
>   field bits<48> SoftFail = 0;
>   let Size = 6;
> }
>
> class RVInst64<dag outs, dag ins, string opcodestr, string argstr,
>                list<dag> pattern, InstFormat format>
>     : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
>   field bits<64> Inst;
>   field bits<64> SoftFail = 0;
>   let Size = 8;
> }
> ```

Right, but we would still have the type specified per instruction *instance*, and we would still need to validate, for example, that all instructions of a particular size use the same type string. To me that seems like unnecessary duplication of this information, plus additional verification to make sure it stays consistent. Also, unlike the size in bytes, which is a core property of the instruction, the C++ type used to hold its bits in memory does not seem to be a core property: many backends seem to choose the same type (for example, `uint64_t`) for all of their 16/32/48/64-bit instructions. Adoption-wise, putting it in the per-instruction record also seems more invasive. For example, in our backend and several other downstream backends the core instruction records are auto-generated, which makes adoption harder still.
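To make the difference in required checking concrete, here is a minimal C++ sketch of the two options. It is not DecoderEmitter's actual code: `InstRecord`, `TypeAndSizes`, `collectPerInstanceTypes`, and `collectFromTargetRecord` are hypothetical names, and instructions are reduced to just a decoder bitwidth and a C++ type string. With a per-instance type, the emitter has to rebuild the bitwidth-to-type map and reject missing or conflicting entries; with a single target-level record, the map is given directly and only duplicate bitwidths need checking.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record: only the decoder bitwidth
// and a per-instance C++ type string (the per-record option).
struct InstRecord {
  std::string Name;
  unsigned Bitwidth;
  std::string CPPType;
};

// Per-record option: every instance carries a type, so the emitter has to
// rebuild the bitwidth -> type map, reject instances with no type, and
// reject instructions of the same bitwidth that disagree.
// Returns std::nullopt on the first problem found.
std::optional<std::map<unsigned, std::string>>
collectPerInstanceTypes(const std::vector<InstRecord> &Insts) {
  std::map<unsigned, std::string> TypeForWidth;
  for (const InstRecord &I : Insts) {
    if (I.CPPType.empty())
      return std::nullopt; // Type not specified on this instance.
    auto [It, Inserted] = TypeForWidth.try_emplace(I.Bitwidth, I.CPPType);
    if (!Inserted && It->second != I.CPPType)
      return std::nullopt; // Same bitwidth, different C++ type.
  }
  return TypeForWidth;
}

// Single-record option: one (type, bitwidths) entry per C++ type, mirroring
// the proposed InstructionDecoderTypeAndSizes record. The map is given
// directly; the only check needed is that no bitwidth is listed twice.
struct TypeAndSizes {
  std::string CPPType;
  std::vector<unsigned> Bitwidths;
};

std::optional<std::map<unsigned, std::string>>
collectFromTargetRecord(const std::vector<TypeAndSizes> &Entries) {
  std::map<unsigned, std::string> TypeForWidth;
  for (const TypeAndSizes &E : Entries)
    for (unsigned Bits : E.Bitwidths)
      if (!TypeForWidth.try_emplace(Bits, E.CPPType).second)
        return std::nullopt; // Bitwidth assigned to more than one type.
  return TypeForWidth;
}
```

Either way the emitter ends up with the same map (16/32 to `uint32_t`, 48 to `uint64_t` for the RISC-V-style example earlier in the thread); the difference is where the information lives and how much of it has to be validated.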

https://github.com/llvm/llvm-project/pull/146593