[llvm] [TableGen][DecoderEmitter] Add option to emit type-specialized code (PR #146593)

Mon Aug 18 13:52:38 PDT 2025

llvmbot wrote:




@llvm/pr-subscribers-backend-systemz

Author: Rahul Joshi (jurahul)

<details>
<summary>Changes</summary>

This change attempts to reduce the size of the disassembler code generated by DecoderEmitter.

Current state:

1. Currently, the code generated by the decoder emitter consists of two key functions: `decodeInstruction` which is the entry point into the generated code and `decodeToMCInst` which is invoked when a decode op is reached in the while traversing through the decoder table. Both functions are templated on `InsnType` which is the raw instruction bits that are supplied to `decodeInstruction`. 
2. Several backends call `decodeInstruction` with different types, leading to several template instantiations of this function in the final code. As an example, AMDGPU instantiates this function with type `DecoderUInt128` type for decoding 96/128 bit instructions, `uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding 32-bit instructions.
3. Since there is just one `decodeToMCInst` generated, it has code that handles all instruction sizes. The decoders emitted for different instructions sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with InsnType == DecoderUInt128 has decoder code for 32/64-bit instructions that is never exercised. Conversely, the instantiation with InsnType == uint64_t has decoder code for 128/96/32 bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary.

With this change, the DecoderEmitter will stop generating a single templated `decodeInstruction` and will instead generate several overloaded versions of this function and the associated `decodeToMCInst` function as well. Instead of using the templated `InsnType`, it will use an auto-inferred type which can be either a standard C++ integrer type, APInt, or a std::bitset. As a results, decoders for 32-bit instructions will appear only in the 32-bit variant of `decodeToMCinst` and 64-bit decoders will appear only in 64-bit variant and that will fix the code duplication in the templated variant.

Additionally, the `DecodeIndex` will now be computed per-instruction bitwidth as instead of being computed globally across all bitwidths in the earlier case. So, the values will generally be smaller than before and hence will consume less bytes in their ULEB128 encoding in the decoder tables, resulting in further reduction in the size of the decode tables.

Since this non-templated decoder also needs some changes in the C++ code, added an option `GenerateTemplatedDecoder` to `InstrInfo` that is defaulted to false, but targets can set to true to fall back to using templated code. The goal is to migrate all targets to use non-templated decoder and deprecate this option in future.

Adopt this feature for the AMDGPU backend. In a release build, this results in a net 35% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. Actual numbers measured locally for a Linux x86_64 build using clang-18.1.3 toolchain are:

```
.text 378780 -> 244684, i.e., a 35% reduction in size
.rodata 352648 -> 334960 i.e., a 5% reduction in size
```

For targets that do not use multiple instantiations of `decodeInstruction`, opting in into this feature may not result in code/data size improvement but potential compile time improvements by avoiding the use of templated code.

---

Patch is 59.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146593.diff


21 Files Affected:

- (modified) llvm/include/llvm/Target/Target.td (+6-1) 
- (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+15-16) 
- (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h (-38) 
- (modified) llvm/lib/Target/ARC/ARC.td (+4-1) 
- (modified) llvm/lib/Target/AVR/AVR.td (+4-1) 
- (modified) llvm/lib/Target/CSKY/CSKY.td (+4-1) 
- (modified) llvm/lib/Target/MSP430/MSP430.td (+4-1) 
- (modified) llvm/lib/Target/Mips/Mips.td (+2) 
- (modified) llvm/lib/Target/PowerPC/PPC.td (+2) 
- (modified) llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp (+5-8) 
- (modified) llvm/lib/Target/SystemZ/SystemZ.td (+5-1) 
- (modified) llvm/lib/Target/Xtensa/Xtensa.td (+4-1) 
- (modified) llvm/test/TableGen/BitOffsetDecoder.td (+5-5) 
- (modified) llvm/test/TableGen/DecoderEmitterFnTable.td (+10-10) 
- (modified) llvm/test/TableGen/FixedLenDecoderEmitter/InitValue.td (+3-3) 
- (modified) llvm/test/TableGen/HwModeEncodeDecode3.td (+12-7) 
- (modified) llvm/test/TableGen/VarLenDecoder.td (+21-16) 
- (modified) llvm/test/TableGen/trydecode-emission.td (+14-12) 
- (modified) llvm/test/TableGen/trydecode-emission2.td (+4-4) 
- (modified) llvm/test/TableGen/trydecode-emission4.td (+2-2) 
- (modified) llvm/utils/TableGen/DecoderEmitter.cpp (+426-135) 


``````````diff

diff --git a/llvm/include/llvm/Target/Target.td b/llvm/include/llvm/Target/Target.td
index 495b59ee916cf..2fbd9a836d1c0 100644
--- a/llvm/include/llvm/Target/Target.td
+++ b/llvm/include/llvm/Target/Target.td
@@ -1137,7 +1137,6 @@ class OptionalDefOperand<ValueType ty, dag OpTypes, dag defaultops>
   let MIOperandInfo = OpTypes;
 }
 
-
 // InstrInfo - This class should only be instantiated once to provide parameters
 // which are global to the target machine.
 //
@@ -1158,6 +1157,12 @@ class InstrInfo {
   //
   // This option is a temporary migration help. It will go away.
   bit guessInstructionProperties = true;
+
+  // Option to choose bewteen templated and non-templated code from decoder
+  // emitter. This means that the generated `decodeInstruction` function will
+  // use auto-inferred types for the instruction payload instead of generating
+  // templated code using `InsnType` for the instruction payload.
+  bit GenerateTemplatedDecoder = false;
 }
 
 // Standard Pseudo Instructions.
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index fb7d634e62272..68f645195ac34 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -35,6 +35,7 @@
 #include "llvm/MC/TargetRegistry.h"
 #include "llvm/Support/AMDHSAKernelDescriptor.h"
 #include "llvm/Support/Compiler.h"
+#include <bitset>
 
 using namespace llvm;
 
@@ -497,26 +498,24 @@ template <typename T> static inline T eatBytes(ArrayRef<uint8_t>& Bytes) {
   return Res;
 }
 
-static inline DecoderUInt128 eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<96> eat12Bytes(ArrayRef<uint8_t> &Bytes) {
+  using namespace llvm::support::endian;
   assert(Bytes.size() >= 12);
-  uint64_t Lo =
-      support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+  std::bitset<96> Lo(read<uint64_t, endianness::little>(Bytes.data()));
   Bytes = Bytes.slice(8);
-  uint64_t Hi =
-      support::endian::read<uint32_t, llvm::endianness::little>(Bytes.data());
+  std::bitset<96> Hi(read<uint32_t, endianness::little>(Bytes.data()));
   Bytes = Bytes.slice(4);
-  return DecoderUInt128(Lo, Hi);
+  return (Hi << 64) | Lo;
 }
 
-static inline DecoderUInt128 eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+static inline std::bitset<128> eat16Bytes(ArrayRef<uint8_t> &Bytes) {
+  using namespace llvm::support::endian;
   assert(Bytes.size() >= 16);
-  uint64_t Lo =
-      support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+  std::bitset<128> Lo(read<uint64_t, endianness::little>(Bytes.data()));
   Bytes = Bytes.slice(8);
-  uint64_t Hi =
-      support::endian::read<uint64_t, llvm::endianness::little>(Bytes.data());
+  std::bitset<128> Hi(read<uint64_t, endianness::little>(Bytes.data()));
   Bytes = Bytes.slice(8);
-  return DecoderUInt128(Lo, Hi);
+  return (Hi << 64) | Lo;
 }
 
 void AMDGPUDisassembler::decodeImmOperands(MCInst &MI,
@@ -599,14 +598,14 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
     // Try to decode DPP and SDWA first to solve conflict with VOP1 and VOP2
     // encodings
     if (isGFX1250() && Bytes.size() >= 16) {
-      DecoderUInt128 DecW = eat16Bytes(Bytes);
+      std::bitset<128> DecW = eat16Bytes(Bytes);
       if (tryDecodeInst(DecoderTableGFX1250128, MI, DecW, Address, CS))
         break;
       Bytes = Bytes_.slice(0, MaxInstBytesNum);
     }
 
-    if (isGFX11Plus() && Bytes.size() >= 12 ) {
-      DecoderUInt128 DecW = eat12Bytes(Bytes);
+    if (isGFX11Plus() && Bytes.size() >= 12) {
+      std::bitset<96> DecW = eat12Bytes(Bytes);
 
       if (isGFX11() &&
           tryDecodeInst(DecoderTableGFX1196, DecoderTableGFX11_FAKE1696, MI,
@@ -641,7 +640,7 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
 
     } else if (Bytes.size() >= 16 &&
                STI.hasFeature(AMDGPU::FeatureGFX950Insts)) {
-      DecoderUInt128 DecW = eat16Bytes(Bytes);
+      std::bitset<128> DecW = eat16Bytes(Bytes);
       if (tryDecodeInst(DecoderTableGFX940128, MI, DecW, Address, CS))
         break;
 
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
index f4d164bf10c3c..ded447b6f8d5a 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
@@ -32,44 +32,6 @@ class MCOperand;
 class MCSubtargetInfo;
 class Twine;
 
-// Exposes an interface expected by autogenerated code in
-// FixedLenDecoderEmitter
-class DecoderUInt128 {
-private:
-  uint64_t Lo = 0;
-  uint64_t Hi = 0;
-
-public:
-  DecoderUInt128() = default;
-  DecoderUInt128(uint64_t Lo, uint64_t Hi = 0) : Lo(Lo), Hi(Hi) {}
-  operator bool() const { return Lo || Hi; }
-  uint64_t extractBitsAsZExtValue(unsigned NumBits,
-                                  unsigned BitPosition) const {
-    assert(NumBits && NumBits <= 64);
-    assert(BitPosition < 128);
-    uint64_t Val;
-    if (BitPosition < 64)
-      Val = Lo >> BitPosition | Hi << 1 << (63 - BitPosition);
-    else
-      Val = Hi >> (BitPosition - 64);
-    return Val & ((uint64_t(2) << (NumBits - 1)) - 1);
-  }
-  DecoderUInt128 operator&(const DecoderUInt128 &RHS) const {
-    return DecoderUInt128(Lo & RHS.Lo, Hi & RHS.Hi);
-  }
-  DecoderUInt128 operator&(const uint64_t &RHS) const {
-    return *this & DecoderUInt128(RHS);
-  }
-  DecoderUInt128 operator~() const { return DecoderUInt128(~Lo, ~Hi); }
-  bool operator==(const DecoderUInt128 &RHS) {
-    return Lo == RHS.Lo && Hi == RHS.Hi;
-  }
-  bool operator!=(const DecoderUInt128 &RHS) {
-    return Lo != RHS.Lo || Hi != RHS.Hi;
-  }
-  bool operator!=(const int &RHS) { return *this != DecoderUInt128(RHS); }
-};
-
 //===----------------------------------------------------------------------===//
 // AMDGPUDisassembler
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/ARC/ARC.td b/llvm/lib/Target/ARC/ARC.td
index 142ce7f747919..989dc53794ae6 100644
--- a/llvm/lib/Target/ARC/ARC.td
+++ b/llvm/lib/Target/ARC/ARC.td
@@ -24,7 +24,10 @@ include "ARCRegisterInfo.td"
 include "ARCInstrInfo.td"
 include "ARCCallingConv.td"
 
-def ARCInstrInfo : InstrInfo;
+def ARCInstrInfo : InstrInfo {
+  // FIXME: Migrate ARC disassembler to work with non-templated decoder.
+  let GenerateTemplatedDecoder = true;
+}
 
 class Proc<string Name, list<SubtargetFeature> Features>
  : Processor<Name, NoItineraries, Features>;
diff --git a/llvm/lib/Target/AVR/AVR.td b/llvm/lib/Target/AVR/AVR.td
index 22ffc4a368ad6..dec1925b34035 100644
--- a/llvm/lib/Target/AVR/AVR.td
+++ b/llvm/lib/Target/AVR/AVR.td
@@ -32,7 +32,10 @@ include "AVRRegisterInfo.td"
 
 include "AVRInstrInfo.td"
 
-def AVRInstrInfo : InstrInfo;
+def AVRInstrInfo : InstrInfo {
+  // FIXME: Migrate AVR disassembler to work with non-templated decoder.
+  let GenerateTemplatedDecoder = true;
+}
 
 //===---------------------------------------------------------------------===//
 // Calling Conventions
diff --git a/llvm/lib/Target/CSKY/CSKY.td b/llvm/lib/Target/CSKY/CSKY.td
index b5df93a9d464c..3d3d8dbe8bbfa 100644
--- a/llvm/lib/Target/CSKY/CSKY.td
+++ b/llvm/lib/Target/CSKY/CSKY.td
@@ -671,7 +671,10 @@ def : CK860V<"ck860fv", NoSchedModel,
 // Define the CSKY target.
 //===----------------------------------------------------------------------===//
 
-def CSKYInstrInfo : InstrInfo;
+def CSKYInstrInfo : InstrInfo {
+  // FIXME: Migrate CSKY disassembler to work with non-templated decoder.
+  let GenerateTemplatedDecoder = true;
+}
 
 
 def CSKYAsmParser : AsmParser {
diff --git a/llvm/lib/Target/MSP430/MSP430.td b/llvm/lib/Target/MSP430/MSP430.td
index 38aa30fcf4dd1..d6d569ce33204 100644
--- a/llvm/lib/Target/MSP430/MSP430.td
+++ b/llvm/lib/Target/MSP430/MSP430.td
@@ -61,7 +61,10 @@ include "MSP430CallingConv.td"
 
 include "MSP430InstrInfo.td"
 
-def MSP430InstrInfo : InstrInfo;
+def MSP430InstrInfo : InstrInfo {
+  // FIXME: Migrate MPS430 disassembler to work with non-templated decoder.
+  let GenerateTemplatedDecoder = true;
+}
 
 //===---------------------------------------------------------------------===//
 // Assembly Printers
diff --git a/llvm/lib/Target/Mips/Mips.td b/llvm/lib/Target/Mips/Mips.td
index b346ba95f5984..49bec7e02e4b2 100644
--- a/llvm/lib/Target/Mips/Mips.td
+++ b/llvm/lib/Target/Mips/Mips.td
@@ -229,6 +229,8 @@ include "MipsScheduleP5600.td"
 include "MipsScheduleGeneric.td"
 
 def MipsInstrInfo : InstrInfo {
+  // FIXME: Migrate MIPS disassembler to work with non-templated decoder.
+  let GenerateTemplatedDecoder = true;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/PowerPC/PPC.td b/llvm/lib/Target/PowerPC/PPC.td
index ea7c2203662bd..54b2ff6b3ef05 100644
--- a/llvm/lib/Target/PowerPC/PPC.td
+++ b/llvm/lib/Target/PowerPC/PPC.td
@@ -721,6 +721,8 @@ include "PPCCallingConv.td"
 
 def PPCInstrInfo : InstrInfo {
   let isLittleEndianEncoding = 1;
+  // FIXME: Migrate PPC disassembler to work with non-templated decoder.
+  let GenerateTemplatedDecoder = true;
 }
 
 def PPCAsmWriter : AsmWriter {
diff --git a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
index 78be55b3a51d3..8c71cd097b8c9 100644
--- a/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
+++ b/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
@@ -712,9 +712,7 @@ DecodeStatus RISCVDisassembler::getInstruction32(MCInst &MI, uint64_t &Size,
   }
   Size = 4;
 
-  // Use uint64_t to match getInstruction48. decodeInstruction is templated
-  // on the Insn type.
-  uint64_t Insn = support::endian::read32le(Bytes.data());
+  uint32_t Insn = support::endian::read32le(Bytes.data());
 
   for (const DecoderListEntry &Entry : DecoderList32) {
     if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
@@ -760,9 +758,7 @@ DecodeStatus RISCVDisassembler::getInstruction16(MCInst &MI, uint64_t &Size,
   }
   Size = 2;
 
-  // Use uint64_t to match getInstruction48. decodeInstruction is templated
-  // on the Insn type.
-  uint64_t Insn = support::endian::read16le(Bytes.data());
+  uint16_t Insn = support::endian::read16le(Bytes.data());
 
   for (const DecoderListEntry &Entry : DecoderList16) {
     if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
@@ -796,9 +792,10 @@ DecodeStatus RISCVDisassembler::getInstruction48(MCInst &MI, uint64_t &Size,
   }
   Size = 6;
 
-  uint64_t Insn = 0;
+  uint64_t InsnBits = 0;
   for (size_t i = Size; i-- != 0;)
-    Insn += (static_cast<uint64_t>(Bytes[i]) << 8 * i);
+    InsnBits += (static_cast<uint64_t>(Bytes[i]) << 8 * i);
+  std::bitset<48> Insn(InsnBits);
 
   for (const DecoderListEntry &Entry : DecoderList48) {
     if (!Entry.haveContainedFeatures(STI.getFeatureBits()))
diff --git a/llvm/lib/Target/SystemZ/SystemZ.td b/llvm/lib/Target/SystemZ/SystemZ.td
index ec110645c62dd..9961eee895326 100644
--- a/llvm/lib/Target/SystemZ/SystemZ.td
+++ b/llvm/lib/Target/SystemZ/SystemZ.td
@@ -57,7 +57,11 @@ include "SystemZInstrHFP.td"
 include "SystemZInstrDFP.td"
 include "SystemZInstrSystem.td"
 
-def SystemZInstrInfo : InstrInfo { let guessInstructionProperties = 0; }
+def SystemZInstrInfo : InstrInfo {
+  let guessInstructionProperties = 0;
+  // FIXME: Migrate SystemZ disassembler to work with non-templated decoder.
+  let GenerateTemplatedDecoder = true;
+}
 
 //===----------------------------------------------------------------------===//
 // Assembly parser
diff --git a/llvm/lib/Target/Xtensa/Xtensa.td b/llvm/lib/Target/Xtensa/Xtensa.td
index 4ef885e19101e..d0e7e349bb708 100644
--- a/llvm/lib/Target/Xtensa/Xtensa.td
+++ b/llvm/lib/Target/Xtensa/Xtensa.td
@@ -44,7 +44,10 @@ include "XtensaCallingConv.td"
 
 include "XtensaInstrInfo.td"
 
-def XtensaInstrInfo : InstrInfo;
+def XtensaInstrInfo : InstrInfo {
+  // FIXME: Migrate XTensa disassembler to work with non-templated decoder.
+  let GenerateTemplatedDecoder = true;
+}
 
 //===----------------------------------------------------------------------===//
 // Target Declaration
diff --git a/llvm/test/TableGen/BitOffsetDecoder.td b/llvm/test/TableGen/BitOffsetDecoder.td
index 04d6e164d0eee..63b874ace85d5 100644
--- a/llvm/test/TableGen/BitOffsetDecoder.td
+++ b/llvm/test/TableGen/BitOffsetDecoder.td
@@ -57,8 +57,8 @@ def baz : Instruction {
 
 }
 
-// CHECK: tmp = fieldFromInstruction(insn, 8, 7);
-// CHECK: tmp = fieldFromInstruction(insn, 8, 8) << 3;
-// CHECK: insertBits(tmp, fieldFromInstruction(insn, 8, 4), 7, 4);
-// CHECK: insertBits(tmp, fieldFromInstruction(insn, 12, 4), 3, 4);
-// CHECK: tmp = fieldFromInstruction(insn, 8, 8) << 4;
+// CHECK: tmp = fieldFromInstruction(Insn, 8, 7);
+// CHECK: tmp = fieldFromInstruction(Insn, 8, 8) << 3;
+// CHECK: insertBits(tmp, fieldFromInstruction(Insn, 8, 4), 7, 4);
+// CHECK: insertBits(tmp, fieldFromInstruction(Insn, 12, 4), 3, 4);
+// CHECK: tmp = fieldFromInstruction(Insn, 8, 8) << 4;
diff --git a/llvm/test/TableGen/DecoderEmitterFnTable.td b/llvm/test/TableGen/DecoderEmitterFnTable.td
index 7bed18c19a513..a08ed66ecbabc 100644
--- a/llvm/test/TableGen/DecoderEmitterFnTable.td
+++ b/llvm/test/TableGen/DecoderEmitterFnTable.td
@@ -71,14 +71,14 @@ def Inst3 : TestInstruction {
   let AsmString = "Inst3";
 }
 
-// CHECK-LABEL: DecodeStatus decodeFn0(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn1(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn2(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: DecodeStatus decodeFn3(DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
-// CHECK-LABEL: decodeToMCInst(unsigned Idx, DecodeStatus S, InsnType insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_0(DecodeStatus S, uint8_t Insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_1(DecodeStatus S, uint8_t Insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_2(DecodeStatus S, uint8_t Insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: DecodeStatus decodeFn_3(DecodeStatus S, uint8_t Insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
+// CHECK-LABEL: decodeToMCInst(unsigned Idx, DecodeStatus S, uint8_t Insn, MCInst &MI, uint64_t Address, const MCDisassembler *Decoder, bool &DecodeComplete)
 // CHECK: static constexpr DecodeFnTy decodeFnTable[]
-// CHECK-NEXT: decodeFn0,
-// CHECK-NEXT: decodeFn1,
-// CHECK-NEXT: decodeFn2,
-// CHECK-NEXT: decodeFn3,
-// CHECK: return decodeFnTable[Idx](S, insn, MI, Address, Decoder, DecodeComplete)
+// CHECK-NEXT: decodeFn_0,
+// CHECK-NEXT: decodeFn_1,
+// CHECK-NEXT: decodeFn_2,
+// CHECK-NEXT: decodeFn_3,
+// CHECK: return decodeFnTable[Idx](S, Insn, MI, Address, Decoder, DecodeComplete)
diff --git a/llvm/test/TableGen/FixedLenDecoderEmitter/InitValue.td b/llvm/test/TableGen/FixedLenDecoderEmitter/InitValue.td
index 03847439ffc2e..5dc91b0ebedb1 100644
--- a/llvm/test/TableGen/FixedLenDecoderEmitter/InitValue.td
+++ b/llvm/test/TableGen/FixedLenDecoderEmitter/InitValue.td
@@ -39,8 +39,8 @@ def bax : Instruction {
 
 }
 
-// CHECK: tmp = fieldFromInstruction(insn, 9, 7) << 1;
+// CHECK: tmp = fieldFromInstruction(Insn, 9, 7) << 1;
 // CHECK: tmp = 0x1;
-// CHECK: insertBits(tmp, fieldFromInstruction(insn, 9, 7), 1, 7);
+// CHECK: insertBits(tmp, fieldFromInstruction(Insn, 9, 7), 1, 7);
 // CHECK: tmp = 0x100000000;
-// CHECK: insertBits(tmp, fieldFromInstruction(insn, 8, 7), 25, 7);
+// CHECK: insertBits(tmp, fieldFromInstruction(Insn, 8, 7), 25, 7);
diff --git a/llvm/test/TableGen/HwModeEncodeDecode3.td b/llvm/test/TableGen/HwModeEncodeDecode3.td
index c4d488d9d5f8f..d9172b0384e8c 100644
--- a/llvm/test/TableGen/HwModeEncodeDecode3.td
+++ b/llvm/test/TableGen/HwModeEncodeDecode3.td
@@ -118,8 +118,6 @@ def unrelated: Instruction {
 // exact duplicates and could effectively be merged into one.
 // DECODER-LABEL: DecoderTable32[] =
 // DECODER-DAG: Opcode: bar
-// DECODER-LABEL: DecoderTable64[] =
-// DECODER-DAG: Opcode: fooTypeEncDefault:foo
 // DECODER-LABEL: DecoderTableAlt32[] =
 // DECODER-DAG: Opcode: unrelated
 // DECODER-LABEL: DecoderTableAlt_ModeA32[] =
@@ -138,13 +136,13 @@ def unrelated: Instruction {
 // DECODER-LABEL: DecoderTable_ModeC32[] =
 // DECODER-DAG: Opcode: fooTypeEncC:foo
 // DECODER-DAG: Opcode: bar
+// DECODER-LABEL: DecoderTable64[] =
+// DECODER-DAG: Opcode: fooTypeEncDefault:foo
 
 // Under the 'O1' optimization level, unnecessary duplicate tables will be eliminated,
 // reducing the four ‘Alt’ tables down to just one.
 // DECODER-SUPPRESS-O1-LABEL: DecoderTable32[] =
 // DECODER-SUPPRESS-O1-DAG: Opcode: bar
-// DECODER-SUPPRESS-O1-LABEL: DecoderTable64[] =
-// DECODER-SUPPRESS-O1-DAG: Opcode: fooTypeEncDefault:foo
 // DECODER-SUPPRESS-O1-LABEL: DecoderTableAlt32[] =
 // DECODER-SUPPRESS-O1-DAG: Opcode: unrelated
 // DECODER-SUPPRESS-O1-LABEL: DecoderTable_ModeA32[] =
@@ -157,6 +155,8 @@ def unrelated: Instruction {
 // DECODER-SUPPRESS-O1-LABEL: DecoderTable_ModeC32[] =
 // DECODER-SUPPRESS-O1-DAG: Opcode: fooTypeEncC:foo
 // DECODER-SUPPRESS-O1-DAG: Opcode: bar
+// DECODER-SUPPRESS-O1-LABEL: DecoderTable64[] =
+// DECODER-SUPPRESS-O1-DAG: Opcode: fooTypeEncDefault:foo
 
 // Under the 'O2' optimization condition, instructions possessing the 'EncodingByHwMode'
 // attribute will be extracted from their original DecoderNamespace and placed into their
@@ -164,11 +164,13 @@ def unrelated: Instruction {
 // attribute but are within the same DecoderNamespace will be stored in the 'Default' table. This
 // approach will significantly reduce instruction redundancy, but it necessitates users to thoroughly
 // consider the interplay between HwMode and DecoderNamespace for their instructions.
+//
+// Additionally, no 32-bit instruction will appear in a 64-bit decoder table and
+// vice-versa.
+//
 // DECODER-SUPPRESS-O2-LABEL: DecoderTable32[] =
 // DECODER-SUPPRESS-O2-DAG: Opcode: bar
-// DECODER-SUPPRESS-O2-LABEL: DecoderTable64[] =
-// DECODER-SUPPRESS-O2-NOT: Opcode: bar
-// DECODER-SUPPRESS-O2-DAG: Opcode: fooTypeEncDefault:foo
+// DECODER-SUPPRESS-O2-NOT: Opcode: fooTypeEncDefault:foo
 // DECODER-SUPPRESS-O2-LABEL: DecoderTableAlt32[] =
 // DECODER-SUPPRESS-O2-DAG: Opcode: unrelated
 // DECODER-SUPPRESS-O2-LABEL: DecoderTable_ModeA32[] =
@@ -181,6 +183,9 @@ def unrelated: Instruction {
 // DECODER-SUPPRESS-O2-LABEL: DecoderTable_ModeC32[] =
 // DECODER-SUPPRESS-O2-DAG: Opcode: fooTypeEncC:foo
 // DECODER-SUPPRESS-O2-NOT: Opcode: bar
+// DECODER-SUPPRESS-O2-LABEL: DecoderTable64[] =
+// DECODER-SUPPRESS-O2-DAG: Opcode: fooTypeEncDefault:foo
+// DECODER-SUPPRESS-O2-NOT: Opcode: bar
 
 // For 'bar' and 'unrelated', we didn't assign any HwModes for them,
 // they should keep the same in the following four tables.
diff --git a/llvm/test/TableGen/VarLenDecoder.td b/llvm/test/TableGen/VarLenDecoder.td
index 06ff62294a196..6b792efc81f8f 100644
--- a/llvm/test/TableGen/VarLenDecoder.td
+++ b/llvm/test/TableGen/VarLenDecoder.td
@@ -3,7 +3,9 @@
 
 include "llvm/Target/Target.td"
 
-def ArchInstrInfo : InstrInfo { }
+def ArchInstrInfo : InstrInfo {
+  let GenerateTemplatedDecoder = false;
+}
 
 def Arch : Target {
   let InstructionSet = ArchInstrInfo;
@@ -47,6 +49,12 @@ def FOO32 : MyVarInst<MemOp32> {
   );
 }
 
+// Instruction length table
+// CHECK-LABEL: InstrLenTable
+// CHECK: 27,
+// CHECK-NEXT: 43,
+// CHECK-NEXT: };
+
 // CHECK-SMALL:      /* 0 */       MCD::OPC_ExtractField, 3, 5,  // Inst{7-3} ...
 // CHECK-SMALL-NEXT: /* 3 */       MCD::OPC_FilterValue, 8, 4, 0, // Skip to: 11
 // CHECK-SMALL-NEXT: /* 7 */       MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 0, // Opcode: FOO16
@@ -61,37 +69,34 @@ def FOO32 : MyVarInst<MemOp32> {
 // CHECK-LARGE-NEXT: /* 14 */      MCD::OPC_Decode, {{[0-9]+}}, {{[0-9]+}}, 1, // ...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/146593