[PATCH] D66369: [TableGen] Make MCInst decoding more table-driven

Nicolas Guillemot via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 16 16:04:01 PDT 2019


nlguillemot created this revision.
nlguillemot added a reviewer: dsanders.
Herald added subscribers: steven.zhang, s.egerton, PkmX, atanasyan, simoncook, fedor.sergeev, kristof.beyls, arichardson, tpr, javed.absar, sdardis, jyknight.
Herald added a project: LLVM.

FixedLenDecoderEmitter used to emit plain C++ code to decode all the instructions. The problem with this is that, depending on the backend, it generates hundreds or thousands of functions, and the sheer number of functions and the resulting file size make the file slow and troublesome to compile.

The generated C++ code was highly redundant. For example:

- The same bit patterns were extracted by many different decoder functions.
- The same decoder methods were called in many places in the file.

The generated code was compressed by exploiting these redundancies:

- Identical sequences of bit extraction operations were unified.
- Identical sequences of decoder method calls were unified.
- Identical mixed sequences of bit extractions and decoder method calls were unified.

Each decoder is then defined as a list of IDs of the operations above.
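As a rough sketch of what this deduplication might look like (the struct, table, and function names here are illustrative, not the actual TableGen output): duplicate bit-extraction steps collapse into a single shared pool, and each decoder references pool entries by index.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding of one bit-extraction step: take `Width` bits of
// the instruction word starting at bit `Base`, and place them at bit
// `Offset` of the output field.
struct ExtractOp { uint8_t Base, Width, Offset; };

// Shared pool of extraction operations; identical extractions that used to
// be duplicated across many decoder functions collapse to one entry here.
static const ExtractOp ExtractOps[] = {
  {0, 5, 0},   // op 0: bits [4:0]
  {16, 5, 0},  // op 1: bits [20:16]
};

// Each decoder is just a constant list of op IDs into the pool above.
static const uint8_t DecoderA[] = {0, 1};
static const uint8_t DecoderB[] = {1};  // reuses op 1 with no duplicated code

// Execute one pooled extraction on an instruction word.
uint64_t runExtract(const ExtractOp &Op, uint32_t Insn) {
  uint64_t Field = (Insn >> Op.Base) & ((1u << Op.Width) - 1);
  return Field << Op.Offset;
}
```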

Every call to decodeToMCInst boils down to a sequence of bit extractions and decoder method calls. Given the list of extraction operation IDs and the list of decoder method IDs used by a given decoder, we only need to know the order in which to execute each extractor or decoder method. By formalizing this, the original behavior of the FixedLenDecoderEmitter can be implemented by a state machine that executes a list of operations, each of which either performs a bit extraction or calls a decoder method. This makes decodeToMCInst work in a mostly data-driven way: the large number of functions that caused the original compile-time problems is mostly gone, and what remains are constant arrays of integers, which are fast to compile.
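A minimal sketch of such a state machine, assuming a hypothetical tagged-step encoding (OpKind, DecodeStep, and the tables below are invented for illustration, not the actual generated code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each step either extracts a field from the instruction word or hands the
// last extracted field to a decoder method identified by a small integer.
enum class OpKind : uint8_t { Extract, Decode };

struct DecodeStep {
  OpKind Kind;
  uint8_t Id;  // index into ExtractTable, or a decoder-method ID
};

// Shared table of bit extractions: (shift, mask) pairs.
struct Extract { uint8_t Shift; uint32_t Mask; };
static const Extract ExtractTable[] = {{0, 0x1F}, {16, 0x1F}};

// One decoder = a constant list of steps the state machine walks.
static const DecodeStep Prog[] = {
  {OpKind::Extract, 0},  // pull a field out of the instruction word
  {OpKind::Decode, 0},   // feed it to decoder method #0
};

// `Operands` stands in for the MCInst operand list being built.
bool runProgram(const DecodeStep *Steps, size_t N, uint32_t Insn,
                std::vector<uint64_t> &Operands) {
  uint64_t Field = 0;
  for (size_t I = 0; I != N; ++I) {
    const DecodeStep &S = Steps[I];
    if (S.Kind == OpKind::Extract) {
      const Extract &E = ExtractTable[S.Id];
      Field = (Insn >> E.Shift) & E.Mask;
    } else {
      switch (S.Id) {  // decoder-method dispatch, one case per method
      case 0: Operands.push_back(Field); break;  // e.g. a register index
      default: return false;                     // unknown method ID
      }
    }
  }
  return true;
}
```

The per-decoder functions disappear; only the constant `DecodeStep` arrays and one shared interpreter loop remain, which is what makes the generated file cheap to compile.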

To test the effect of this patch, the following test was done for every backend, before and after this patch:

1. Build everything with ninja.
2. Delete the built `BACKENDGenDisassemblerTables.inc` file (where BACKEND = the name of the backend)
3. Run ninja again with `time` and measure the time taken to rebuild.

The result of this experiment was that this patch generally reduces the compile time of the disassembler.
Some notable results are as follows. These are "real" times from `time`, rounded to 1 decimal digit:

- AArch64: 7.6 seconds -> 3.8 seconds
- AMDGPU: 16.5 seconds -> 5.9 seconds
- ARM: 11.1 seconds -> 7.3 seconds
- Hexagon: 5.7 seconds -> 3.5 seconds
- Mips: 6.4 seconds -> 4.2 seconds

As for generated size, there are some wins and some losses, but the net result is a reduction of 425K.
The following compares the sizes reported by `ls -lh` for each lib/libLLVMBACKENDDisassembler.a, where BACKEND is the name of the backend:

- AArch64: 283K -> 205K
- AMDGPU: 524K -> 245K
- ARM: 430K -> 479K
- BPF: 18K -> 22K
- Hexagon: 184K -> 106K
- Lanai: 17K -> 25K
- MSP430: 25K -> 27K
- Mips: 172K -> 147K
- PowerPC: 80K -> 72K
- RISCV: 36K -> 42K
- Sparc: 40K -> 51K
- SystemZ: 163K -> 91K
- XCore: 45K -> 80K


Repository:
  rL LLVM

https://reviews.llvm.org/D66369

Files:
  test/TableGen/BitOffsetDecoder.td
  test/TableGen/FixedLenDecoderEmitter/InitValue.td
  test/TableGen/trydecode-emission.td
  test/TableGen/trydecode-emission2.td
  test/TableGen/trydecode-emission3.td
  utils/TableGen/FixedLenDecoderEmitter.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D66369.215686.patch
Type: text/x-patch
Size: 43085 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190816/f30bc426/attachment-0001.bin>
