[PATCH] D66369: [TableGen] Make MCInst decoding more table-driven
Nicolas Guillemot via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 16 14:43:47 PDT 2022
nlguillemot added a comment.
Here's how I tested the performance:
1. I made the following modification, which deliberately exaggerates the cost of disassembly so that decoding dominates the measured time:
diff --git a/llvm/tools/llvm-mc/Disassembler.cpp b/llvm/tools/llvm-mc/Disassembler.cpp
index 16ab99548adf..1a584e4023c1 100644
--- a/llvm/tools/llvm-mc/Disassembler.cpp
+++ b/llvm/tools/llvm-mc/Disassembler.cpp
@@ -46,7 +46,8 @@ static bool PrintInsts(const MCDisassembler &DisAsm,
     MCInst Inst;

     MCDisassembler::DecodeStatus S;
-    S = DisAsm.getInstruction(Inst, Size, Data.slice(Index), Index, nulls());
+    for (int i = 0; i < 200; i++)
+      S = DisAsm.getInstruction(Inst, Size, Data.slice(Index), Index, nulls());
     switch (S) {
     case MCDisassembler::Fail:
       SM.PrintMessage(SMLoc::getFromPointer(Bytes.second[Index]),
2. I ran llvm-lit on the `llvm/test/MC/Disassembler` folder and measured the time it takes to run the tests. (Note: The modification above causes some tests to fail, but most of them still pass.)
I closed all other apps on my computer and let the machine cool down a bit between runs, so hopefully the measurements are fair and roughly stable.
The results of running llvm-lit on this folder before and after the patch are as follows.
Before:
0m42.260s
0m43.443s
0m43.443s
0m45.445s
0m44.963s
0m43.998s
0m45.456s
0m43.990s
0m44.779s
0m44.253s
Average: 0m44.2031s
After:
0m43.732s
0m43.697s
0m44.078s
0m44.273s
0m44.415s
0m45.005s
0m44.738s
0m44.630s
0m44.466s
0m45.041s
Average: 0m44.4075s
Based on these results, it looks like there might be a relatively small runtime regression: the averages differ by about 0.2s out of roughly 44s, i.e. around 0.5% (<1%).
I did some experiments locally and found that the performance could be improved further in two ways:
1. Use LEB128 less where possible, since decoding LEB128 is slow (see the first sketch after this list).
2. Add specialized bit extractor functions for some common arrangements of bit extractor parameters, and dispatch to them by adding more enumerators to DecoderCodeletID (see the second sketch below). This improves performance a bit, but it increases the complexity of the implementation, so I'm not sure it's worth it, since this code is usually not a bottleneck anyway.
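For what it's worth, the reason LEB128 decoding is slow is that each value has to be reassembled a byte at a time in a data-dependent loop, whereas a fixed-width field is a single load plus shift-and-mask. Here is a minimal sketch of an unsigned LEB128 decoder, just to illustrate the shape of the work; it is not the exact helper the generated decoders call, and it assumes the value fits in 64 bits with no bounds checking:

#include <cstdint>

// Illustrative unsigned LEB128 decode: each byte carries 7 payload bits,
// and the high bit says whether another byte follows, so the loop length
// (and its branch) depends on the data being decoded.
static uint64_t decodeULEB128Sketch(const uint8_t *Ptr, unsigned &NumRead) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  NumRead = 0;
  uint8_t Byte;
  do {
    Byte = Ptr[NumRead++];
    Value |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  return Value;
}

By contrast, a fixed-width table entry is read branch-free, which is why keeping the hot fields out of LEB128 helps.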
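To make the second idea concrete, here is a rough sketch of the kind of dispatch I mean. The enum below stands in for the patch's DecoderCodeletID; its entries and the helper names are hypothetical, purely for illustration:

#include <cstdint>

// Hypothetical specialization scheme: keep the generic, fully parameterized
// extractor, but add enumerators for a few common (Start, Width) pairs so
// the hot cases use constant shifts/masks and skip decoding the parameters.
enum CodeletID : uint8_t {      // stand-in for DecoderCodeletID; entries made up
  ExtractGeneric,               // Start/Width operands follow in the table
  ExtractBits4To0,              // specialized for bits [4:0]
  ExtractBits20To16,            // specialized for bits [20:16]
};

// Generic extractor; assumes 0 < Width < 64.
static uint64_t extractField(uint64_t Insn, unsigned Start, unsigned Width) {
  return (Insn >> Start) & ((uint64_t(1) << Width) - 1);
}

static uint64_t runCodelet(CodeletID ID, uint64_t Insn,
                           unsigned Start = 0, unsigned Width = 0) {
  switch (ID) {
  case ExtractBits4To0:
    return Insn & 0x1f;         // constants folded, no table operands needed
  case ExtractBits20To16:
    return (Insn >> 16) & 0x1f; // constants folded, no table operands needed
  case ExtractGeneric:
    return extractField(Insn, Start, Width);
  }
  return 0; // unreachable for valid IDs
}

The specialized cases avoid reading (and LEB128-decoding) the Start/Width operands from the table, at the cost of more enumerators and a bigger switch, which is the complexity trade-off mentioned above.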
Disclaimer: These measurements are from a long time ago, so I don't know whether they would still come out the same way now.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D66369/new/
https://reviews.llvm.org/D66369