[llvm] [LLVM][DecoderEmitter] Add option to use lambdas in decodeToMCInst (PR #144814)

Sat Jun 21 10:14:46 PDT 2025

jurahul wrote:

I did a couple of things here after converting the PR to create static functions.

(a) I measured the compile time as reported by clang on our downstream code (I am compiling with clang-18, as that's what I have installed). In the switch version, the 3 worst offenders for compile time seem to be:

```
  663.5226 ( 82.0%)   0.0003 (  0.0%)  663.5229 ( 81.8%)  663.5452 ( 81.8%)  Two-Address instruction pass
  149.9866 ( 47.4%)   0.0008 (  1.3%)  149.9874 ( 47.4%)  149.9874 ( 47.4%)  SimplifyCFGPass
  120.5514 ( 38.1%)   0.0103 ( 17.3%)  120.5618 ( 38.1%)  120.6282 ( 38.1%)  SROAPass
```
Unfortunately, it's not possible for me to file a bug report with the input, as it contains non-public stuff. It may be worth seeing if a newer version of clang improves things, but in any case we need to support the compilers that folks actually build with, so we will still need this fix for improved build times.

(b) I set up some profiling code in llvm-mc that disassembles each byte pattern some large number of times and profiles the loop using the `TimeTraceScope` API. I ran it with 3 AMDGPU llvm-mc unit tests and see the following (first run with function pointers, second run with switch). It seems the switch-case version is actually slower than the function pointer version. It could be measurement noise, but the signal seems consistent. I can put the changes to do the measurement in a separate PR (not for committing, but as a reference).

```
$ source time.sh 
                "avg ms": 539
                "avg ms": 604
                "avg ms": 219
$ source time.sh
                "avg ms": 571
                "avg ms": 632
                "avg ms": 231
```
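
For readers who want to try the comparison outside of LLVM, here is a minimal sketch of the two dispatch styles and a timing loop. It is not the generated `decodeToMCInst` code and not the llvm-mc profiling change: every name in it (`decodeFn0`, `decodeViaSwitch`, `decodeViaTable`, `avgMillis`) is hypothetical, the "decoders" just transform an integer instead of filling in an `MCInst`, and `std::chrono` stands in for `TimeTraceScope` so the example builds without LLVM.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-ins for per-instruction decoders. The real generated
// code populates an MCInst; these only transform an integer.
static uint32_t decodeFn0(uint32_t Insn) { return Insn + 1; }
static uint32_t decodeFn1(uint32_t Insn) { return Insn * 2; }
static uint32_t decodeFn2(uint32_t Insn) { return Insn ^ 0xFFu; }

// Variant 1: a single function with one switch case per decoder index.
// With thousands of cases this becomes one very large function body.
uint32_t decodeViaSwitch(unsigned Idx, uint32_t Insn) {
  switch (Idx) {
  case 0: return Insn + 1;
  case 1: return Insn * 2;
  case 2: return Insn ^ 0xFFu;
  default: return 0;
  }
}

// Variant 2: a table of pointers to small static functions, indexed by
// decoder index. Each decoder is compiled as its own small function.
uint32_t decodeViaTable(unsigned Idx, uint32_t Insn) {
  using DecodeFnTy = uint32_t (*)(uint32_t);
  static constexpr std::array<DecodeFnTy, 3> Table = {decodeFn0, decodeFn1,
                                                      decodeFn2};
  assert(Idx < Table.size() && "decoder index out of range");
  return Table[Idx](Insn);
}

// Calls Decode ItersPerRun times per run and returns the average wall-clock
// time per run in milliseconds. Results are accumulated into Sink so the
// compiler cannot discard the calls as dead code.
template <typename Fn>
double avgMillis(Fn Decode, unsigned Runs, unsigned ItersPerRun,
                 uint32_t &Sink) {
  assert(Runs > 0 && "need at least one run to average over");
  using Clock = std::chrono::steady_clock;
  double TotalMs = 0.0;
  for (unsigned R = 0; R < Runs; ++R) {
    auto Start = Clock::now();
    for (unsigned I = 0; I < ItersPerRun; ++I)
      Sink += Decode(I % 3, I);
    auto End = Clock::now();
    TotalMs += std::chrono::duration<double, std::milli>(End - Start).count();
  }
  return TotalMs / Runs;
}
```

Calling `avgMillis(decodeViaTable, Runs, Iters, Sink)` and `avgMillis(decodeViaSwitch, Runs, Iters, Sink)` with the same arguments gives two averages to compare. With only three trivial cases the compiler may generate near-identical code for both, so this only illustrates the shape of the measurement; it says nothing about the AMDGPU numbers above.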

https://github.com/llvm/llvm-project/pull/144814