[llvm] [GlobalISel] Change MatchTable entries to 1 byte each (PR #74429)

Tue Dec 12 09:06:21 PST 2023

aemerson wrote:

I built clang in release mode without assertions, using ThinLTO. I used fc791b612723 as the baseline and:
```
commit 44f6d94c58b8073432f52b368258414f9afd0ae4 (compact-gisel-match-table)
Author: pvanhout <pierre.vanhoutryve at amd.com>
Date:   Fri Dec 8 08:56:30 2023 +0100

    Simplify getEncodedEmitStr
```
as the test commit.

I built CTMark from the test suite with `-j1` and 10 runs. Here's the data for `-Os`, from `compare.py` with `--merge-average`:
```
Program                                       compile_time
                                              before        after  diff
tramp3d-v4/tramp3d-v4                           5.37         5.42  0.8%
mafft/pairlocalalign                            2.51         2.53  0.5%
kimwitu++/kc                                    6.45         6.47  0.4%
ClamAV/clamscan                                 4.83         4.84  0.2%
sqlite3/sqlite3                                 2.35         2.35  0.1%
SPASS/SPASS                                     4.26         4.26  0.0%
lencod/lencod                                   4.39         4.39 -0.0%
Bullet/bullet                                  10.28        10.27 -0.1%
7zip/7zip-benchmark                            13.91        13.90 -0.1%
consumer-typeset/consumer-typeset               3.39         3.38 -0.3%
                           Geomean difference                      0.2%
```
So it seems there's a 0.2% geomean regression but I'm not fully convinced this isn't noise.
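For reference, the geomean line can be roughly reproduced from the rounded values in the table. This is a minimal sketch, assuming the summary is the geometric mean of the per-program after/before ratios minus one; it is not taken from `compare.py` itself, and the `rows` dict and `geomean_diff` helper are names made up for illustration. Because the inputs are the two-decimal values printed above, per-program ratios will not match the `diff` column exactly.

```python
import math

# (before, after) compile times in seconds, copied from the table above.
rows = {
    "tramp3d-v4/tramp3d-v4": (5.37, 5.42),
    "mafft/pairlocalalign": (2.51, 2.53),
    "kimwitu++/kc": (6.45, 6.47),
    "ClamAV/clamscan": (4.83, 4.84),
    "sqlite3/sqlite3": (2.35, 2.35),
    "SPASS/SPASS": (4.26, 4.26),
    "lencod/lencod": (4.39, 4.39),
    "Bullet/bullet": (10.28, 10.27),
    "7zip/7zip-benchmark": (13.91, 13.90),
    "consumer-typeset/consumer-typeset": (3.39, 3.38),
}


def geomean_diff(pairs):
    """Geometric mean of after/before ratios, expressed as a relative change."""
    pairs = list(pairs)
    # Average the logs of the ratios, then exponentiate: this is the geomean.
    log_sum = sum(math.log(after / before) for before, after in pairs)
    return math.exp(log_sum / len(pairs)) - 1.0


diff = geomean_diff(rows.values())
print(f"Geomean difference: {diff:.1%}")
```

With the rounded inputs this lands close to the 0.2% reported, which is the same order as the run-to-run differences in the individual rows.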

To dig further, I ran llc on an optimized sqlite3 bitcode file, to test just codegen performance and reduce the noise from the rest of compilation and file I/O. Measuring CPU cycles this way, I saw a 0.5% mean regression.

I then looked at your PR here: https://github.com/llvm/llvm-project/pull/74823. Measuring it showed no statistically significant change in overall CTMark compilation times; however, I did see a 0.8% reduction in average cycles for the llc test compared to the baseline. So it seems that PR brings a needed, if tiny, perf improvement.
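The comment does not say which significance test was used. As one possible way to check whether a difference between two sets of repeated timings is distinguishable from noise, here is a minimal sketch of Welch's t statistic using only the Python standard library. The `welch_t` helper and the toy inputs are hypothetical and are not measurements from this thread.

```python
import math
import statistics


def welch_t(before, after):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    n1, n2 = len(before), len(after)
    # Squared standard error of each sample mean.
    v1 = statistics.variance(before) / n1
    v2 = statistics.variance(after) / n2
    t = (statistics.mean(after) - statistics.mean(before)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Toy inputs only, to show the call shape; replace with real per-run timings.
toy_before = [1.0, 2.0, 3.0]
toy_after = [2.0, 3.0, 4.0]
t_stat, dof = welch_t(toy_before, toy_after)
print(t_stat, dof)
```

The resulting `t` would then be compared against a t-distribution with `df` degrees of freedom; with only 10 runs per configuration, sub-percent differences can easily fall below that threshold.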

Overall, given that this change is a predecessor to that PR and that it does give a huge size improvement, I think this is ok to go.

https://github.com/llvm/llvm-project/pull/74429