[PATCH] Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
Hans Wennborg
hans at chromium.org
Mon Mar 30 18:13:41 PDT 2015
Some numbers from a Clang bootstrap at r233105.
The tree metric is (tree height) / (log2(number of cases) + 1). Lower is better.
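For concreteness, that's roughly the following (just a sketch; the helper name is mine, not something from the patch):

  #include <math.h>

  /* Height of the binary tree emitted for the switch, divided by
     log2(number of cases) + 1. Lower is better. */
  static double tree_metric(unsigned tree_height, unsigned num_cases) {
    return tree_height / (log2((double)num_cases) + 1.0);
  }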
Without my patch:
Jump tables: 4743
Bit tests: 5520
Tree metric avg/min/max: 0.467942 / 0.077705 / 2.182237

With my patch:
Jump tables: 4910
Bit tests: 3689
Tree metric avg/min/max: 0.452684 / 0.077705 / 0.860977
It seems my patch is successfully finding more jump tables. It's finding a lot fewer bit tests, though, so there might be something wrong there (unless the jump tables and bit tests it does find are wider... maybe I need a different metric).
Most importantly, the "tree metric" average is lower, and the max is much lower, which means the trees are more balanced.
I also compared binary size, and with my patch, the bootstrap is 20 kB smaller, presumably due to finding more jump tables.
For compile time, I used gcc-in-a-file.
Without my patch: (only one run, because it was slow and I was lazy)
$ time bin/clang -w -c -O3 /work/gcc.c -o /dev/null
real 2m15.618s
user 2m14.532s
sys 0m0.998s
With my patch:
$ time bin/clang -w -c -O3 /work/gcc.c -o /dev/null
real 2m18.549s
user 2m16.871s
sys 0m1.567s
It seems my patch makes compilation a little slower, but not by much.
For a more synthetic benchmark, I generated a file where f() contains a switch with 10k random cases, and main() then calls f() with each case value a thousand times: F446173: big.c.gz <http://reviews.llvm.org/F446173>
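The generated file looks roughly like this (hand-written sketch; the real file has 10,000 random case values):

  int f(int x) {
    switch (x) {
    case 18467: return 1;  /* made-up values; the real ones are random */
    case 6334:  return 2;
    case 26500: return 3;
    /* ... ~10,000 cases in total ... */
    default:    return 0;
    }
  }

  int main(void) {
    int sum = 0;
    for (int i = 0; i < 1000; i++) {
      sum += f(18467);
      sum += f(6334);
      sum += f(26500);
      /* ... one call per case value ... */
    }
    return sum;
  }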
I used "perf stat -r10" for the tests below.
Without my patch:
clang -O3: 4.640746669 seconds ( +- 1.63% )
clang -O0: 1.240886991 seconds ( +- 2.25% )
./a.out: 1.103107940 seconds ( +- 1.46% )
Tree metric: 8.608796 (no jump tables or bit tests)
With my patch:
clang -O3: 6.302967137 seconds ( +- 1.42% )
clang -O0: 0.778322722 seconds ( +- 2.59% )
./a.out: 0.729969454 seconds ( +- 2.18% )
Tree metric: 0.909873 (no jump tables or bit tests)
To summarize the above: on this benchmark, my patch is 36% slower to compile at -O3, 38% faster at -O0, and generates a balanced tree, resulting in code that runs 35% faster.
Moving forward, I need to take a proper look at the bit test finding code, and I'll experiment with capping the search range for jump tables to get the compile time down.
http://reviews.llvm.org/D8649