[PATCH] Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
Hans Wennborg
hans at chromium.org
Mon Mar 30 18:13:41 PDT 2015
Some numbers from a Clang bootstrap at r233105.
The tree metric is (tree height) / (log2(number of cases) + 1). Lower is better.
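For concreteness, that's roughly the following (just a sketch; the helper name is mine, not something from the patch):

  #include <math.h>

  /* Height of the binary tree emitted for the switch, divided by
     log2(number of cases) + 1. Lower is better. */
  static double tree_metric(unsigned tree_height, unsigned num_cases) {
    return tree_height / (log2((double)num_cases) + 1.0);
  }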
Without my patch:
Jump tables: 4743
Bit tests: 5520
Tree metric avg/min/max: 0.467942 / 0.077705 / 2.182237

With my patch:
Jump tables: 4910
Bit tests: 3689
Tree metric avg/min/max: 0.452684 / 0.077705 / 0.860977
It seems my patch is successfully finding more jump tables. It's finding a lot fewer bit tests, though, so there might be something wrong there (unless the jump tables and bit tests it does find are wider... maybe I need a different metric).
Most importantly, the "tree metric" average is lower, and the max is much lower, which means the trees are more balanced.
I also compared binary size, and with my patch, the bootstrap is 20 kB smaller, presumably due to finding more jump tables.
For compile time, I used gcc-in-a-file.
Without my patch: (only one run, because it was slow and I was lazy)
$ time bin/clang -w -c -O3 /work/gcc.c -o /dev/null
real 2m15.618s
user 2m14.532s
sys 0m0.998s
With my patch:
$ time bin/clang -w -c -O3 /work/gcc.c -o /dev/null
real 2m18.549s
user 2m16.871s
sys 0m1.567s
It seems my patch makes compilation a little slower, but not by much.
For a more synthetic benchmark, I generated a file where f() contains a switch with 10k random cases, and main() then calls f() with each case value a thousand times: F446173: big.c.gz <http://reviews.llvm.org/F446173>
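The generated file looks roughly like this (hand-written sketch; the real file has 10,000 random case values):

  int f(int x) {
    switch (x) {
    case 18467: return 1;  /* made-up values; the real ones are random */
    case 6334:  return 2;
    case 26500: return 3;
    /* ... ~10,000 cases in total ... */
    default:    return 0;
    }
  }

  int main(void) {
    int sum = 0;
    for (int i = 0; i < 1000; i++) {
      sum += f(18467);
      sum += f(6334);
      sum += f(26500);
      /* ... one call per case value ... */
    }
    return sum;
  }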
I used "perf stat -r10" for the tests below.
Without my patch:
clang -O3: 4.640746669 seconds ( +- 1.63% )
clang -O0: 1.240886991 seconds ( +- 2.25% )
./a.out: 1.103107940 seconds ( +- 1.46% )
Tree metric: 8.608796 (no jump tables or bit tests)
With my patch:
clang -O3: 6.302967137 seconds ( +- 1.42% )
clang -O0: 0.778322722 seconds ( +- 2.59% )
./a.out: 0.729969454 seconds ( +- 2.18% )
Tree metric: 0.909873 (no jump tables or bit tests)
To summarize the above: on this benchmark, my patch is 36% slower to compile at -O3, 38% faster at -O0, and generates a balanced tree, resulting in code that runs 35% faster.
Moving forward, I need to take a proper look at the bit test finding code, and I'll experiment with capping the search range for jump tables to get the compile time down.
http://reviews.llvm.org/D8649