[PATCH] Switch lowering: use profile info to build weight-balanced binary search trees

Hans Wennborg hans at chromium.org
Mon Apr 27 19:17:52 PDT 2015


Hi djasper,

This patch uses profile info to balance the binary search trees used for switch lowering by weight instead of node count, causing hot nodes to appear closer to the root.

I've been using this benchmark: {F484746}. It's a 512-case switch. A loop exercises the cases in a random sequence, hitting the first case once, the second twice, and so on, making the 512th case the hottest. I ran it like this:

```
$ bin/clang -O3 switch_with_profile.c 
$ perf stat -r5 ./a.out                                                                                                                                                                                                               

 Performance counter stats for './a.out' (5 runs):

       2456.382229 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.63% )
               403 context-switches          #    0.164 K/sec                    ( +-  3.93% )
                12 cpu-migrations            #    0.005 K/sec                    ( +- 32.02% )
               131 page-faults               #    0.053 K/sec                  
     7,961,322,884 cycles                    #    3.241 GHz                      ( +-  0.07% ) [66.68%]
     2,511,064,458 stalled-cycles-frontend   #   31.54% frontend cycles idle     ( +-  0.06% ) [50.05%]
     1,874,963,880 stalled-cycles-backend    #   23.55% backend  cycles idle     ( +-  0.05% ) [50.14%]
    10,579,662,099 instructions              #    1.33  insns per cycle        
                                             #    0.24  stalled cycles per insn  ( +-  0.08% ) [66.80%]
     4,910,929,143 branches                  # 1999.253 M/sec                    ( +-  0.05% ) [66.63%]
        14,888,827 branch-misses             #    0.30% of all branches          ( +-  0.07% ) [66.58%]

       2.460022963 seconds time elapsed                                          ( +-  0.64% )

$ bin/clang -O3 switch_with_profile.c -fprofile-instr-generate
$ LLVM_PROFILE_FILE=profile.raw ./a.out
$ bin/llvm-profdata merge -output=profile.profdata profile.raw
$ bin/clang -O3 switch_with_profile.c -fprofile-instr-use=profile.profdata
$ perf stat -r5 ./a.out

 Performance counter stats for './a.out' (5 runs):

       2156.926282 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.84% )
               348 context-switches          #    0.161 K/sec                    ( +-  3.07% )
                 7 cpu-migrations            #    0.003 K/sec                    ( +- 39.10% )
               131 page-faults               #    0.061 K/sec                    ( +-  0.15% )
     7,269,065,426 cycles                    #    3.370 GHz                      ( +-  0.05% ) [66.56%]
     2,183,070,540 stalled-cycles-frontend   #   30.03% frontend cycles idle     ( +-  0.27% ) [50.00%]
     1,601,430,175 stalled-cycles-backend    #   22.03% backend  cycles idle     ( +-  0.37% ) [50.21%]
    10,328,446,596 instructions              #    1.42  insns per cycle        
                                             #    0.21  stalled cycles per insn  ( +-  0.08% ) [66.91%]
     4,777,439,719 branches                  # 2214.930 M/sec                    ( +-  0.05% ) [66.75%]
        14,804,172 branch-misses             #    0.31% of all branches          ( +-  0.06% ) [66.59%]

       2.160195042 seconds time elapsed                                          ( +-  0.84% )
```

That's a 12 % speed-up, which I think is good given that the switch's profile is tricky and not completely dominated by one or a few cases.

Note that this also affects non-profile guided builds. When there is no profile info, each case has the same weight. When clustered together in e.g. a jump table, that cluster will be heavier than a single-case cluster, which affects the tree layout.

http://reviews.llvm.org/D9318

Files:
  lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  test/CodeGen/Generic/MachineBranchProb.ll
  test/CodeGen/X86/switch.ll

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D9318.24527.patch
Type: text/x-patch
Size: 6490 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150428/6c29e6d0/attachment.bin>


More information about the llvm-commits mailing list