[PATCH] D20645: Avoid doing binary search.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Wed May 25 14:11:10 PDT 2016


ruiu created this revision.
ruiu added reviewers: rafael, silvas.
ruiu added a subscriber: llvm-commits.

MergedInputSection::getOffset is the busiest function in LLD if
string merging is enabled and input files have lots of mergeable
sections. That is usually the case when creating an executable with
debug info, so it is pretty common.

It is slow because it has to do fairly complex computations.
For non-mergeable sections, section contents are contiguous in
the output, so in order to compute an output offset, we only have
to add the output section's base address to an input offset. But
for mergeable strings, section contents are split for merging, so
they are not contiguous. We've got to do some lookups.

We used to do a binary search on the list of section pieces.
I think it is slow because it's hostile to branch prediction.
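
To make the two cases concrete, here is a minimal sketch, not the
actual LLD code. The Piece struct and both function names are made up
for illustration, and the search assumes the pieces are sorted by input
offset and that the first piece starts at or before the queried offset.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one piece of a split mergeable section.
struct Piece {
  uint64_t InputOff;  // where the piece starts in the input section
  uint64_t OutputOff; // where the merged copy lives in the output section
};

// Non-mergeable case: contents stay contiguous, so one addition suffices.
uint64_t contiguousOffset(uint64_t SectionBase, uint64_t InputOff) {
  return SectionBase + InputOff;
}

// Mergeable case via binary search. Pieces must be sorted by InputOff and
// Pieces.front().InputOff must be <= Offset.
uint64_t searchOffset(const std::vector<Piece> &Pieces, uint64_t Offset) {
  // Find the first piece that starts after Offset, then step back one to
  // get the piece that contains Offset.
  auto It = std::upper_bound(
      Pieces.begin(), Pieces.end(), Offset,
      [](uint64_t Off, const Piece &P) { return Off < P.InputOff; });
  const Piece &P = *(It - 1);
  return P.OutputOff + (Offset - P.InputOff);
}
```

Each step of the search depends on a comparison whose outcome is hard
to predict, which is the suspected cost described above.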

This patch replaces it with a hash table lookup. It seems to be
working pretty well. Below is "perf stat -r10" output when linking
clang with debug info. In this case the patch makes the link about
5% faster.
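
For readers without the attachment, the general idea can be sketched as
follows. This is an illustration under assumptions, not the patch
itself: the OffsetIndex class and Piece struct are hypothetical, it uses
std::unordered_map rather than any LLVM container, and the fallback to
a search for offsets that do not fall on a piece boundary is my own
choice for the sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one piece of a split mergeable section.
struct Piece {
  uint64_t InputOff;  // where the piece starts in the input section
  uint64_t OutputOff; // where the merged copy lives in the output section
};

// Hypothetical index: hash table for offsets at piece starts, with a
// binary-search fallback for offsets that point into the middle of a piece.
class OffsetIndex {
public:
  // Sorted must be non-empty and ordered by InputOff.
  explicit OffsetIndex(std::vector<Piece> Sorted) : Pieces(std::move(Sorted)) {
    Map.reserve(Pieces.size());
    for (const Piece &P : Pieces)
      Map.emplace(P.InputOff, P.OutputOff);
  }

  // Offset must be >= Pieces.front().InputOff.
  uint64_t lookup(uint64_t Offset) const {
    // Fast path: a single hash probe when Offset is a piece start.
    auto Hit = Map.find(Offset);
    if (Hit != Map.end())
      return Hit->second;
    // Slow path: locate the containing piece and add the inner offset.
    auto It = std::upper_bound(
        Pieces.begin(), Pieces.end(), Offset,
        [](uint64_t Off, const Piece &P) { return Off < P.InputOff; });
    const Piece &P = *(It - 1);
    return P.OutputOff + (Offset - P.InputOff);
  }

private:
  std::vector<Piece> Pieces;
  std::unordered_map<uint64_t, uint64_t> Map;
};
```

The trade-off in a design like this is extra memory for the table in
exchange for fewer data-dependent branches per lookup.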

Before:

     6750.026013 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.34% )
             193 context-switches          #    0.029 K/sec                    ( +-  4.01% )
               2 cpu-migrations            #    0.000 K/sec                    ( +- 34.96% )
       1,082,310 page-faults               #    0.160 M/sec                    ( +-  0.34% )
  18,832,936,883 cycles                    #    2.790 GHz                      ( +-  0.34% )
  10,139,493,188 stalled-cycles-frontend   #   53.84% frontend cycles idle     ( +-  0.50% )
 <not supported> stalled-cycles-backend
  21,231,470,907 instructions              #    1.13  insns per cycle
                                           #    0.48  stalled cycles per insn  ( +-  0.15% )
   3,817,012,401 branches                  #  565.481 M/sec                    ( +-  0.14% )
     133,411,509 branch-misses             #    3.50% of all branches          ( +-  0.02% )

     6.763705580 seconds time elapsed                                          ( +-  0.47% )

After:

     6423.638562 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.49% )
             164 context-switches          #    0.025 K/sec                    ( +-  3.84% )
               5 cpu-migrations            #    0.001 K/sec                    ( +- 28.73% )
       1,289,077 page-faults               #    0.201 M/sec                    ( +-  0.29% )
  17,922,415,655 cycles                    #    2.790 GHz                      ( +-  0.49% )
  10,451,112,292 stalled-cycles-frontend   #   58.31% frontend cycles idle     ( +-  0.64% )
 <not supported> stalled-cycles-backend
  18,965,624,200 instructions              #    1.06  insns per cycle
                                           #    0.55  stalled cycles per insn  ( +-  0.29% )
   3,300,495,045 branches                  #  513.805 M/sec                    ( +-  0.29% )
      72,347,028 branch-misses             #    2.19% of all branches          ( +-  0.03% )

     6.418221804 seconds time elapsed                                          ( +-  0.49% )

http://reviews.llvm.org/D20645

Files:
  ELF/InputSection.cpp
  ELF/InputSection.h
  ELF/OutputSections.cpp
  ELF/OutputSections.h
  ELF/Writer.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D20645.58510.patch
Type: text/x-patch
Size: 5021 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160525/f56063e8/attachment.bin>

