[PATCH] D27155: Merge strings using concurrent hash map (3rd try!)

Sun Nov 27 16:32:23 PST 2016

ruiu created this revision.
ruiu added a reviewer: silvas.
ruiu added a subscriber: llvm-commits.

Here is yet another implementation of the string merging algorithm.
It is faster than the previous two (https://reviews.llvm.org/D27146,
https://reviews.llvm.org/D27152).

ParallelStringTableBuilder, implemented in this patch, is a concurrent
hash table specialized for string table creation. It doesn't support
resizing, and the only things you can do with it are insert strings
into the builder and write the strings out to a buffer. Limiting the
use case this way makes the concurrent hash table very easy to
implement. (In general, that is extremely hard.)
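
To illustrate why the restricted interface helps, here is a minimal
sketch of a fixed-capacity, insert-only concurrent table. It is not the
code from this patch: the names FixedStringTable and mergeInParallel,
the linear probing, and the use of heap-allocated std::string entries
are all assumptions made for the example. Because the table never
resizes and entries are never removed, a single compare-and-swap per
bucket is enough to make insertion thread-safe.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch, not the ParallelStringTableBuilder from the patch.
class FixedStringTable {
public:
  // The capacity is fixed at construction; the table never resizes.
  explicit FixedStringTable(size_t NumBuckets) : Buckets(NumBuckets) {
    for (auto &B : Buckets)
      B.store(nullptr, std::memory_order_relaxed);
  }
  FixedStringTable(const FixedStringTable &) = delete;
  FixedStringTable &operator=(const FixedStringTable &) = delete;
  ~FixedStringTable() {
    for (auto &B : Buckets)
      delete B.load(std::memory_order_relaxed);
  }

  // Inserts S unless an equal string is already present. Safe to call
  // from several threads at once. Returns false only if the table is full.
  bool insert(const std::string &S) {
    size_t Start = std::hash<std::string>()(S) % Buckets.size();
    std::string *New = nullptr;
    for (size_t I = 0; I < Buckets.size(); ++I) {
      std::atomic<std::string *> &B = Buckets[(Start + I) % Buckets.size()];
      std::string *Cur = B.load(std::memory_order_acquire);
      if (!Cur) {
        if (!New)
          New = new std::string(S);
        // Try to claim the empty bucket. On failure, Cur is updated to
        // the entry that another thread installed first.
        if (B.compare_exchange_strong(Cur, New, std::memory_order_acq_rel,
                                      std::memory_order_acquire))
          return true;
      }
      if (*Cur == S) { // Already present (possibly added by another thread).
        delete New;
        return true;
      }
      // Bucket holds a different string: probe the next one.
    }
    delete New;
    return false;
  }

  // Call only after all inserting threads have finished. Concatenates
  // the stored strings, each NUL-terminated, in bucket order.
  std::string write() const {
    std::string Out;
    for (auto &B : Buckets)
      if (const std::string *S = B.load(std::memory_order_acquire)) {
        Out += *S;
        Out.push_back('\0');
      }
    return Out;
  }

private:
  std::vector<std::atomic<std::string *>> Buckets;
};

// Usage sketch: each thread inserts an interleaved slice of the input.
std::string mergeInParallel(const std::vector<std::string> &Strs,
                            unsigned NumThreads) {
  FixedStringTable Table(Strs.size() * 2 + 1);
  std::vector<std::thread> Threads;
  for (unsigned T = 0; T < NumThreads; ++T)
    Threads.emplace_back([&Table, &Strs, T, NumThreads] {
      for (size_t I = T; I < Strs.size(); I += NumThreads)
        Table.insert(Strs[I]);
    });
  for (auto &Th : Threads)
    Th.join();
  return Table.write();
}
```

In this sketch, which bucket a string lands in after a hash collision
depends on which thread gets there first, so the order of strings in
the output can differ from run to run even though the set of strings
is always the same.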

Note that this algorithm creates an optimized string table, unlike
the probabilistic one (modulo a small difference due to alignment).

However, the result is not deterministic, because multiple threads
add strings simultaneously and the order in which strings are added
cannot be controlled.

Here are the performance numbers. Elapsed time drops from 6.651
seconds to 5.136 seconds, which is better than the probabilistic
algorithm (5.227 seconds) and the sharded hash table algorithm (5.666
seconds).

  Before:

     36427.671361 task-clock (msec)         #    5.477 CPUs utilized            ( +-  1.34% )
          158,095 context-switches          #    0.004 M/sec                    ( +-  0.27% )
            6,165 cpu-migrations            #    0.169 K/sec                    ( +- 21.57% )
        2,365,415 page-faults               #    0.065 M/sec                    ( +-  0.18% )
  100,831,590,020 cycles                    #    2.768 GHz                      ( +-  1.32% )
   81,880,778,356 stalled-cycles-frontend   #   81.21% frontend cycles idle     ( +-  1.55% )
  <not supported> stalled-cycles-backend
   45,993,420,294 instructions              #    0.46  insns per cycle
                                            #    1.78  stalled cycles per insn  ( +-  0.17% )
    8,913,176,489 branches                  #  244.681 M/sec                    ( +-  0.28% )
      148,952,459 branch-misses             #    1.67% of all branches          ( +-  0.10% )

      6.651371241 seconds time elapsed                                          ( +-  0.80% )

  After:

     45366.665142 task-clock (msec)         #    8.833 CPUs utilized            ( +-  1.58% )
          164,449 context-switches          #    0.004 M/sec                    ( +-  0.38% )
            9,162 cpu-migrations            #    0.202 K/sec                    ( +- 16.43% )
        2,242,683 page-faults               #    0.049 M/sec                    ( +-  0.32% )
  125,819,591,445 cycles                    #    2.773 GHz                      ( +-  1.55% )
  108,179,202,984 stalled-cycles-frontend   #   85.98% frontend cycles idle     ( +-  1.74% )
  <not supported> stalled-cycles-backend
   44,224,632,232 instructions              #    0.35  insns per cycle
                                            #    2.45  stalled cycles per insn  ( +-  0.45% )
    8,526,568,335 branches                  #  187.948 M/sec                    ( +-  0.75% )
      139,700,745 branch-misses             #    1.64% of all branches          ( +-  0.11% )

      5.136301092 seconds time elapsed                                          ( +-  0.66% )

This algorithm is not as fancy as the probabilistic one, but I think
it is the best of the three.

https://reviews.llvm.org/D27155

Files:
  ELF/OutputSections.cpp
  ELF/OutputSections.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D27155.79364.patch
Type: text/x-patch
Size: 5490 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161128/213f81b7/attachment.bin>