[lld] r319503 - Make .gnu.hash section smaller.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 30 15:59:40 PST 2017


Author: ruiu
Date: Thu Nov 30 15:59:40 2017
New Revision: 319503

URL: http://llvm.org/viewvc/llvm-project?rev=319503&view=rev
Log:
Make .gnu.hash section smaller.

Our on-disk hash table was unnecessarily large. The cost of a collision
is not high in the .gnu.hash table because each symbol in the table is
stored together with its hash value. So, for each colliding symbol, the
dynamic linker just compares an integer, which is pretty cheap.
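
To illustrate why a longer chain is cheap, here is a minimal sketch of a
lookup within one bucket. It is not lld's or any dynamic linker's code:
ChainEntry, LookupCost and lookupInChain are hypothetical names, and the
chain is modeled as a plain vector rather than the on-disk layout (where
the low bit of each stored hash marks the end of the chain, so real
loaders compare with that bit masked). gnuHash is the usual GNU hash
function (start at 5381, multiply by 33, add each byte).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory model of one chain entry (not the on-disk format).
struct ChainEntry {
  uint32_t Hash;    // precomputed hash stored next to the symbol
  std::string Name; // symbol name (a string table offset in a real file)
};

// Counts the work done by one lookup, for illustration only.
struct LookupCost {
  bool Found = false;
  unsigned HashCompares = 0;   // cheap uint32_t comparisons
  unsigned StringCompares = 0; // expensive name comparisons
};

// The usual GNU hash function over the bytes of a symbol name.
uint32_t gnuHash(const std::string &Name) {
  uint32_t H = 5381;
  for (unsigned char C : Name)
    H = H * 33 + C;
  return H;
}

// Walks one bucket's chain. A name comparison happens only when the
// stored hash matches, so most colliding entries cost one integer compare.
LookupCost lookupInChain(const std::vector<ChainEntry> &Chain,
                         const std::string &Name) {
  LookupCost Cost;
  uint32_t H = gnuHash(Name);
  for (const ChainEntry &E : Chain) {
    ++Cost.HashCompares;
    if (E.Hash != H)
      continue; // different hash: skip without touching the string
    ++Cost.StringCompares;
    if (E.Name == Name) {
      Cost.Found = true;
      break;
    }
  }
  return Cost;
}
```

In this model, finding the last entry of a three-entry chain costs three
integer comparisons but only one string comparison, as long as the full
32-bit hashes differ.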

This patch increases the load factor by about 8. Here's a comparison.

  $ readelf --histogram libclangSema.so.6.0.0svn-new-lld
  Histogram for `.gnu.hash' bucket list length (total of 582 buckets):
   Length  Number     % of total  Coverage
        0  11         (  1.9%)
        1  35         (  6.0%)      1.5%
        2  93         ( 16.0%)      9.5%
        3  108        ( 18.6%)     23.4%
        4  121        ( 20.8%)     44.1%
        5  86         ( 14.8%)     62.6%
        6  63         ( 10.8%)     78.8%
        7  38         (  6.5%)     90.2%
        8  18         (  3.1%)     96.4%
        9  6          (  1.0%)     98.7%
       10  3          (  0.5%)    100.0%

  $ readelf --histogram libclangSema.so.6.0.0svn-old-lld
  Histogram for `.gnu.hash' bucket list length (total of 4093 buckets):
   Length  Number     % of total  Coverage
        0  1498       ( 36.6%)
        1  1545       ( 37.7%)     37.7%
        2  712        ( 17.4%)     72.5%
        3  251        (  6.1%)     90.9%
        4  66         (  1.6%)     97.3%
        5  16         (  0.4%)     99.3%
        6  5          (  0.1%)    100.0%

  $ readelf --histogram libclangSema.so.6.0.0svn-bfd
  Histogram for `.gnu.hash' bucket list length (total of 1004 buckets):
   Length  Number     % of total  Coverage
        0  92         (  9.2%)
        1  227        ( 22.6%)      9.8%
        2  266        ( 26.5%)     32.6%
        3  222        ( 22.1%)     61.2%
        4  115        ( 11.5%)     81.0%
        5  55         (  5.5%)     92.8%
        6  21         (  2.1%)     98.2%
        7  6          (  0.6%)    100.0%

  $ readelf --histogram libclangSema.so.6.0.0svn-gold
  Histogram for `.gnu.hash' bucket list length (total of 2053 buckets):
   Length  Number     % of total  Coverage
        0  671        ( 32.7%)
        1  709        ( 34.5%)     30.4%
        2  470        ( 22.9%)     70.7%
        3  141        (  6.9%)     88.9%
        4  54         (  2.6%)     98.2%
        5  5          (  0.2%)     99.2%
        6  3          (  0.1%)    100.0%
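
The bucket counts in the two lld histograms above can be reproduced with
a small sketch of the old and new rules. oldBucketCount copies the logic
of the removed getBucketSize, and newBucketCount mirrors the expression
that replaces it; both function names are chosen here for illustration.
The symbol counts used in the usage note are not stated in the log; they
are obtained by summing length times number over each histogram (2331
for the new lld output, 4096 for the old one).

```cpp
#include <algorithm>
#include <cstddef>
#include <initializer_list>

// Old rule: the largest prime from a fixed list that does not exceed
// the number of symbols (0 if there are no symbols).
size_t oldBucketCount(size_t NumSymbols) {
  for (size_t N : {131071, 65521, 32749, 16381, 8191, 4093, 2039, 1021, 509,
                   251, 127, 61, 31, 13, 7, 3, 1})
    if (N <= NumSymbols)
      return N;
  return 0;
}

// New rule: one bucket per four symbols (load factor 4), at least one.
size_t newBucketCount(size_t NumSymbols) {
  return std::max<size_t>(NumSymbols / 4, 1);
}
```

With these definitions, newBucketCount(2331) gives 582 and
oldBucketCount(4096) gives 4093, matching the bucket totals that readelf
reports for the new and old lld outputs.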

Differential Revision: https://reviews.llvm.org/D40683

Modified:
    lld/trunk/ELF/SyntheticSections.cpp
    lld/trunk/test/ELF/gc-sections-shared.s

Modified: lld/trunk/ELF/SyntheticSections.cpp
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/SyntheticSections.cpp?rev=319503&r1=319502&r2=319503&view=diff
==============================================================================
--- lld/trunk/ELF/SyntheticSections.cpp (original)
+++ lld/trunk/ELF/SyntheticSections.cpp Thu Nov 30 15:59:40 2017
@@ -1775,22 +1775,6 @@ static uint32_t hashGnu(StringRef Name)
   return H;
 }
 
-// Returns a number of hash buckets to accomodate given number of elements.
-// We want to choose a moderate number that is not too small (which
-// causes too many hash collisions) and not too large (which wastes
-// disk space.)
-//
-// We return a prime number because it (is believed to) achieve good
-// hash distribution.
-static size_t getBucketSize(size_t NumSymbols) {
-  // List of largest prime numbers that are not greater than 2^n + 1.
-  for (size_t N : {131071, 65521, 32749, 16381, 8191, 4093, 2039, 1021, 509,
-                   251, 127, 61, 31, 13, 7, 3, 1})
-    if (N <= NumSymbols)
-      return N;
-  return 0;
-}
-
 // Add symbols to this symbol hash table. Note that this function
 // destructively sort a given vector -- which is needed because
 // GNU-style hash table places some sorting requirements.
@@ -1813,7 +1797,12 @@ void GnuHashTableSection::addSymbols(std
     Symbols.push_back({B, Ent.StrTabOffset, hashGnu(B->getName())});
   }
 
-  NBuckets = getBucketSize(Symbols.size());
+  // We chose load factor 4 for the on-disk hash table. For each hash
+  // collision, the dynamic linker will compare a uint32_t hash value.
+  // Since the integer comparison is quite fast, we believe we can make
+  // the load factor even larger. 4 is just a conservative choice.
+  NBuckets = std::max<size_t>(Symbols.size() / 4, 1);
+
   std::stable_sort(Symbols.begin(), Symbols.end(),
                    [&](const Entry &L, const Entry &R) {
                      return L.Hash % NBuckets < R.Hash % NBuckets;

Modified: lld/trunk/test/ELF/gc-sections-shared.s
URL: http://llvm.org/viewvc/llvm-project/lld/trunk/test/ELF/gc-sections-shared.s?rev=319503&r1=319502&r2=319503&view=diff
==============================================================================
--- lld/trunk/test/ELF/gc-sections-shared.s (original)
+++ lld/trunk/test/ELF/gc-sections-shared.s Thu Nov 30 15:59:40 2017
@@ -34,28 +34,28 @@
 # CHECK-NEXT:     Section: .text
 # CHECK-NEXT:   }
 # CHECK-NEXT:   Symbol {
-# CHECK-NEXT:     Name: foo
+# CHECK-NEXT:     Name: baz
 # CHECK-NEXT:     Value:
 # CHECK-NEXT:     Size:
 # CHECK-NEXT:     Binding: Global
 # CHECK-NEXT:     Type:
 # CHECK-NEXT:     Other:
-# CHECK-NEXT:     Section: .text
+# CHECK-NEXT:     Section: Undefined
 # CHECK-NEXT:   }
 # CHECK-NEXT:   Symbol {
-# CHECK-NEXT:     Name: qux
+# CHECK-NEXT:     Name: foo
 # CHECK-NEXT:     Value:
 # CHECK-NEXT:     Size:
-# CHECK-NEXT:     Binding: Weak
+# CHECK-NEXT:     Binding: Global
 # CHECK-NEXT:     Type:
 # CHECK-NEXT:     Other:
-# CHECK-NEXT:     Section: Undefined
+# CHECK-NEXT:     Section: .text
 # CHECK-NEXT:   }
 # CHECK-NEXT:   Symbol {
-# CHECK-NEXT:     Name: baz
+# CHECK-NEXT:     Name: qux
 # CHECK-NEXT:     Value:
 # CHECK-NEXT:     Size:
-# CHECK-NEXT:     Binding: Global
+# CHECK-NEXT:     Binding: Weak
 # CHECK-NEXT:     Type:
 # CHECK-NEXT:     Other:
 # CHECK-NEXT:     Section: Undefined
@@ -90,31 +90,31 @@
 # CHECK2-NEXT:     Section: .text
 # CHECK2-NEXT:   }
 # CHECK2-NEXT:   Symbol {
-# CHECK2-NEXT:     Name: qux
+# CHECK2-NEXT:     Name: baz
 # CHECK2-NEXT:     Value:
 # CHECK2-NEXT:     Size:
-# CHECK2-NEXT:     Binding: Weak
+# CHECK2-NEXT:     Binding: Global
 # CHECK2-NEXT:     Type:
 # CHECK2-NEXT:     Other:
 # CHECK2-NEXT:     Section: Undefined
 # CHECK2-NEXT:   }
 # CHECK2-NEXT:   Symbol {
-# CHECK2-NEXT:     Name: foo
+# CHECK2-NEXT:     Name: qux
 # CHECK2-NEXT:     Value:
 # CHECK2-NEXT:     Size:
-# CHECK2-NEXT:     Binding: Global
+# CHECK2-NEXT:     Binding: Weak
 # CHECK2-NEXT:     Type:
 # CHECK2-NEXT:     Other:
-# CHECK2-NEXT:     Section: .text
+# CHECK2-NEXT:     Section: Undefined
 # CHECK2-NEXT:   }
 # CHECK2-NEXT:   Symbol {
-# CHECK2-NEXT:     Name: baz
+# CHECK2-NEXT:     Name: foo
 # CHECK2-NEXT:     Value:
 # CHECK2-NEXT:     Size:
 # CHECK2-NEXT:     Binding: Global
 # CHECK2-NEXT:     Type:
 # CHECK2-NEXT:     Other:
-# CHECK2-NEXT:     Section: Undefined
+# CHECK2-NEXT:     Section: .text
 # CHECK2-NEXT:   }
 # CHECK2-NEXT: ]
 



