[PATCH] D56986: COFF, ELF: ICF: Perform 2 rounds of relocation hash propagation.

Sun Jan 20 14:31:47 PST 2019

pcc created this revision.
pcc added reviewers: ruiu, rnk.
Herald added subscribers: arichardson, emaste.
Herald added a reviewer: espindola.

LLD's performance on PGO-instrumented Windows binaries was still poor
even with the fix in D56955 <https://reviews.llvm.org/D56955>: out of the 2m41s linker runtime,
around 2 minutes were still being spent in ICF. I looked into this more
closely and discovered that the vast majority of that time was being
spent segregating .pdata sections with the following relocation chain:

.pdata -> identical .text -> unique PGO counter (not eligible for ICF)

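With a single round of relocation hash propagation, the hash for each
.pdata section only incorporates the hash of its (identical) .text
section, so all of these .pdata sections start out in the same
equivalence class and have to be segregated one by one. This patch
causes us to perform 2 rounds of relocation hash propagation, which
allows the hash for the .pdata sections to incorporate the identifier
from the PGO counter. With that, the amount of time spent in ICF was
reduced to about 2 seconds. I also found that the same change led to a
significant ICF performance improvement in a regular release build of
Chromium's chrome_child.dll, where ICF time was reduced from around 1s
to around 700ms.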

With the same change applied to the ELF linker, the median of 100 runs
of lld-speed-test/chrome dropped from 4.53s to 4.45s on my machine.

I also experimented with increasing the number of propagation rounds
further, but I did not observe any further significant performance
improvements linking Chromium or Firefox.
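
To illustrate why the second round matters, here is a minimal standalone
sketch of the propagation scheme on a toy version of the chain above. It is
not the LLD code: ToySection, propagate and makePdataChain are hypothetical
names, the content hashes are made-up constants, and for simplicity every
section (including the counters) is rehashed in each round. The rotate/XOR
step and the Class[Cnt % 2] double buffering mirror the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a section: an initial content hash plus the
// indices of the sections that its relocations point to.
struct ToySection {
  uint32_t ContentHash;
  std::vector<size_t> Targets;
};

// Runs `Rounds` rounds of relocation hash propagation and returns the
// resulting hash for each section. Round Cnt reads from Class[Cnt % 2] and
// writes to Class[(Cnt + 1) % 2], so a round never reads a value that was
// written during the same round.
std::vector<uint32_t> propagate(const std::vector<ToySection> &Sections,
                                unsigned Rounds) {
  std::vector<uint32_t> Class[2];
  Class[0].resize(Sections.size());
  Class[1].resize(Sections.size());
  for (size_t I = 0; I != Sections.size(); ++I)
    Class[0][I] = Sections[I].ContentHash;

  for (unsigned Cnt = 0; Cnt != Rounds; ++Cnt) {
    const std::vector<uint32_t> &Cur = Class[Cnt % 2];
    std::vector<uint32_t> &Next = Class[(Cnt + 1) % 2];
    for (size_t I = 0; I != Sections.size(); ++I) {
      uint32_t Hash = Cur[I];
      for (size_t T : Sections[I].Targets) {
        // Rotate by 1 so that a self-relocation cannot cancel the hash out.
        Hash = (Hash << 1) | (Hash >> 31);
        Hash ^= Cur[T];
      }
      // Set the MSB, as the patch does, to mark the value as a hash.
      Next[I] = Hash | (1U << 31);
    }
  }
  return Class[Rounds % 2];
}

// Two copies of the chain .pdata -> identical .text -> unique counter.
// Indices: 0,1 = .pdata; 2,3 = .text; 4,5 = counters.
std::vector<ToySection> makePdataChain() {
  return {
      {0x11, {2}},   {0x11, {3}},   // .pdata sections, identical contents
      {0x23, {4}},   {0x23, {5}},   // .text sections, identical contents
      {0x1001, {}},  {0x1002, {}},  // counters with unique identifiers
  };
}
```

In this toy model, propagate(makePdataChain(), 1) leaves both .pdata
sections with the same hash, because only the identical .text hashes have
reached them. With 2 rounds the counter identifiers reach the .pdata
sections through the .text sections, and the two hashes differ.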


Repository:
  rL LLVM

https://reviews.llvm.org/D56986

Files:
  lld/COFF/ICF.cpp
  lld/ELF/ICF.cpp


Index: lld/ELF/ICF.cpp
===================================================================
--- lld/ELF/ICF.cpp
+++ lld/ELF/ICF.cpp
@@ -426,8 +426,9 @@
 // Combine the hashes of the sections referenced by the given section into its
 // hash.
 template <class ELFT, class RelTy>
-static void combineRelocHashes(InputSection *IS, ArrayRef<RelTy> Rels) {
-  uint32_t Hash = IS->Class[1];
+static void combineRelocHashes(unsigned Cnt, InputSection *IS,
+                               ArrayRef<RelTy> Rels) {
+  uint32_t Hash = IS->Class[Cnt % 2];
   for (RelTy Rel : Rels) {
     Symbol &S = IS->template getFile<ELFT>()->getRelocTargetSym(Rel);
     if (auto *D = dyn_cast<Defined>(&S)) {
@@ -435,12 +436,12 @@
         // Rotate the hash by 1 to prevent it from being cancelled out by a
         // self-relocation.
         Hash = (Hash << 1) | (Hash >> 31);
-        Hash ^= RelSec->Class[1];
+        Hash ^= RelSec->Class[Cnt % 2];
       }
     }
   }
   // Set MSB to 1 to avoid collisions with non-hash IDs.
-  IS->Class[0] = Hash | (1U << 31);
+  IS->Class[(Cnt + 1) % 2] = Hash | (1U << 31);
 }
 
 static void print(const Twine &S) {
@@ -458,15 +459,17 @@
 
   // Initially, we use hash values to partition sections.
   parallelForEach(Sections, [&](InputSection *S) {
-    S->Class[1] = xxHash64(S->data());
+    S->Class[0] = xxHash64(S->data());
   });
 
-  parallelForEach(Sections, [&](InputSection *S) {
-    if (S->AreRelocsRela)
-      combineRelocHashes<ELFT>(S, S->template relas<ELFT>());
-    else
-      combineRelocHashes<ELFT>(S, S->template rels<ELFT>());
-  });
+  for (unsigned Cnt = 0; Cnt != 2; ++Cnt) {
+    parallelForEach(Sections, [&](InputSection *S) {
+      if (S->AreRelocsRela)
+        combineRelocHashes<ELFT>(Cnt, S, S->template relas<ELFT>());
+      else
+        combineRelocHashes<ELFT>(Cnt, S, S->template rels<ELFT>());
+    });
+  }
 
   // From now on, sections in Sections vector are ordered so that sections
   // in the same equivalence class are consecutive in the vector.
Index: lld/COFF/ICF.cpp
===================================================================
--- lld/COFF/ICF.cpp
+++ lld/COFF/ICF.cpp
@@ -263,24 +263,26 @@
 
   // Initially, we use hash values to partition sections.
   parallelForEach(Chunks, [&](SectionChunk *SC) {
-    SC->Class[1] = xxHash64(SC->getContents());
+    SC->Class[0] = xxHash64(SC->getContents());
   });
 
   // Combine the hashes of the sections referenced by each section into its
   // hash.
-  parallelForEach(Chunks, [&](SectionChunk *SC) {
-    uint32_t Hash = SC->Class[1];
-    for (Symbol *B : SC->symbols()) {
-      if (auto *Sym = dyn_cast_or_null<DefinedRegular>(B)) {
-        // Rotate the hash by 1 to prevent it from being cancelled out by a
-        // self-relocation.
-        Hash = (Hash << 1) | (Hash >> 31);
-        Hash ^= Sym->getChunk()->Class[1];
+  for (unsigned Cnt = 0; Cnt != 2; ++Cnt) {
+    parallelForEach(Chunks, [&](SectionChunk *SC) {
+      uint32_t Hash = SC->Class[Cnt % 2];
+      for (Symbol *B : SC->symbols()) {
+        if (auto *Sym = dyn_cast_or_null<DefinedRegular>(B)) {
+          // Rotate the hash by 1 to prevent it from being cancelled out by a
+          // self-relocation.
+          Hash = (Hash << 1) | (Hash >> 31);
+          Hash ^= Sym->getChunk()->Class[Cnt % 2];
+        }
       }
-    }
-    // Set MSB to 1 to avoid collisions with non-hash classs.
-    SC->Class[0] = Hash | (1U << 31);
-  });
+      // Set MSB to 1 to avoid collisions with non-hash classes.
+      SC->Class[(Cnt + 1) % 2] = Hash | (1U << 31);
+    });
+  }
 
   // From now on, sections in Chunks are ordered so that sections in
   // the same group are consecutive in the vector.

