<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Dec 1, 2016 at 11:45 AM, Rui Ueyama via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Author: ruiu<br>

Date: Thu Dec  1 13:45:22 2016<br>

New Revision: 288409<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=288409&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=288409&view=rev</a><br>

Log:<br>

Updates file comments and variable names.<br>

<br>

Use "color" instead of "group id" to describe the ICF algorithm.<br></blockquote><div><br></div><div>The right term is "congruence class"; I think you should use it. This ICF algorithm is basically a simple "optimistic" GVN/CSE algorithm: all values are initially assumed to be in the same congruence class, and that class is then iteratively split as contradictions are found, until no contradictions remain.</div><div><br></div><div>For similar algorithms, look at llvm/lib/Transforms/Scalar/EarlyCSE.cpp, llvm/lib/Transforms/Scalar/GVN.cpp, and <a href="https://reviews.llvm.org/D26224">https://reviews.llvm.org/D26224</a> (NewGVN). I haven't looked super closely at them, but I doubt either is fully optimistic like ICF, and they are much more complex because they have to deal with issues like control flow; ICF has no analogous issue. So this ICF algorithm is actually one of the simplest possible GVN/CSE algorithms.</div><div><br></div><div>(For example, look at all the `equals` methods in <a href="https://reviews.llvm.org/D26224">https://reviews.llvm.org/D26224</a>; the core loop is in NewGVN::runGVN.)</div><div><br></div><div>-- Sean Silva</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Modified:<br>

    lld/trunk/ELF/ICF.cpp<br>

    lld/trunk/ELF/InputSection.h<br>

<br>

Modified: lld/trunk/ELF/ICF.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/ICF.cpp?rev=288409&r1=288408&r2=288409&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/ICF.cpp?rev=288409&r1=288408&r2=288409&view=diff</a><br>

==============================================================================<br>

--- lld/trunk/ELF/ICF.cpp (original)<br>

+++ lld/trunk/ELF/ICF.cpp Thu Dec  1 13:45:22 2016<br>

@@ -7,51 +7,62 @@<br>

 //<br>

 //===----------------------------------------------------------------------===//<br>

 //<br>

-// Identical Code Folding is a feature to merge sections not by name (which<br>

-// is regular comdat handling) but by contents. If two non-writable sections<br>

-// have the same data, relocations, attributes, etc., then the two<br>

-// are considered identical and merged by the linker. This optimization<br>

-// makes outputs smaller.<br>

-//<br>

-// ICF is theoretically a problem of reducing graphs by merging as many<br>

-// identical subgraphs as possible if we consider sections as vertices and<br>

-// relocations as edges. It may sound simple, but it is a bit more<br>

-// complicated than you might think. The order of processing sections<br>

-// matters because merging two sections can make other sections, whose<br>

-// relocations now point to the same section, mergeable. Graphs may contain<br>

-// cycles. We need a sophisticated algorithm to do this properly and<br>

-// efficiently.<br>

-//<br>

-// What we do in this file is this. We split sections into groups. Sections<br>

-// in the same group are considered identical.<br>

-//<br>

-// We begin by optimistically putting all sections into a single equivalence<br>

-// class. Then we apply a series of checks that split this initial<br>

-// equivalence class into more and more refined equivalence classes based on<br>

-// the properties by which a section can be distinguished.<br>

-//<br>

-// We begin by checking that the section contents and flags are the<br>

-// same. This only needs to be done once since these properties don't depend<br>

-// on the current equivalence class assignment.<br>

-//<br>

-// Then we split the equivalence classes based on checking that their<br>

-// relocations are the same, where relocation targets are compared by their<br>

-// equivalence class, not the concrete section. This may need to be done<br>

-// multiple times because as the equivalence classes are refined, two<br>

-// sections that had a relocation target in the same equivalence class may<br>

-// now target different equivalence classes, and hence these two sections<br>

-// must be put in different equivalence classes (whereas in the previous<br>

-// iteration they were not since the relocation target was the same.)<br>

-//<br>

-// Our algorithm is smart enough to merge the following mutually-recursive<br>

-// functions.<br>

+// ICF is short for Identical Code Folding. That is a size optimization to<br>

+// identify and merge two or more read-only sections (typically functions)<br>

+// that happened to have the same contents. It usually reduces output size<br>

+// by a few percent.<br>

+//<br>

+// In ICF, two sections are considered identical if they have the same<br>

+// section flags, section data, and relocations. Relocations are tricky,<br>

+// because two relocations are considered the same if they have the same<br>

+// relocation types, values, and if they point to the same sections *in<br>

+// terms of ICF*.<br>

+//<br>

+// Here is an example. If foo and bar defined below are compiled to the<br>

+// same machine instructions, ICF can and should merge the two, although<br>

+// their relocations point to each other.<br>

 //<br>

 //   void foo() { bar(); }<br>

 //   void bar() { foo(); }<br>

 //<br>

-// This algorithm is so-called "optimistic" algorithm described in<br>

-// <a href="http://research.google.com/pubs/pub36912.html" rel="noreferrer" target="_blank">http://research.google.com/pubs/pub36912.html</a>. (Note that what GNU<br>

-// gold implemented is different from the optimistic algorithm.)<br>

+// If you merge the two, their relocations point to the same section and<br>

+// thus you know they are mergeable, but how do we know they are mergeable<br>

+// in the first place? This is not an easy problem to solve.<br>

+//<br>

+// What we are doing in LLD is some sort of coloring algorithm.<br>

+//<br>

+// We color non-identical sections in different colors repeatedly.<br>

+// Sections in the same color when the algorithm terminates are considered<br>

+// identical. Here are the details:<br>

+//<br>

+// 1. First, we color all sections using their hash values of section<br>

+//    types, section contents, and numbers of relocations. At this moment,<br>

+//    relocation targets are not taken into account. We just color<br>

+//    sections that apparently differ in different colors.<br>

+//<br>

+// 2. Next, for each color C, we visit sections in color C to compare<br>

+//    relocation target colors.  We recolor sections A and B in different<br>

+//    colors if A's and B's relocations are different in terms of target<br>

+//    colors.<br>

+//<br>

+// 3. If we recolor some section in step 2, relocations that were<br>

+//    previously pointing to the same color targets may now be pointing to<br>

+//    different colors. Therefore, repeat 2 until a convergence is<br>

+//    obtained.<br>

+//<br>

+// 4. For each color C, pick an arbitrary section in color C, and merges<br>

+//    other sections in color C with it.<br>

+//<br>

+// For small programs, this algorithm needs 3-5 iterations. For large<br>

+// programs such as Chromium, it takes more than 20 iterations.<br>

+//<br>

+// We parallelize each step so that multiple threads can work on different<br>

+// colors concurrently. That gave us a large performance boost when<br>

+// applying ICF on large programs. For example, MSVC link.exe or GNU gold<br>

+// takes 10-20 seconds to apply ICF on Chromium, whose output size is<br>

+// about 1.5 GB, but LLD can finish it in less than 2 seconds on a 2.8 GHz<br>

+// 40 core machine. Even without threading, LLD's ICF is still faster than<br>

+// MSVC or gold though.<br>

 //<br>

 //===----------------------------------------------------------------------===//<br>

<br>

@@ -119,8 +130,7 @@ template <class ELFT> static bool isElig<br>

          S->Name != ".init" && S->Name != ".fini";<br>

 }<br>

<br>

-// Before calling this function, all sections in range R must have the<br>

-// same group ID.<br>

+// Split R into smaller ranges by recoloring its members.<br>

 template <class ELFT> void ICF<ELFT>::segregate(Range *R, bool Constant) {<br>

   // This loop rearranges sections in range R so that all sections<br>

   // that are equal in terms of equals{Constant,Variable} are contiguous<br>

@@ -158,24 +168,23 @@ template <class ELFT> void ICF<ELFT>::se<br>

     }<br>

     R->End = Mid;<br>

<br>

-    // Update GroupIds for the new group members.<br>

+    // Update the new group member colors.<br>

     //<br>

-    // Note on GroupId[0] and GroupId[1]: we have two storages for<br>

-    // group IDs. At the beginning of each iteration of the main loop,<br>

-    // both have the same ID. GroupId[0] contains the current ID, and<br>

-    // GroupId[1] contains the next ID which will be used in the next<br>

-    // iteration.<br>

+    // Note on Color[0] and Color[1]: we have two storages for colors.<br>

+    // At the beginning of each iteration of the main loop, both have<br>

+    // the same color. Color[0] contains the current color, and Color[1]<br>

+    // contains the next color which will be used in the next iteration.<br>

     //<br>

     // Recall that other threads may be working on other ranges. They<br>

-    // may be reading group IDs that we are about to update. We cannot<br>

-    // update group IDs in place because it breaks the invariance that<br>

-    // all sections in the same group must have the same ID. In other<br>

-    // words, the following for loop is not an atomic operation, and<br>

-    // that is observable from other threads.<br>

+    // may be reading colors that we are about to update. We cannot<br>

+    // update colors in place because it breaks the invariance that<br>

+    // all sections in the same group must have the same color. In<br>

+    // other words, the following for loop is not an atomic operation,<br>

+    // and that is observable from other threads.<br>

     //<br>

-    // By writing new IDs to write-only places, we can keep the invariance.<br>

+    // By writing new colors to write-only places, we can keep the invariance.<br>

     for (size_t I = Mid; I < End; ++I)<br>

-      Sections[I]->GroupId[(Cnt + 1) % 2] = Id;<br>

+      Sections[I]->Color[(Cnt + 1) % 2] = Id;<br>

<br>

     R = NewRange;<br>

   }<br>

@@ -216,13 +225,13 @@ template <class RelTy><br>

 bool ICF<ELFT>::variableEq(const InputSection<ELFT> *A, ArrayRef<RelTy> RelsA,<br>

                            const InputSection<ELFT> *B, ArrayRef<RelTy> RelsB) {<br>

   auto Eq = [&](const RelTy &RA, const RelTy &RB) {<br>

+    // The two sections must be identical.<br>

     SymbolBody &SA = A->getFile()->getRelocTargetSym(RA);<br>

     SymbolBody &SB = B->getFile()->getRelocTargetSym(RB);<br>

     if (&SA == &SB)<br>

       return true;<br>

<br>

-    // Or, the symbols should be pointing to the same section<br>

-    // in terms of the group ID.<br>

+    // Or, the two sections must have the same color.<br>

     auto *DA = dyn_cast<DefinedRegular<ELFT>>(&SA);<br>

     auto *DB = dyn_cast<DefinedRegular<ELFT>>(&SB);<br>

     if (!DA || !DB)<br>

@@ -234,16 +243,16 @@ bool ICF<ELFT>::variableEq(const InputSe<br>

     auto *Y = dyn_cast<InputSection<ELFT>>(DB->Section);<br>

     if (!X || !Y)<br>

       return false;<br>

-    if (X->GroupId[Cnt % 2] == 0)<br>

+    if (X->Color[Cnt % 2] == 0)<br>

       return false;<br>

<br>

     // Performance hack for single-thread. If no other threads are<br>

-    // running, we can safely read next GroupIDs as there is no race<br>

+    // running, we can safely read next colors as there is no race<br>

     // condition. This optimization may reduce the number of<br>

     // iterations of the main loop because we can see results of the<br>

     // same iteration.<br>

     size_t Idx = (Config->Threads ? Cnt : Cnt + 1) % 2;<br>

-    return X->GroupId[Idx] == Y->GroupId[Idx];<br>

+    return X->Color[Idx] == Y->Color[Idx];<br>

   };<br>

<br>

   return std::equal(RelsA.begin(), RelsA.end(), RelsB.begin(), Eq);<br>

@@ -274,45 +283,45 @@ template <class ELFT> void ICF<ELFT>::ru<br>

       if (isEligible(S))<br>

         Sections.push_back(S);<br>

<br>

-  // Initially, we use hash values as section group IDs. Therefore,<br>

-  // if two sections have the same ID, they are likely (but not<br>

+  // Initially, we use hash values to color sections. Therefore, if<br>

+  // two sections have the same color, they are likely (but not<br>

   // guaranteed) to have the same static contents in terms of ICF.<br>

   for (InputSection<ELFT> *S : Sections)<br>

-    // Set MSB to 1 to avoid collisions with non-hash IDs.<br>

-    S->GroupId[0] = S->GroupId[1] = getHash(S) | (1 << 31);<br>

+    // Set MSB to 1 to avoid collisions with non-hash colors.<br>

+    S->Color[0] = S->Color[1] = getHash(S) | (1 << 31);<br>

<br>

   // From now on, sections in Sections are ordered so that sections in<br>

-  // the same group are consecutive in the vector.<br>

+  // the same color are consecutive in the vector.<br>

   std::stable_sort(Sections.begin(), Sections.end(),<br>

                    [](InputSection<ELFT> *A, InputSection<ELFT> *B) {<br>

-                     if (A->GroupId[0] != B->GroupId[0])<br>

-                       return A->GroupId[0] < B->GroupId[0];<br>

+                     if (A->Color[0] != B->Color[0])<br>

+                       return A->Color[0] < B->Color[0];<br>

                      // Within a group, put the highest alignment<br>

                      // requirement first, so that's the one we'll keep.<br>

                      return B->Alignment < A->Alignment;<br>

                    });<br>

<br>

-  // Split sections into groups by ID. And then we are going to<br>

-  // split groups into more and more smaller groups.<br>

-  // Note that we do not add single element groups because they<br>

-  // are already the smallest.<br>

+  // Create ranges in which each range contains sections in the same<br>

+  // color. And then we are going to split ranges into more and more<br>

+  // smaller ranges. Note that we do not add single element ranges<br>

+  // because they are already the smallest.<br>

   Ranges.reserve(Sections.size());<br>

   for (size_t I = 0, E = Sections.size(); I < E - 1;) {<br>

     // Let J be the first index whose element has a different ID.<br>

     size_t J = I + 1;<br>

-    while (J < E && Sections[I]->GroupId[0] == Sections[J]->GroupId[0])<br>

+    while (J < E && Sections[I]->Color[0] == Sections[J]->Color[0])<br>

       ++J;<br>

     if (J - I > 1)<br>

       Ranges.push_back({I, J});<br>

     I = J;<br>

   }<br>

<br>

-  // This function copies new GroupIds from former write-only space to<br>

-  // former read-only space, so that we can flip GroupId[0] and GroupId[1].<br>

-  // Note that new GroupIds are always be added to end of Ranges.<br>

+  // This function copies colors from former write-only space to former<br>

+  // read-only space, so that we can flip Color[0] and Color[1]. Note<br>

+  // that new colors are always be added to end of Ranges.<br>

   auto Copy = [&](Range &R) {<br>

     for (size_t I = R.Begin; I < R.End; ++I)<br>

-      Sections[I]->GroupId[Cnt % 2] = Sections[I]->GroupId[(Cnt + 1) % 2];<br>

+      Sections[I]->Color[Cnt % 2] = Sections[I]->Color[(Cnt + 1) % 2];<br>

   };<br>

<br>

   // Compare static contents and assign unique IDs for each static content.<br>

@@ -321,7 +330,7 @@ template <class ELFT> void ICF<ELFT>::ru<br>

   foreach(End, Ranges.end(), Copy);<br>

   ++Cnt;<br>

<br>

-  // Split groups by comparing relocations until convergence is obtained.<br>

+  // Split ranges by comparing relocations until convergence is obtained.<br>

   for (;;) {<br>

     auto End = Ranges.end();<br>

     foreach(Ranges.begin(), End, [&](Range &R) { segregate(&R, false); });<br>

@@ -334,7 +343,7 @@ template <class ELFT> void ICF<ELFT>::ru<br>

<br>

   log("ICF needed " + Twine(Cnt) + " iterations");<br>

<br>

-  // Merge sections in the same group.<br>

+  // Merge sections in the same colors.<br>

   for (Range R : Ranges) {<br>

     if (R.End - R.Begin == 1)<br>

       continue;<br>

<br>

Modified: lld/trunk/ELF/InputSection.h<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/InputSection.h?rev=288409&r1=288408&r2=288409&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/InputSection.h?rev=288409&r1=288408&r2=288409&view=diff</a><br>

==============================================================================<br>

--- lld/trunk/ELF/InputSection.h (original)<br>

+++ lld/trunk/ELF/InputSection.h Thu Dec  1 13:45:22 2016<br>

@@ -289,7 +289,7 @@ public:<br>

   void relocateNonAlloc(uint8_t *Buf, llvm::ArrayRef<RelTy> Rels);<br>

<br>

   // Used by ICF.<br>

-  uint32_t GroupId[2] = {0, 0};<br>

+  uint32_t Color[2] = {0, 0};<br>

<br>

   // Called by ICF to merge two input sections.<br>

   void replace(InputSection<ELFT> *Other);<br>

<br>

<br>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

</blockquote></div><br></div></div>
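<div><br></div><div>To make the "optimistic" framing concrete, here is a minimal sketch of congruence-class refinement on a toy graph. It is not LLD's implementation: `Node`, `Hash`, `Targets`, and `congruenceClasses` are hypothetical names made up for illustration, and a real linker would also compare section flags and relocation types and would avoid rebuilding a map on every round.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input section: a hash of its static
// contents plus the indices of the sections its relocations point to.
struct Node {
  unsigned Hash;
  std::vector<std::size_t> Targets;
};

// Returns one congruence-class ID per node. Two nodes end up with the
// same ID if no round of refinement found a contradiction between them.
std::vector<std::size_t> congruenceClasses(const std::vector<Node> &Nodes) {
  // Optimistic start: assume every node is in the same class.
  std::vector<std::size_t> Class(Nodes.size(), 0);
  std::size_t NumClasses = Nodes.empty() ? 0 : 1;

  // Signature: (current class, hash) plus the classes of all targets.
  using Sig = std::pair<std::pair<std::size_t, unsigned>,
                        std::vector<std::size_t>>;

  for (;;) {
    std::map<Sig, std::size_t> Ids;
    std::vector<std::size_t> Next(Nodes.size());
    for (std::size_t I = 0; I < Nodes.size(); ++I) {
      // Compare relocation targets by class, not by concrete node.
      std::vector<std::size_t> TargetClasses;
      for (std::size_t T : Nodes[I].Targets) {
        assert(T < Nodes.size() && "relocation target out of range");
        TargetClasses.push_back(Class[T]);
      }
      Sig S(std::make_pair(Class[I], Nodes[I].Hash),
            std::move(TargetClasses));
      // Reuse the ID of an existing signature or allocate a fresh one.
      Next[I] = Ids.emplace(std::move(S), Ids.size()).first->second;
    }
    // The current class is part of the signature, so classes can only
    // split. An unchanged class count therefore means convergence.
    if (Ids.size() == NumClasses)
      return Next;
    NumClasses = Ids.size();
    Class = std::move(Next);
  }
}
```

<div>With the mutually recursive foo/bar pair from the file comment modeled as two nodes with the same hash that target each other, both stay in one class, since no round ever finds a contradiction between them. Two callers whose targets land in different classes get split one round after their targets do, which is why the loop has to run to a fixed point.</div>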