[PATCH] D27247: Parallelize ICF to make LLD's ICF really fast.

Sean Silva via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 29 23:36:12 PST 2016


silvas added a comment.

Nice! The idea of storing current and next colors solves the nondeterminism in a very simple way!
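
To spell out why the two-slot scheme is deterministic, here is a toy sketch (not the patch's code). `Node`, `step`, and the mixing function are hypothetical; the only point is that a round reads the current slot and writes the other one, so the visiting order cannot change the result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node: during a round, Color[Cur] is only read and
// Color[Cur ^ 1] is only written.
struct Node {
  uint64_t Color[2] = {0, 0};
  std::vector<size_t> Succs; // indices of referenced nodes
};

// One round. Each node's next color depends only on *current* colors, so the
// order in which nodes are visited (or which thread visits them) is irrelevant.
void step(std::vector<Node> &Nodes, int Cur, bool Reverse) {
  int Next = Cur ^ 1;
  size_t N = Nodes.size();
  for (size_t K = 0; K < N; ++K) {
    size_t I = Reverse ? N - 1 - K : K;
    uint64_t H = Nodes[I].Color[Cur];
    for (size_t S : Nodes[I].Succs)
      H = H * 31 + Nodes[S].Color[Cur]; // toy mixing function (an assumption)
    Nodes[I].Color[Next] = H;
  }
}
```

Running `step` forward and in reverse on copies of the same graph yields identical next colors, which would not hold if nodes updated a single slot in place.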

One concern: this increases sizeof(InputSection) :(
With -ffunction-sections -fdata-sections we might have a very large number of them, so reducing memory footprint is important. I'm afraid that this might slow down regular links.
After this patch, each InputSection pays one pointer-sized word for `DependentSection` and two for `GroupId` (on a 64-bit host). That is three words of memory overhead per section, which is quite nontrivial.
It doesn't have to be done in this patch, but I think we can adjust the memory allocation to optionally allocate this data "off the tail" of the InputSection, so that we don't pay the memory overhead if these advanced features aren't being used.
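
A rough sketch of what "off the tail" allocation could look like, assuming a heavily simplified `Section` class and a hypothetical `ICFData` payload (neither is LLD's actual layout). LLVM's `TrailingObjects` helper covers the same pattern more carefully; this is only meant to show the idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical optional payload, needed only when ICF is enabled.
struct ICFData {
  uint64_t GroupId[2] = {0, 0};
};

// Hypothetical, heavily simplified stand-in for InputSection.
class Section {
public:
  // Allocates the section, plus a trailing ICFData only if WithICF is set.
  static Section *create(bool WithICF) {
    size_t AllocSize = sizeof(Section) + (WithICF ? sizeof(ICFData) : 0);
    void *Mem = std::malloc(AllocSize);
    if (!Mem)
      throw std::bad_alloc();
    Section *S = new (Mem) Section(WithICF);
    if (WithICF)
      new (S->tail()) ICFData();
    return S;
  }

  static void destroy(Section *S) {
    // ICFData is trivially destructible, so only Section needs a destructor call.
    S->~Section();
    std::free(S);
  }

  // Returns the trailing data, or nullptr when it was not allocated.
  ICFData *icf() {
    return HasICF ? reinterpret_cast<ICFData *>(tail()) : nullptr;
  }

  uint64_t Bytes = 0; // placeholder for the real fields

private:
  explicit Section(bool WithICF) : HasICF(WithICF) {}
  void *tail() { return reinterpret_cast<char *>(this) + sizeof(Section); }

  bool HasICF;
};

static_assert(sizeof(Section) % alignof(ICFData) == 0,
              "trailing ICFData must start at a suitably aligned offset");
```

With this layout a link without ICF pays only for the `HasICF` flag per section; the cost is that sections must be created through a factory rather than constructed directly.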

Is the large number of iterations (23) due to the slow propagation of identicalness across references (currently one level of references per iteration), or due to ICF<ELFT>::segregate only being able to split into two at each iteration, or a combination of both? Here are two ideas for reducing the number of iterations:

1. do some sort of topological sorting (even approximate) and then do partial iterations that sort only part of the array (more generally, avoid revisiting sections that are unlikely to change in this iteration). This can speed up convergence since we avoid wasting work on nodes that won't change.

One interesting observation is that if the array is topologically sorted (i.e. except for cycles) then I believe that a serial visitation with relaxation at each step (i.e., one that cannot be parallelized deterministically) would be guaranteed to converge in a single iteration. The savings from reducing iterations might pay off.
Note that --gc-sections already has to compute some of this, so this topological ordering information might not be so expensive.

2. make the "equal" comparison actually be a "less". That will allow `ICF<ELFT>::segregate` to sort instead of partition, which allows it to generate multiple ranges at a time.
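
To illustrate idea 2, here is a minimal sketch assuming a hypothetical `Section` whose integer `Key` stands in for everything a real comparison would look at (flags, size, contents, relocations). With an ordering instead of an equality test, one sort groups all equal elements and a linear scan recovers every range, rather than one split per pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input section; Key abstracts its "contents".
struct Section {
  int Key;
};

// Hypothetical strict weak ordering replacing the "equal" predicate.
static bool lessThan(const Section &A, const Section &B) {
  return A.Key < B.Key;
}

// Sorts Secs and returns the half-open [first, second) index ranges of
// mutually equal elements.
std::vector<std::pair<size_t, size_t>> splitBySorting(std::vector<Section> &Secs) {
  std::stable_sort(Secs.begin(), Secs.end(), lessThan);
  std::vector<std::pair<size_t, size_t>> Ranges;
  size_t Begin = 0;
  for (size_t I = 1; I <= Secs.size(); ++I) {
    // A range ends at the end of the array or where the order strictly increases.
    if (I == Secs.size() || lessThan(Secs[I - 1], Secs[I])) {
      Ranges.push_back({Begin, I});
      Begin = I;
    }
  }
  return Ranges;
}
```

`std::stable_sort` is used so that the relative order of equal sections does not depend on the sort implementation, which matters if the output has to stay deterministic.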



> What we are doing in LLD is some sort of coloring algorithm

Believe it or not, once I started learning about GVN I learned that this algorithm is actually a textbook example of an "optimistic" GVN algorithm. So it is actually a well-studied kind of algorithm.



================
Comment at: ELF/InputSection.h:292
   // Used by ICF.
-  uint64_t GroupId = 0;
+  uint64_t GroupId[2] = {0, 0};
 
----------------
Would

```
struct {
  uint64_t Current;
  uint64_t Next;
} GroupId;
```

be a bit better?


https://reviews.llvm.org/D27247