[PATCH] D31145: [Outliner] Fix compile-time overhead for candidate choice

Sean Silva via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 20 20:52:14 PDT 2017


silvas added a comment.

> Overlapping candidates are now handled entirely by the pruneOverlaps method that handled anything the suffix tree pruning didn't handle. Since that method is on average O(n), this behaves far better.

Why is it O(n) on average? It is two nested for loops with some bailouts that only affect the asymptotic runtime if candidates overlap with little-omega(1) other candidates. And it needs to be big-omega(sqrt(n)) if the runtime is to be reduced to even O(n^1.5).

Also, as a side note, pruneOverlaps seems to only consider the first occurrences. E.g. decreasing occurrence counts by 1. Two candidates may overlap in more than one place. This probably explains something I was seeing: if you print out the total benefit of each candidate as it is outlined, the total benefit that the outliner thinks it is getting from the outlining operations it does is more can be more than the total string length, which reflects incorrect cost modeling.

> Using the new candidate collection method, the worst compile-time regression was 0.542 seconds for MultiSource/Benchmarks/7zip/7zip-benchmark.test, followed up by 0.496 seconds for MultiSource/Benchmarks/Bullet/bullet.test. On average, there was a 0.01 second compile-time regression, with a median compile-time regression of 0.002s.

This is a big improvement over the previous numbers, but it would be good to phrase it as relative numbers. E.g. does the outliner after this patch ever take more than 10% of compile time? That would still be unfortunate.

Also, you should try this with FullLTO. That will smoke out compile time problems better. If you save off the combined bitcode files for each exe you care about, then you can just run llc on the bitcode directly which should let you iterate faster.



================
Comment at: lib/CodeGen/MachineOutliner.cpp:558
+  /// an internal node which appears in at least two suffixes. Each suffix is
+  /// represented by a leaf node. Thus, by iterating over the leaves and looking
+  /// at their parents, we can find every outlining candidate in the module
----------------

Why not just phrase this routine as a single tree walk visiting the internal nodes instead of doing this thing where you walk the leaf vector looking at parents?


================
Comment at: lib/CodeGen/MachineOutliner.cpp:580
   ///
-  /// \param N The suffix tree node to start pruning from.
-  /// \param Len The length of the string to be pruned.
+  /// FIXME: We only have to look at the leaves and their parents in the tree.
+  /// We should see if it's possible to save some space in the tree by taking
----------------
This FIXME doesn't make sense. Every internal node is the parent (not just ancestor) of a leaf. In fact you are relying on this so that looking at parents of the leaves considers all internal nodes.



================
Comment at: lib/CodeGen/MachineOutliner.cpp:620
+      bool IsTailCall;
+      unsigned Benefit = BenefitFn(*Leaf, StringLen, IsTailCall);
 
----------------
Should be `*Parent`, right? Leaves will always have 1 occurrence.


https://reviews.llvm.org/D31145





More information about the llvm-commits mailing list