[PATCH] D31145: [Outliner] Fix compile-time overhead for candidate choice

Mon Mar 20 11:29:35 PDT 2017

paquette created this revision.
Herald added subscribers: mehdi_amini, rengolin, aemerson.

The candidate collection method in the outliner can cause some dramatic code size regressions on large tests.

Currently, it works like this:

1. Build a suffix tree over the instructions in the program
2. Query the tree for good candidates
3. Prune tree
4. Go to step 2 until no candidates are found.

This hinges off the assumption that pruning the tree based off of overlapping candidates will reduce the search space enough that subsequent queries will take a negligible amount of time. Unfortunately, this assumption is shaky at best. Certain programs don't actually have many overlapping candidates. For very large tests, this is a big problem.

The new candidate collection method works like this:

1. Build a suffix tree over the instructions in the program
2. Iterate over each leaf in the tree.
3. Visit the parent of each leaf. If the parent represents a beneficial string, save a candidate for it.

There are O(I) leaves, where I is the number of instructions in the program. Since we only walk it once, this is now guaranteed to be at worst O(I). Overlapping candidates are now handled entirely by the `pruneOverlaps` method that handled anything the suffix tree pruning didn't handle. Since that method is on average O(n), this behaves far better.

**Improvements**

Compile time was measured by compiling the LLVM test suite for AArch64 using LNT. Clang with Oz and mno-red-zone was tested against clang with Oz, mno-red-zone, and -mllvm -enable-machine-outliner.

Using the old candidate collection method, the worst compile time regression was a spectacular 284.07 seconds on MultiSource/Benchmarks/7zip/7zip-benchmark.test. This was followed up by MultiSource/Benchmarks/Bullet/bullet.test with a 257.96 second regression. On average, there was a 9.69 second compile time regression and a median compile time regression of 0.85 seconds.

Using the new candidate collection method, the worst compile time regression was 0.542 seconds for MultiSource/Benchmarks/7zip/7zip-benchmark.test, followed up by 0.496 seconds for MultiSource/Benchmarks/Bullet/bullet.test. On average, there was a 0.01 second compile time regression, with a median compile time regression of 0.002s.

**Changes to code size**

This change doesn't impact the code size results. In fact, by collecting all potential candidates, we can probably make the outliner make better decisions for what to outline in the future.

https://reviews.llvm.org/D31145

Files:
  lib/CodeGen/MachineOutliner.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D31145.92358.patch
Type: text/x-patch
Size: 29784 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170320/2b7e2d76/attachment.bin>