[PATCH] D31145: [Outliner] Fix compile-time overhead for candidate choice

Fri Mar 24 17:27:06 PDT 2017

silvas added a comment.

In https://reviews.llvm.org/D31145#707827, @paquette wrote:

> In https://reviews.llvm.org/D31145#705994, @silvas wrote:
>
> > Why is it O(n) on average? It is two nested for loops with some bailouts that only affect the asymptotic runtime if candidates overlap with little-omega(1) other candidates. And it needs to be big-omega(sqrt(n)) if the runtime is to be reduced to even O(n^1.5).
>
>
> Compiling for AArch64, for the SingleSource and MultiSource tests in the test suite the average length of a candidate is 10 MachineInstrs. The maximum length of a candidate is 108, which appears in a test with 11251 MachineInstrs. So far, it appears to be linear time on average, because for programs with a lot of candidates, the max length of a candidate is usually dominated by the number of candidates. I should really prove the bound though. It's a little tricky because it's O(m * n) where m is the maximum length of a candidate.
>
> Hand waving/brain dump: The tricky case is where m = NumInstructions/k for k > 2 and n ~ O(NumInstructions). As k gets larger, m gets smaller. When m = 2, we only have to compare against one other candidate for each candidate, so we're good.
>
> Previously, since we did some overlap pruning in the tree, we knew that as k gets smaller, the possible number of candidates in the program got smaller. Now that I think about it, that's not really true anymore. I guess the point here would be to show that we do the same number of overlap checks as in the tree here. But then we aren't really talking about pruneOverlaps, so ¯\_(ツ)_/¯

Oh, I think I was getting really confused about Candidate representing a single occurrence of a string or the equivalence class for that string itself. E.g. there is a comment "an occurrence of this candidate", but "Candidate" is the "occurrence", and "OutlinedFunction" is the "candidate". Maybe call them "Occurrence" and "CandidateClass" or something?

Also, I think I was thinking that this was trying to emulate the previous serialized greedy-choice, prune/recalculate benefit, greedy-choice, prune/recalculate benefit, ... that it used to do (or maybe I was misunderstanding that too... why would pruneOverlaps have been needed in that case...).

Anyway, this seems fine, sorry for the confusion. I see that on the version you committed, you added the comment that it is ` O(MaxCandidateLen * CandidateList.size())`. It might be good to put some concrete numbers to give readers a feel. E.g. MaxCandidateLen in practice is basically a constant, and CandidateList.size() is typically some fraction of the number of instructions.

> In the meantime, I can make the outliner bail out entirely in the event that things spin out of control. From there, I could start working on maybe
> 
> - Putting stuff in an interval tree or...
> - Adding some stronger checks in the same vein as the max length check to reliably bound the search space now that we don't have the tree pruning

Now that I understand it better, I don't think this really needs any improvement. There seem to be only a couple ways of improving the code size benefit of the outliner:

1. Make more instructions viable for outlining (and you've been doing plenty of work on this, e.g. stack fixups and tail calls)
2. Make the outlining decision process get a more optimal result (this is a combinatorial optimization problem)
3. Heuristics for making outlining results across TU's converge more (I don't think there's much to do for this TBH)

I think the original hope for the suffix tree was to make 2 a lot better, but I think that on practical inputs even simple approaches like the finding parents of leaf children + pruneOverlaps already get pretty close to optimal (for example, it seems that just looking at parents of leaf children doesn't hurt the result too much).

Do you have any other plans here to exploit the suffix tree. I think that after this patch I think all we use the suffix tree for is to find repeated substrings. We might want to replace that with single sort + linear scan. I.e. initialize a vector of pointers into each instruction, then sort them by the suffix starting at that instruction.

You might want to do an experiment where you make "is instruction safe to outline" always return true and see how much more code size improvement is achieved. That will give a bound on how much you can expect to gain by improving 1., which seems useful to know.

> 
> 
>> Also, as a side note, pruneOverlaps seems to only consider the first occurrences. E.g. decreasing occurrence counts by 1. Two candidates may overlap in more than one place. This probably explains something I was seeing: if you print out the total benefit of each candidate as it is outlined, the total benefit that the outliner thinks it is getting from the outlining operations it does is more can be more than the total string length, which reflects incorrect cost modeling.
> 
> I don't quite understand what you mean here. There are multiple occurrences of one candidate that can overlap with other candidates in multiple places, yes, but the decrement is done on their respective OutlinedFunctions. Also, the benefit used after this function is stored in the OutlinedFunction. Candidate benefit is only used for the greedy choice to prevent candidates from "cancelling each other out". Is that what you're getting at? I didn't really explain this well in the code on a second glance.

I think I was really confused here. Sorry for the noise.

https://reviews.llvm.org/D31145