[LLVMdev] MergeFunctions: reduce complexity to O(log(N))

Nick Lewycky nlewycky at google.com
Fri Jan 31 16:41:38 PST 2014

On 31 January 2014 16:15, Sean Silva <chisophugis at gmail.com> wrote:

> On Thu, Jan 30, 2014 at 8:09 PM, Nick Lewycky <nlewycky at google.com> wrote:
>> On 30 January 2014 01:27, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:
>>> Hello Sean and Tobias,
>>> Sean,
>>> Thank you. Could you describe Nick's ideas in few words or give me links
>>> to your discussion, so I could adapt my ideas to it.
>> Sorry I haven't had time to write it up. The essential idea is that we
>> use partition-refinement to determine the differences between a group of
>> functions instead of using pair-wise comparison.
> I can't remember exactly, but you also had one neat idea about using call
> graph information to narrow things down. Was it to use SCC size as an
> attribute in the partition refinement?

Yes, you can use the call graph as a factor in determining whether two
functions are the same, cheaply. The obvious case is that if one function
places a call and the other does not, they can't be equivalent. The reverse
post-order traversal performed by the CallGraphSCCPassManager means that
all callees are visited before callers. One of the things mergefunc needs
to consider is that two functions A and B which call two other functions (C
and D) might be different, or might be identical only-if C and D turn out
to be identical. This means we need to defer merging. With the call graph,
if we merge all callees before visiting callers, we don't need to defer
anything. If the callees are different between two functions, we've already
tried and failed to merge the callees ...

... except for the case where the caller and callee are in the same
strongly connected component, in which case we do the same
deferred-decision trick as we have in current MergeFunc.

I implemented all this, but never committed it to llvm because we didn't
have a CallGraph kicking around at the time MergeFunc ran, and computing
the call graph ultimately made it slower.

One other thing before I forget, it's important to merge mergefunc with
constantmerge. Currently we have a situation where we have constant tables
of function pointers (aka. virtual tables) and we have functions which load
these tables and grab functions out of them. It may turn out that all the
virtual functions across two template instantiations are the same, and we
want to merge the vtables and the functions. We can't do that if we don't
try to merge both constants and functions together.


>  Imagine you want to compare long sequences of numbers. You can grab a
>> pair of sequences and run down them until they differ or you reach the end,
>> and move on to another pair of sequences. With partition refinement you
>> start with the first number in all the sequences and bucket them. Say,
>> three 1's, two 2's, and an 8. You've just formed 3 buckets. For each bucket
>> (with more than 1 member), do it again with the next number in the
>> sequence. This gets you an equivalence comparison at less than the
>> complexity of pairwise comparisons. (The buckets are partitions, and you're
>> refining them as you read more data. Partition refinement is the dual of
>> union-find, which we use quite heavily in the compiler.)
>> I haven't figured out how this stacks up against Stepan's sorting with
>> <=> comparison. I can probably do that, I just haven't spent the time
>> thinking about it.
>> I also haven't thought through how to integrate it with "near miss"
>> comparison. My current leading idea is that you do the same thing, but make
>> a record of what makes two buckets similar. (Note that the wrong solution
>> is to store a per-function-pair list of similarities/differences.)
>> Nick
>> Tobias,
>>> Your patch fails on several modules in my benchmark (73 of ~1800 tests).
>>> I have sent one as attachment.
>>> See statistics files for more details, all the .ll files you could
>>> simply find in test-suite object directory (after "make TEST=nightly
>>> report").
>>> I have attached script that launches benchmark. Example of usage (you
>>> need Linux OS):
>>> bash test-scipt.sh report.txt opt-no-patch opt-with-patch
>>> Some numbers.
>>> Total number of functions: 52715
>>> Below, by default, I have excluded modules where your patch fails.
>>> Functions merged
>>> Original version: 1113
>>> Order relation patch: 1112
>>> Similar merging patch: 541
>>> Functions merged (including failed tests)
>>> Original version: 8398
>>> Order relation patch: 8395
>>> Summary files size
>>> Initial:                                 163595634 bytes.
>>> After merging with order relation patch: 163147731 bytes.
>>> After similar merging patch:             162491905 bytes.
>>> Summary files size (including failed tests)
>>> Initial:          250242266 bytes.
>>> Original version: 247168198 bytes.
>>> Order relation:   247175650 bytes.
>>> Time. I measured with "time" utility, and used "real (wall clock) time
>>> used by the process".
>>> Summary time spent
>>> With existing version:     28.05 secs.
>>> With order relation patch: 28.19 secs.
>>> Similar merging patch:     28.61 secs.
>>> Summary time spent (including failed tests)
>>> With existing version:     41.74 secs.
>>> With order relation patch: 36.35 secs.
>>> -Stepan
>>> Sean Silva wrote:
>>>> On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch
>>>> <tobias.von.koch at gmail.com <mailto:tobias.von.koch at gmail.com>> wrote:
>>>>     Hi Stepan,
>>>>     Sorry for the delay. It's great that you are working on
>>>>     MergeFunctions as well and I agree, we should definitely try to
>>>>     combine our efforts to improve MergeFunctions.
>>>>     Just to give you some context, the pass (with the similar function
>>>>     merging patch) is already being used in a production setting. From
>>>>     my point of view, it would be better if we focus on improving its
>>>>     capability of merging functions at this stage rather than on
>>>>     reduction of compilation time.
>>>>     I'll give a brief overview of how our similar function merging
>>>>     algorithm works (you can also watch the presentation from the US
>>>>     LLVM conference to hear me explain it using an animation).
>>>>     1. Hash all functions into buckets
>>>>     In each bucket, separately:
>>>>       2. Compare functions pair-wise and determine a
>>>>          similarity metric for each pair (%age of equivalent
>>>> instructions)
>>>>       3. Merge identical functions (with similarity = 100%), update call
>>>>          sites for those functions.
>>>>       4. If the updates of call sites have touched other functions,
>>>>          go back to step 2 and re-compare *only* those functions to all
>>>>          others in their buckets.
>>>>       Finally,
>>>>       5. Form groups of similar functions for merging:
>>>>          a) Pick most similar pair (A,B)
>>>>          b) Find all functions C that are also similar to B but are not
>>>>             more similar to some other function D.
>>>>          c) Merge A with B and all the C's.
>>>>          Repeat until there are no more functions to merge.
>>>>     As you can see, we need to compute a similarity measure for each
>>>>     pair of functions in a bucket. If I understand correctly, your patch
>>>>     reduces compile-time by avoiding comparisons of functions that
>>>>     cannot be identical. That's exactly what you want to do if you only
>>>>     want to merge identical functions. However, because we also need to
>>>>     determine whether functions are merely *similar*, I can't see how
>>>>     your idea could be applied in that case.
>>>> Yeah, the existing pass only tries to merge identical functions
>>>> (identical in machine code). I'm wondering if the "similar" function
>>>> merging would be better as a separate pass (I'm genuinely wondering; I
>>>> have no clue). I would wait for Nick's feedback.
>>>> -- Sean Silva
>>>>     Looking at your statistics, I see that you are considering a very
>>>>     large number of functions. I don't know which benchmarks you are
>>>>     using, but I doubt that many of these are worth merging. For
>>>>     instance, you rarely gain a code size benefit from merging functions
>>>>     with just a few instructions. Our patch takes care of this using
>>>>     thresholds. It's worth looking at actual code size reductions,
>>>>     rather than just numbers of functions merged/ compilation time spent
>>>>     comparing functions. It turns out that the number of functions in
>>>>     each bucket is really quite small once you apply some heuristics
>>>>     (and could still be further reduced!).
>>>>     Given your experience with MergeFunctions, it would be really great
>>>>     if you could review our patch and also try it out on your
>>>> benchmarks.
>>>>     Tobias
>>>>     On 24/01/2014 19:11, Stepan Dyatkovskiy wrote:
>>>>         Hi Tobias.
>>>>         So, what do you think?
>>>>         If it means to much extra-job for your team, may be I can help
>>>> you
>>>>         somehow? I really would like to.
>>>>         -Stepan
>>>>         Stepan Dyatkovskiy wrote:
>>>>             Hi Tobias,
>>>>                 I can't really see a way to combine our approach with
>>>>                 your patch. What
>>>>                 are your thoughts?
>>>>             I think it is possible. Even more - we need to combine our
>>>>             efforts, in
>>>>             order to bring this pass into real live.
>>>>             I'have read your presentation file, and unfortunately read
>>>>             your patch
>>>>             only a little.
>>>>             How exactly you scan functions on 2nd stage? Could you
>>>>             explain the
>>>>             algorithm in few words, how you compare workflow? Is it
>>>>             possible to
>>>>             scan binary tree instead of hash table? ...OK.
>>>>             That's how I see the modification. Now its important to
>>>>             explain idea,
>>>>             so consider the simplest case: you need to catch two
>>>>             functions that
>>>>             are differs with single instruction somewhere.
>>>>             1. Imagine, that IR *module* contents is represented as
>>>>             binary tree:
>>>>             Each line (trace) from root to terminal node is a function.
>>>>             Each node - is function's primitive (instruction opcode, for
>>>>             example).
>>>>             Down - is direction to terminal nodes, up - is direction to
>>>>             the root.
>>>>             2. Now you are standing on some node. And you have two
>>>>             directions
>>>>             down-left and down-right.
>>>>             3. If you are able to find two equal sub-traces down (at
>>>>             left and at
>>>>             right), then the only difference lies in this node. Then we
>>>>             catch that
>>>>             case.
>>>>             4. Even more, if two subtrees at left and at right are
>>>>             equal, than you
>>>>             catch at once all the cases that are represented by these
>>>>             subtrees.
>>>>             I still didn't look at you patch carefully. Sorry.. But I
>>>>             hope that
>>>>             helps, and I'll try look at it in nearest time and perhaps
>>>>             its not the
>>>>             best solution I gave in this post.
>>>>             -Stepan
>>>>             22.01.2014, 20:53, "Tobias von Koch"
>>>>             <tobias.von.koch at gmail.com <mailto:tobias.von.koch at gmail.
>>>> com>>:
>>>>                 Hi Stepan,
>>>>                 As you've seen we have recently implemented a
>>>>                 significant enhancement to
>>>>                 the MergeFunctions pass that also allows merging of
>>>>                 functions that are
>>>>                 only similar but not identical
>>>>                 (http://llvm-reviews.__chandlerc.com/D2591
>>>>                 <http://llvm-reviews.chandlerc.com/D2591>).
>>>>                 Our patch also changes the way that functions are
>>>>                 compared quite
>>>>                 significantly. This is necessary because we need to
>>>>                 compare all
>>>>                 functions in a bucket in order to get a similarity
>>>>                 measure for each
>>>>                 pair, so we can then decide which 'groups' of functions
>>>>                 to merge.
>>>>                 I can't really see a way to combine our approach with
>>>>                 your patch. What
>>>>                 are your thoughts?
>>>>                 Another way to improve the performance of MergeFunctions
>>>>                 might be to
>>>>                 make the hash function better. I've already added the
>>>>                 size of the first
>>>>                 BB to it, and perhaps there are other factors we could
>>>>                 try... if we
>>>>                 don't have to compare functions in the first place
>>>>                 (because they're in
>>>>                 different buckets) then that's obviously a much bigger
>>>> win.
>>>>                 Thanks,
>>>>                 Tobias
>>>>                 On 17/01/2014 20:25, Stepan Dyatkovskiy wrote:
>>>>                        Hi all,
>>>>                        I propose simple improvement for MergeFunctions
>>>>                     pass, that reduced
>>>>                     its
>>>>                        complexity from O(N^2) to O(log(N)), where N is
>>>>                     number of
>>>>                     functions in
>>>>                        module.
>>>>                        The idea, is to replace the result of comparison
>>>>                     from "bool" to
>>>>                        "-1,0,1". In another words: define order relation
>>>>                     on functions set.
>>>>                        To be sure, that functions could be comparable
>>>>                     that way, we have to
>>>>                        prove that order relation is possible on all
>>>>                     comparison stage.
>>>>                        The last one is possible, if for each comparison
>>>>                     stage we
>>>>                     implement has
>>>>                        next properties:
>>>>                        * reflexivity (a <= a, a == a, a >= a),
>>>>                        * antisymmetry (if a <= b and b <= a then a  ==
>>>> b),
>>>>                        * transitivity (a <= b and b <= c, then a <= c)
>>>>                        * asymmetry (if a < b, then a > b or a == b).
>>>>                        Once we have defined order relation we can store
>>>>                     all the functions in
>>>>                        binary tree and perform lookup in O(log(N)) time.
>>>>                        This post has two attachments:
>>>>                        1. The patch, that has implementation of this
>>>> idea.
>>>>                        2. The MergeFunctions pass detailed description,
>>>>                     with explanation how
>>>>                        order relation could be possible.
>>>>                        Hope it helps to make things better!
>>>>                        -Stepan.
>>>>                        ______________________________
>>>> ___________________
>>>>                        LLVM Developers mailing list
>>>>                     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>>                     http://llvm.cs.uiuc.edu
>>>>                     http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>>>                     <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>>>     _________________________________________________
>>>>     LLVM Developers mailing list
>>>>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>> http://llvm.cs.uiuc.edu
>>>>     http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>>>     <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140131/0025ad8f/attachment.html>

More information about the llvm-dev mailing list