<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jan 30, 2014 at 8:09 PM, Nick Lewycky <span dir="ltr"><<a href="mailto:nlewycky@google.com" target="_blank">nlewycky@google.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="im">On 30 January 2014 01:27, Stepan Dyatkovskiy <span dir="ltr"><<a href="mailto:stpworld@narod.ru" target="_blank">stpworld@narod.ru</a>></span> wrote:<br>

</div><div class="gmail_extra"><div class="gmail_quote"><div class="im">


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello Sean and Tobias,<br>

<br>

Sean,<br>

Thank you. Could you describe Nick's ideas in a few words or give me links to your discussion, so I can adapt my ideas to them?<br></blockquote><div><br></div></div><div>Sorry I haven't had time to write it up. The essential idea is that we use partition-refinement to determine the differences between a group of functions instead of using pair-wise comparison.</div>

</div></div></div></blockquote><div><br></div><div>I can't remember exactly, but you also had one neat idea about using call graph information to narrow things down. Was it to use SCC size as an attribute in the partition refinement?</div>

<div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">


<div><br></div><div>Imagine you want to compare long sequences of numbers. You can grab a pair of sequences and run down them until they differ or you reach the end, and move on to another pair of sequences. With partition refinement you start with the first number in all the sequences and bucket them. Say, three 1's, two 2's, and an 8. You've just formed 3 buckets. For each bucket (with more than 1 member), do it again with the next number in the sequence. This gets you an equivalence comparison at less than the complexity of pairwise comparisons. (The buckets are partitions, and you're refining them as you read more data. Partition refinement is the dual of union-find, which we use quite heavily in the compiler.)</div>
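<div><br></div><div>The bucketing described above can be sketched as follows. This is only a minimal illustration on plain sequences of numbers, not code from MergeFunctions; the name <code>refine</code> and the use of <code>None</code> as an end-of-sequence marker are assumptions made for the example.</div>
<pre>
```python
from collections import defaultdict

def refine(sequences):
    """Group indices of identical sequences by bucketing one position at a time."""
    classes = []                                  # finished equivalence classes
    work = [(list(range(len(sequences))), 0)]     # (bucket, next position to read)
    while work:
        bucket, pos = work.pop()
        if len(bucket) == 1:
            classes.append(bucket)                # a lone member needs no more reads
            continue
        buckets = defaultdict(list)
        for i in bucket:
            seq = sequences[i]
            # None marks "sequence ended", so shorter sequences split off.
            key = seq[pos] if pos < len(seq) else None
            buckets[key].append(i)
        if set(buckets) == {None}:
            classes.append(bucket)                # all ended together: all equal
            continue
        for members in buckets.values():
            work.append((members, pos + 1))       # refine each bucket further
    return sorted(classes)
```
</pre>
<div>Each element is read at most once per sequence, and a sequence stops being read as soon as it is alone in its bucket.</div>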


<div><br></div><div>I haven't figured out how this stacks up against Stepan's sorting with <=> comparison. I can probably do that, I just haven't spent the time thinking about it.</div><div><br></div><div>


I also haven't thought through how to integrate it with "near miss" comparison. My current leading idea is that you do the same thing, but make a record of what makes two buckets similar. (Note that the wrong solution is to store a per-function-pair list of similarities/differences.)</div>


<div><br></div><div>Nick</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">

Tobias,<br>

Your patch fails on several modules in my benchmark (73 of ~1800 tests). I have sent one as an attachment.<br>

<br>

See the statistics files for more details; you can find all the .ll files in the test-suite object directory (after "make TEST=nightly report").<br>

I have attached a script that launches the benchmark. Example of usage (requires Linux):<br>

bash test-scipt.sh report.txt opt-no-patch opt-with-patch<br>

<br>

Some numbers.<br>

<br>

Total number of functions: 52715<br>

<br>

Below, by default, I have excluded modules where your patch fails.<br>

<br>

Functions merged<br>

Original version: 1113<br>

Order relation patch: 1112<br>

Similar merging patch: 541<br>

<br>

Functions merged (including failed tests)<br>

Original version: 8398<br>

Order relation patch: 8395<br>

<br>

Summary files size<br>

Initial:                                 163595634 bytes.<br>

After merging with order relation patch: 163147731 bytes.<br>

After similar merging patch:             162491905 bytes.<br>
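<div><br></div><div>For scale, the three sizes above work out to roughly 0.27% saved by the order relation patch and roughly 0.67% by the similar merging patch (failing modules excluded). A small sketch of that arithmetic; the variable and function names are illustrative and not part of the benchmark script.</div>
<pre>
```python
# Summary file sizes in bytes, copied from the figures above.
initial = 163595634
after_order_relation = 163147731
after_similar_merging = 162491905

def saved(before, after):
    """Return (bytes saved, percentage of the original size)."""
    delta = before - after
    return delta, 100.0 * delta / before

for label, size in [("order relation", after_order_relation),
                    ("similar merging", after_similar_merging)]:
    delta, pct = saved(initial, size)
    print(f"{label}: {delta} bytes saved ({pct:.2f}%)")
```
</pre>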

<br>

Summary files size (including failed tests)<br>

Initial:          250242266 bytes.<br>

Original version: 247168198 bytes.<br>

Order relation:   247175650 bytes.<br>

<br>

Time. I measured with the "time" utility, and used the "real (wall clock) time used by the process".<br>

<br>

Summary time spent<br>

With existing version:     28.05 secs.<br>

With order relation patch: 28.19 secs.<br>

Similar merging patch:     28.61 secs.<br>

<br>

Summary time spent (including failed tests)<br>

With existing version:     41.74 secs.<br>

With order relation patch: 36.35 secs.<br>

<br>

-Stepan<br>

<br>

Sean Silva wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>

<br>

<br>

<br>

On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch<br></div><div><div>

<<a href="mailto:tobias.von.koch@gmail.com" target="_blank">tobias.von.koch@gmail.com</a>> wrote:<br>

<br>

    Hi Stepan,<br>

<br>

    Sorry for the delay. It's great that you are working on<br>

    MergeFunctions as well and I agree, we should definitely try to<br>

    combine our efforts to improve MergeFunctions.<br>

<br>

    Just to give you some context, the pass (with the similar function<br>

    merging patch) is already being used in a production setting. From<br>

    my point of view, it would be better if we focus on improving its<br>

    capability of merging functions at this stage rather than on<br>

    reduction of compilation time.<br>

<br>

    I'll give a brief overview of how our similar function merging<br>

    algorithm works (you can also watch the presentation from the US<br>

    LLVM conference to hear me explain it using an animation).<br>

<br>

    1. Hash all functions into buckets<br>

<br>

    In each bucket, separately:<br>

<br>

      2. Compare functions pair-wise and determine a<br>

         similarity metric for each pair (%age of equivalent instructions)<br>

<br>

      3. Merge identical functions (with similarity = 100%), update call<br>

         sites for those functions.<br>

<br>

      4. If the updates of call sites have touched other functions,<br>

         go back to step 2 and re-compare *only* those functions to all<br>

         others in their buckets.<br>

<br>

      Finally,<br>

<br>

      5. Form groups of similar functions for merging:<br>

         a) Pick most similar pair (A,B)<br>

         b) Find all functions C that are also similar to B but are not<br>

            more similar to some other function D.<br>

         c) Merge A with B and all the C's.<br>

         Repeat until there are no more functions to merge.<br>
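<div><br></div><div>Steps 1, 2 and 5a above can be sketched roughly as follows. This is a toy model, not the code from the patch: functions are plain lists of opcode strings, and <code>bucket_key</code>, <code>similarity</code> and <code>rank_pairs</code> are hypothetical names. The real hash and similarity metric are assumed to be more elaborate.</div>
<pre>
```python
from collections import defaultdict
from itertools import combinations

def bucket_key(body):
    # Hypothetical cheap hash: the instruction count only.
    return len(body)

def similarity(a, b):
    """Percentage of positions that hold the same instruction."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / max(len(a), len(b), 1)

def rank_pairs(functions):
    """Return (similarity, name_a, name_b) for pairs sharing a bucket, best first."""
    buckets = defaultdict(list)
    for name, body in functions.items():
        buckets[bucket_key(body)].append(name)
    pairs = []
    for names in buckets.values():
        # Pair-wise comparison happens only inside a bucket.
        for a, b in combinations(sorted(names), 2):
            pairs.append((similarity(functions[a], functions[b]), a, b))
    return sorted(pairs, key=lambda p: (-p[0], p[1], p[2]))
```
</pre>
<div>The first entry of the result is the "most similar pair" of step 5a; pairs at 100% are the identical functions of step 3.</div>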

<br>

    As you can see, we need to compute a similarity measure for each<br>

    pair of functions in a bucket. If I understand correctly, your patch<br>

    reduces compile-time by avoiding comparisons of functions that<br>

    cannot be identical. That's exactly what you want to do if you only<br>

    want to merge identical functions. However, because we also need to<br>

    determine whether functions are merely *similar*, I can't see how<br>

    your idea could be applied in that case.<br>

<br>

<br>

Yeah, the existing pass only tries to merge identical functions<br>

(identical in machine code). I'm wondering if the "similar" function<br>

merging would be better as a separate pass (I'm genuinely wondering; I<br>

have no clue). I would wait for Nick's feedback.<br>

<br>

-- Sean Silva<br>

<br>

<br>

    Looking at your statistics, I see that you are considering a very<br>

    large number of functions. I don't know which benchmarks you are<br>

    using, but I doubt that many of these are worth merging. For<br>

    instance, you rarely gain a code size benefit from merging functions<br>

    with just a few instructions. Our patch takes care of this using<br>

    thresholds. It's worth looking at actual code size reductions,<br>

    rather than just numbers of functions merged/ compilation time spent<br>

    comparing functions. It turns out that the number of functions in<br>

    each bucket is really quite small once you apply some heuristics<br>

    (and could still be further reduced!).<br>

<br>

    Given your experience with MergeFunctions, it would be really great<br>

    if you could review our patch and also try it out on your benchmarks.<br>

<br>

    Tobias<br>

<br>

<br>

    On 24/01/2014 19:11, Stepan Dyatkovskiy wrote:<br>

<br>

        Hi Tobias.<br>

<br>

        So, what do you think?<br>

<br>

        If it means too much extra work for your team, maybe I can help you<br>

        somehow? I would really like to.<br>

<br>

        -Stepan<br>

<br>

        Stepan Dyatkovskiy wrote:<br>

<br>

            Hi Tobias,<br>

<br>

                I can't really see a way to combine our approach with<br>

                your patch. What<br>

                are your thoughts?<br>

<br>

<br>

            I think it is possible. Even more: we need to combine our<br>

            efforts in order to bring this pass into real life.<br>

            I have read your presentation file, but unfortunately I have read<br>

            only a little of your patch.<br>

            How exactly do you scan functions in the 2nd stage? Could you<br>

            explain the algorithm in a few words, and how you compare workflow?<br>

            Is it possible to scan a binary tree instead of a hash table? ...OK.<br>

<br>

            That's how I see the modification. Now it's important to<br>

            explain the idea, so consider the simplest case: you need to catch<br>

            two functions that differ by a single instruction somewhere.<br>

<br>

            1. Imagine that the contents of the IR *module* are represented as<br>

            a binary tree:<br>

            Each line (trace) from the root to a terminal node is a function.<br>

            Each node is a function's primitive (an instruction opcode, for<br>

            example).<br>

            Down is the direction to the terminal nodes, up is the direction to<br>

            the root.<br>

<br>

            2. Now you are standing on some node, and you have two<br>

            directions: down-left and down-right.<br>

<br>

            3. If you are able to find two equal sub-traces down (at the<br>

            left and at the right), then the only difference lies in this node.<br>

            Then we catch that case.<br>

<br>

            4. Even more, if the two subtrees at the left and at the right are<br>

            equal, then you catch at once all the cases that are represented<br>

            by these subtrees.<br>
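<div><br></div><div>One way to picture steps 1 to 4 without building the tree explicitly: for every position in a function, the part before it identifies the node you are standing on, and the part after it is the sub-trace below. Two functions with the same prefix and the same remaining sub-trace differ only in that node. The sketch below is a flat stand-in for the tree walk, not the proposed implementation; it assumes functions are equal-length lists of opcode strings, and <code>one_apart</code> is a hypothetical name.</div>
<pre>
```python
from collections import defaultdict

def one_apart(functions):
    """Return sorted name pairs whose opcode sequences differ in exactly one position."""
    groups = defaultdict(list)
    for name, body in functions.items():
        for i in range(len(body)):
            # (prefix, suffix) names the node at depth i and the sub-trace below it.
            key = (tuple(body[:i]), tuple(body[i + 1:]))
            groups[key].append((body[i], name))
    found = set()
    for members in groups.values():
        for op_a, a in members:
            for op_b, b in members:
                # Same prefix and suffix, different opcode at this node.
                if a < b and op_a != op_b:
                    found.add((a, b))
    return sorted(found)
```
</pre>
<div>Identical functions are not reported here, since they share the opcode at every node as well.</div>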

<br>

            I still haven't looked at your patch carefully, sorry. But I<br>

            hope this helps. I'll try to look at it soon, and perhaps what<br>

            I gave in this post is not the best solution.<br>

<br>

            -Stepan<br>

<br>

            22.01.2014, 20:53, "Tobias von Koch"<br></div></div>

            <<a href="mailto:tobias.von.koch@gmail.com" target="_blank">tobias.von.koch@gmail.com</a>>:<div>


<br>

<br>

                Hi Stepan,<br>

<br>

                As you've seen we have recently implemented a<br>

                significant enhancement to<br>

                the MergeFunctions pass that also allows merging of<br>

                functions that are<br>

                only similar but not identical<br></div>

                (<a href="http://llvm-reviews.chandlerc.com/D2591" target="_blank">http://llvm-reviews.chandlerc.com/D2591</a>).<div><div><br>

<br>

                Our patch also changes the way that functions are<br>

                compared quite<br>

                significantly. This is necessary because we need to<br>

                compare all<br>

                functions in a bucket in order to get a similarity<br>

                measure for each<br>

                pair, so we can then decide which 'groups' of functions<br>

                to merge.<br>

<br>

                I can't really see a way to combine our approach with<br>

                your patch. What<br>

                are your thoughts?<br>

<br>

                Another way to improve the performance of MergeFunctions<br>

                might be to<br>

                make the hash function better. I've already added the<br>

                size of the first<br>

                BB to it, and perhaps there are other factors we could<br>

                try... if we<br>

                don't have to compare functions in the first place<br>

                (because they're in<br>

                different buckets) then that's obviously a much bigger win.<br>

<br>

                Thanks,<br>

                Tobias<br>

<br>

                On 17/01/2014 20:25, Stepan Dyatkovskiy wrote:<br>

<br>

                       Hi all,<br>

<br>

                       I propose a simple improvement for the MergeFunctions<br>

                    pass that reduces its complexity from O(N^2) to<br>

                    O(N*log(N)), where N is the number of functions in the<br>

                       module (each lookup costs O(log(N))).<br>

<br>

                       The idea is to change the result of comparison<br>

                    from "bool" to "-1,0,1". In other words: define an order<br>

                       relation on the set of functions.<br>

                       To be sure that functions are comparable that way, we<br>

                    have to prove that an order relation is possible at every<br>

                       comparison stage.<br>

<br>

                       This is possible if each comparison stage we implement<br>

                    has the following properties:<br>

                       * reflexivity (a <= a, a == a, a >= a),<br>

                       * antisymmetry (if a <= b and b <= a, then a == b),<br>

                       * transitivity (if a <= b and b <= c, then a <= c),<br>

                       * asymmetry (if a < b, then neither a > b nor a == b).<br>

<br>

                       Once we have defined an order relation, we can store<br>

                    all the functions in a<br>

                       binary tree and perform lookup in O(log(N)) time.<br>
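<div><br></div><div>A minimal sketch of the idea, under stated assumptions: functions are modelled as lists of opcode strings, <code>compare</code> is a stand-in for the real multi-stage comparison, and a sorted Python list searched with <code>bisect</code> plays the role of the binary tree. The names <code>compare</code> and <code>merge_identical</code> are hypothetical and do not come from the patch.</div>
<pre>
```python
import bisect
from functools import cmp_to_key

def compare(a, b):
    """Three-way comparison of two opcode sequences: returns -1, 0 or 1."""
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return 0

KEY = cmp_to_key(compare)

def merge_identical(functions):
    """Map each function name to the first-seen function with an equal body."""
    keys, names, canonical = [], [], {}
    for name, body in functions.items():
        k = KEY(body)
        i = bisect.bisect_left(keys, k)        # O(log N) comparisons per lookup
        if i < len(keys) and compare(functions[names[i]], body) == 0:
            canonical[name] = names[i]         # equal body found: merge into it
        else:
            keys.insert(i, k)                  # list insert is O(N); a balanced
            names.insert(i, name)              # tree would avoid that cost
            canonical[name] = name
    return canonical
```
</pre>
<div>With a balanced tree in place of the list, N functions need O(N*log(N)) comparisons in total, against O(N^2) for comparing every pair.</div>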

<br>

                       This post has two attachments:<br>

                       1. The patch that implements this idea.<br>

                       2. A detailed description of the MergeFunctions pass,<br>

                    with an explanation of how the<br>

                       order relation is possible.<br>

<br>

                       Hope it helps to make things better!<br>

<br>

                       -Stepan.<br>

<br></div></div>

                       _______________________________________________<br>

                       LLVM Developers mailing list<br>

                    <a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a><br>

                    <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

                    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

<br>

<br>

<br>


<br>

<br>

</blockquote>

<br>

</div></div>

<br></blockquote></div></div></div>

</blockquote></div><br></div></div>