[LLVMdev] MergeFunctions: reduce complexity to O(log(N))

Stepan Dyatkovskiy stpworld at narod.ru
Mon Feb 10 09:40:44 PST 2014


ping
Stepan Dyatkovskiy wrote:
> Hi all,
>
> Previous patch has been split onto series of small changes.
> On each stage (after each patch) MergeFunctions pass is compilable and
> stable.
>
> Please find patches in attachment for review.
>
> To apply all patches at once, use "apply-patches.sh" script as follows:
> 0. Place "apply-patches.sh" in same directory with patches.
> 1. cd <llvm-sources-dir>
> 2. "bash <path-to-patches-dir>/apply-patches.sh"
>
> -Stepan
>
> Stepan Dyatkovskiy wrote:
>> Hi Nick,
>> About partition refinement.
>>
>> It looks like patch proposed here allows to apply your idea with
>> painless way (as series of small changes actually).
>>
>> If I got it right, you would like to introduce the next
>> partition-refinement.
>>
>> Initial partition P consists of only one functions class X0 (whole
>> module, single bucket). Suppose S is subset of X0, so S represents
>> functions with some trait that is absent in rest part of X0.
>>
>> Then refinement looks like below:
>>
>> P`=refine(P, S):
>>    X`[0] = intersection(X[0], S)
>>    X`[1] = difference(X[0], S)  // X0 without S0
>>
>> so P`={X`[0], X`[1]}
>>
>> And actually X`[0] and X`[1] are classes of functions that differs with
>> trait defined by S.
>>
>> Then (suppose, we need to introduce new trait, so it would be S`):
>>
>> P``=refine(P`, S`), so each X`` could be defined like below:
>> for each X`[i] in P`:
>>    X``[i*2] = intersection(X`[i], S`)
>>    X``[i*2+1] = difference(X`[i], S`)
>>
>> and so on, until all the traits are over.
>>
>> Finally we got the classes (set of Xs) that consists of equivalent
>> functions. So, we can merge functions inside the class bounds.
>>
>> Am I right?
>>
>> Patch, that was proposed here could be represented as such partition
>> refinement then:
>>
>> On each refinement stage, you need to introduce S somehow, the algorithm
>> that answers the question: "what sort of difference we want to check in
>> this bucket?"
>>
>> In existing implementation we check traits already, but pair-wise
>> (attributes, calling convention, number of arguments, number of basic
>> blocks, etc.). How to to use the same traits in partition refinement?
>>
>> Consider next example. Suppose we want to split some bucket X:
>>
>> 0. Take some trait from *pair-wise* comparison. For example: number of
>> arguments.
>> 1. Take some function from bucket X, let it be F1. Let F1 has N1 number
>> of arguments.
>> 2. Then S are functions with number of arguments *less* than N1.
>> 3. Do refinement by S. As result we got X`[0] and X`[1].
>> 4. If both of X`[0] and X`[1] are not empty, then repeat #1-#3 for them.
>> 5. If some of X`[0] of X`[1] is empty set, then all functions in X are
>> the same in context of trait S. We need another trait, go to #0. And
>> repeat #0-#5 for all X` buckets we have.
>>
>> Finally we got the groups of equivalent functions. And actually these
>> groups would be organized in binary tree.
>> Most of stuff is implemented in std::map. I showed it as #0-#5 just to
>> put it into context of your idea. The only problem was to introduce way
>> of how to define order-relation (or how to define S). But this problem
>> has been solved.
>> And on output we got groups of equal functions.
>>
>> Fix me if I'm wrong. But it looks like every advantage given by
>> partition refinement is present here.
>>
>> As I said in beginning, it is way of how to commit things you proposed
>> with minimum of changes. We will use old comparison routines, only thing
>> we need to change is return type (from bool to -1,0,1).
>> In this way, partition refinement could be committed as series of small
>> changes: one comparison routine per commit.
>>
>> Every change in MergeFunctions from now could be well checked by this
>> builder:
>> Now we have special builder for MergeFunctions. It runs test-suite with
>> forced MergeFunctions pass:
>> http://lab.llvm.org:8011/builders/clang-mergefunc-x86_64-freeBSD9.2
>> Currently, MergeFunctions pass is stable, builder is green and passes
>> all test-suite programs.
>>
>> -Stepan.
>>
>> Stepan Dyatkovskiy wrote:
>>> Hi all,
>>> Please find the updated patch in attachment:
>>> * Added some comments.
>>> * Fixed some typos.
>>>
>>> -Stepan
>>>
>>> Nick Lewycky wrote:
>>>> On 30 January 2014 01:27, Stepan Dyatkovskiy <stpworld at narod.ru
>>>> <mailto:stpworld at narod.ru>> wrote:
>>>>
>>>>     Hello Sean and Tobias,
>>>>
>>>>     Sean,
>>>>     Thank you. Could you describe Nick's ideas in few words or give me
>>>>     links to your discussion, so I could adapt my ideas to it.
>>>>
>>>>
>>>> Sorry I haven't had time to write it up. The essential idea is that we
>>>> use partition-refinement to determine the differences between a
>>>> group of
>>>> functions instead of using pair-wise comparison.
>>>>
>>>> Imagine you want to compare long sequences of numbers. You can grab a
>>>> pair of sequences and run down them until they differ or you reach the
>>>> end, and move on to another pair of sequences. With partition
>>>> refinement
>>>> you start with the first number in all the sequences and bucket them.
>>>> Say, three 1's, two 2's, and an 8. You've just formed 3 buckets. For
>>>> each bucket (with more than 1 member), do it again with the next number
>>>> in the sequence. This gets you an equivalence comparison at less than
>>>> the complexity of pairwise comparisons. (The buckets are partitions,
>>>> and
>>>> you're refining them as you read more data. Partition refinement is the
>>>> dual of union-find, which we use quite heavily in the compiler.)
>>>>
>>>> I haven't figured out how this stacks up against Stepan's sorting with
>>>> <=> comparison. I can probably do that, I just haven't spent the time
>>>> thinking about it.
>>>>
>>>> I also haven't thought through how to integrate it with "near miss"
>>>> comparison. My current leading idea is that you do the same thing, but
>>>> make a record of what makes two buckets similar. (Note that the wrong
>>>> solution is to store a per-function-pair list of
>>>> similarities/differences.)
>>>>
>>>> Nick
>>>>
>>>>     Tobias,
>>>>     Your patch fails on several modules in my benchmark (73 of ~1800
>>>>     tests). I have sent one as attachment.
>>>>
>>>>     See statistics files for more details, all the .ll files you could
>>>>     simply find in test-suite object directory (after "make
>>>> TEST=nightly
>>>>     report").
>>>>     I have attached script that launches benchmark. Example of usage
>>>>     (you need Linux OS):
>>>>     bash test-scipt.sh report.txt opt-no-patch opt-with-patch
>>>>
>>>>     Some numbers.
>>>>
>>>>     Total number of functions: 52715
>>>>
>>>>     Below, by default, I have excluded modules where your patch fails.
>>>>
>>>>     Functions merged
>>>>     Original version: 1113
>>>>     Order relation patch: 1112
>>>>     Similar merging patch: 541
>>>>
>>>>     Functions merged (including failed tests)
>>>>     Original version: 8398
>>>>     Order relation patch: 8395
>>>>
>>>>     Summary files size
>>>>     Initial:                                 163595634 bytes.
>>>>     After merging with order relation patch: 163147731 bytes.
>>>>     After similar merging patch:             162491905 bytes.
>>>>
>>>>     Summary files size (including failed tests)
>>>>     Initial:          250242266 bytes.
>>>>     Original version: 247168198 bytes.
>>>>     Order relation:   247175650 bytes.
>>>>
>>>>     Time. I measured with "time" utility, and used "real (wall clock)
>>>>     time used by the process".
>>>>
>>>>     Summary time spent
>>>>     With existing version:     28.05 secs.
>>>>     With order relation patch: 28.19 secs.
>>>>     Similar merging patch:     28.61 secs.
>>>>
>>>>     Summary time spent (including failed tests)
>>>>     With existing version:     41.74 secs.
>>>>     With order relation patch: 36.35 secs.
>>>>
>>>>     -Stepan
>>>>
>>>>     Sean Silva wrote:
>>>>
>>>>
>>>>
>>>>
>>>>         On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch
>>>>         <tobias.von.koch at gmail.com <mailto:tobias.von.koch at gmail.com>
>>>>         <mailto:tobias.von.koch at gmail.__com
>>>>         <mailto:tobias.von.koch at gmail.com>>> wrote:
>>>>
>>>>              Hi Stepan,
>>>>
>>>>              Sorry for the delay. It's great that you are working on
>>>>              MergeFunctions as well and I agree, we should definitely
>>>> try to
>>>>              combine our efforts to improve MergeFunctions.
>>>>
>>>>              Just to give you some context, the pass (with the similar
>>>>         function
>>>>              merging patch) is already being used in a production
>>>>         setting. From
>>>>              my point of view, it would be better if we focus on
>>>>         improving its
>>>>              capability of merging functions at this stage rather
>>>> than on
>>>>              reduction of compilation time.
>>>>
>>>>              I'll give a brief overview of how our similar function
>>>> merging
>>>>              algorithm works (you can also watch the presentation from
>>>>         the US
>>>>              LLVM conference to hear me explain it using an animation).
>>>>
>>>>              1. Hash all functions into buckets
>>>>
>>>>              In each bucket, separately:
>>>>
>>>>                2. Compare functions pair-wise and determine a
>>>>                   similarity metric for each pair (%age of equivalent
>>>>         instructions)
>>>>
>>>>                3. Merge identical functions (with similarity = 100%),
>>>>         update call
>>>>                   sites for those functions.
>>>>
>>>>                4. If the updates of call sites have touched other
>>>> functions,
>>>>                   go back to step 2 and re-compare *only* those
>>>>         functions to all
>>>>                   others in their buckets.
>>>>
>>>>                Finally,
>>>>
>>>>                5. Form groups of similar functions for merging:
>>>>                   a) Pick most similar pair (A,B)
>>>>                   b) Find all functions C that are also similar to B
>>>> but
>>>>         are not
>>>>                      more similar to some other function D.
>>>>                   c) Merge A with B and all the C's.
>>>>                   Repeat until there are no more functions to merge.
>>>>
>>>>              As you can see, we need to compute a similarity measure
>>>> for
>>>>         each
>>>>              pair of functions in a bucket. If I understand correctly,
>>>>         your patch
>>>>              reduces compile-time by avoiding comparisons of functions
>>>> that
>>>>              cannot be identical. That's exactly what you want to do if
>>>>         you only
>>>>              want to merge identical functions. However, because we
>>>> also
>>>>         need to
>>>>              determine whether functions are merely *similar*, I can't
>>>>         see how
>>>>              your idea could be applied in that case.
>>>>
>>>>
>>>>         Yeah, the existing pass only tries to merge identical functions
>>>>         (identical in machine code). I'm wondering if the "similar"
>>>> function
>>>>         merging would be better as a separate pass (I'm genuinely
>>>>         wondering; I
>>>>         have no clue). I would wait for Nick's feedback.
>>>>
>>>>         -- Sean Silva
>>>>
>>>>
>>>>              Looking at your statistics, I see that you are considering
>>>>         a very
>>>>              large number of functions. I don't know which benchmarks
>>>>         you are
>>>>              using, but I doubt that many of these are worth merging.
>>>> For
>>>>              instance, you rarely gain a code size benefit from merging
>>>>         functions
>>>>              with just a few instructions. Our patch takes care of this
>>>>         using
>>>>              thresholds. It's worth looking at actual code size
>>>> reductions,
>>>>              rather than just numbers of functions merged/ compilation
>>>>         time spent
>>>>              comparing functions. It turns out that the number of
>>>>         functions in
>>>>              each bucket is really quite small once you apply some
>>>>         heuristics
>>>>              (and could still be further reduced!).
>>>>
>>>>              Given your experience with MergeFunctions, it would be
>>>>         really great
>>>>              if you could review our patch and also try it out on your
>>>>         benchmarks.
>>>>
>>>>              Tobias
>>>>
>>>>
>>>>              On 24/01/2014 19:11, Stepan Dyatkovskiy wrote:
>>>>
>>>>                  Hi Tobias.
>>>>
>>>>                  So, what do you think?
>>>>
>>>>                  If it means to much extra-job for your team, may be I
>>>>         can help you
>>>>                  somehow? I really would like to.
>>>>
>>>>                  -Stepan
>>>>
>>>>                  Stepan Dyatkovskiy wrote:
>>>>
>>>>                      Hi Tobias,
>>>>
>>>>                          I can't really see a way to combine our
>>>>         approach with
>>>>                          your patch. What
>>>>                          are your thoughts?
>>>>
>>>>
>>>>                      I think it is possible. Even more - we need to
>>>>         combine our
>>>>                      efforts, in
>>>>                      order to bring this pass into real live.
>>>>                      I'have read your presentation file, and
>>>>         unfortunately read
>>>>                      your patch
>>>>                      only a little.
>>>>                      How exactly you scan functions on 2nd stage?
>>>> Could you
>>>>                      explain the
>>>>                      algorithm in few words, how you compare workflow?
>>>> Is it
>>>>                      possible to
>>>>                      scan binary tree instead of hash table? ...OK.
>>>>
>>>>                      That's how I see the modification. Now its
>>>> important to
>>>>                      explain idea,
>>>>                      so consider the simplest case: you need to catch
>>>> two
>>>>                      functions that
>>>>                      are differs with single instruction somewhere.
>>>>
>>>>                      1. Imagine, that IR *module* contents is
>>>> represented as
>>>>                      binary tree:
>>>>                      Each line (trace) from root to terminal node is a
>>>>         function.
>>>>                      Each node - is function's primitive (instruction
>>>>         opcode, for
>>>>                      example).
>>>>                      Down - is direction to terminal nodes, up - is
>>>>         direction to
>>>>                      the root.
>>>>
>>>>                      2. Now you are standing on some node. And you
>>>> have two
>>>>                      directions
>>>>                      down-left and down-right.
>>>>
>>>>                      3. If you are able to find two equal sub-traces
>>>>         down (at
>>>>                      left and at
>>>>                      right), then the only difference lies in this
>>>> node.
>>>>         Then we
>>>>                      catch that
>>>>                      case.
>>>>
>>>>                      4. Even more, if two subtrees at left and at
>>>> right are
>>>>                      equal, than you
>>>>                      catch at once all the cases that are
>>>> represented by
>>>>         these
>>>>                      subtrees.
>>>>
>>>>                      I still didn't look at you patch carefully.
>>>> Sorry..
>>>>         But I
>>>>                      hope that
>>>>                      helps, and I'll try look at it in nearest time and
>>>>         perhaps
>>>>                      its not the
>>>>                      best solution I gave in this post.
>>>>
>>>>                      -Stepan
>>>>
>>>>                      22.01.2014, 20:53, "Tobias von Koch"
>>>>                      <tobias.von.koch at gmail.com
>>>>         <mailto:tobias.von.koch at gmail.com>
>>>>         <mailto:tobias.von.koch at gmail.__com
>>>>         <mailto:tobias.von.koch at gmail.com>>>:
>>>>
>>>>
>>>>                          Hi Stepan,
>>>>
>>>>                          As you've seen we have recently implemented a
>>>>                          significant enhancement to
>>>>                          the MergeFunctions pass that also allows
>>>> merging of
>>>>                          functions that are
>>>>                          only similar but not identical
>>>>                          (http://llvm-reviews.__chandle__rc.com/D2591
>>>>         <http://chandlerc.com/D2591>
>>>>                          <http://llvm-reviews.__chandlerc.com/D2591
>>>>         <http://llvm-reviews.chandlerc.com/D2591>>).
>>>>
>>>>
>>>>                          Our patch also changes the way that functions
>>>> are
>>>>                          compared quite
>>>>                          significantly. This is necessary because we
>>>> need to
>>>>                          compare all
>>>>                          functions in a bucket in order to get a
>>>> similarity
>>>>                          measure for each
>>>>                          pair, so we can then decide which 'groups' of
>>>>         functions
>>>>                          to merge.
>>>>
>>>>                          I can't really see a way to combine our
>>>>         approach with
>>>>                          your patch. What
>>>>                          are your thoughts?
>>>>
>>>>                          Another way to improve the performance of
>>>>         MergeFunctions
>>>>                          might be to
>>>>                          make the hash function better. I've already
>>>>         added the
>>>>                          size of the first
>>>>                          BB to it, and perhaps there are other factors
>>>>         we could
>>>>                          try... if we
>>>>                          don't have to compare functions in the first
>>>> place
>>>>                          (because they're in
>>>>                          different buckets) then that's obviously a
>>>> much
>>>>         bigger win.
>>>>
>>>>                          Thanks,
>>>>                          Tobias
>>>>
>>>>                          On 17/01/2014 20:25, Stepan Dyatkovskiy wrote:
>>>>
>>>>                                 Hi all,
>>>>
>>>>                                 I propose simple improvement for
>>>>         MergeFunctions
>>>>                              pass, that reduced
>>>>                              its
>>>>                                 complexity from O(N^2) to O(log(N)),
>>>>         where N is
>>>>                              number of
>>>>                              functions in
>>>>                                 module.
>>>>
>>>>                                 The idea, is to replace the result of
>>>>         comparison
>>>>                              from "bool" to
>>>>                                 "-1,0,1". In another words: define
>>>> order
>>>>         relation
>>>>                              on functions set.
>>>>                                 To be sure, that functions could be
>>>>         comparable
>>>>                              that way, we have to
>>>>                                 prove that order relation is possible
>>>> on all
>>>>                              comparison stage.
>>>>
>>>>                                 The last one is possible, if for each
>>>>         comparison
>>>>                              stage we
>>>>                              implement has
>>>>                                 next properties:
>>>>                                 * reflexivity (a <= a, a == a, a >= a),
>>>>                                 * antisymmetry (if a <= b and b <= a
>>>>         then a  == b),
>>>>                                 * transitivity (a <= b and b <= c, then
>>>>         a <= c)
>>>>                                 * asymmetry (if a < b, then a > b or a
>>>>         == b).
>>>>
>>>>                                 Once we have defined order relation we
>>>>         can store
>>>>                              all the functions in
>>>>                                 binary tree and perform lookup in
>>>>         O(log(N)) time.
>>>>
>>>>                                 This post has two attachments:
>>>>                                 1. The patch, that has
>>>> implementation of
>>>>         this idea.
>>>>                                 2. The MergeFunctions pass detailed
>>>>         description,
>>>>                              with explanation how
>>>>                                 order relation could be possible.
>>>>
>>>>                                 Hope it helps to make things better!
>>>>
>>>>                                 -Stepan.
>>>>
>>>>
>>>>           ___________________________________________________
>>>>                                 LLVM Developers mailing list
>>>>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>>         <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>>>         http://llvm.cs.uiuc.edu
>>>>         http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>
>>>>
>>>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>>>         <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>
>>>>
>>>>
>>>>
>>>>              ___________________________________________________
>>>>              LLVM Developers mailing list
>>>>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>>         <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>>>         http://llvm.cs.uiuc.edu
>>>>         http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>
>>>>              <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>>>         <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>
>>>>
>>>>
>>>>
>>>>
>>>>     _______________________________________________
>>>>     LLVM Developers mailing list
>>>>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>> http://llvm.cs.uiuc.edu
>>>>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>




More information about the llvm-dev mailing list