[LLVMdev] MergeFunctions: reduce complexity to O(log(N))

Mon Feb 17 06:34:34 PST 2014

ping
Stepan Dyatkovskiy wrote:
> ping
> Stepan Dyatkovskiy wrote:
>> Hi all,
>>
>> Previous patch has been split onto series of small changes.
>> On each stage (after each patch) MergeFunctions pass is compilable and
>> stable.
>>
>> Please find patches in attachment for review.
>>
>> To apply all patches at once, use "apply-patches.sh" script as follows:
>> 0. Place "apply-patches.sh" in same directory with patches.
>> 1. cd <llvm-sources-dir>
>> 2. "bash <path-to-patches-dir>/apply-patches.sh"
>>
>> -Stepan
>>
>> Stepan Dyatkovskiy wrote:
>>> Hi Nick,
>>> About partition refinement.
>>>
>>> It looks like patch proposed here allows to apply your idea with
>>> painless way (as series of small changes actually).
>>>
>>> If I got it right, you would like to introduce the next
>>> partition-refinement.
>>>
>>> Initial partition P consists of only one functions class X0 (whole
>>> module, single bucket). Suppose S is subset of X0, so S represents
>>> functions with some trait that is absent in rest part of X0.
>>>
>>> Then refinement looks like below:
>>>
>>> P`=refine(P, S):
>>>    X`[0] = intersection(X[0], S)
>>>    X`[1] = difference(X[0], S)  // X0 without S0
>>>
>>> so P`={X`[0], X`[1]}
>>>
>>> And actually X`[0] and X`[1] are classes of functions that differs with
>>> trait defined by S.
>>>
>>> Then (suppose, we need to introduce new trait, so it would be S`):
>>>
>>> P``=refine(P`, S`), so each X`` could be defined like below:
>>> for each X`[i] in P`:
>>>    X``[i*2] = intersection(X`[i], S`)
>>>    X``[i*2+1] = difference(X`[i], S`)
>>>
>>> and so on, until all the traits are over.
>>>
>>> Finally we got the classes (set of Xs) that consists of equivalent
>>> functions. So, we can merge functions inside the class bounds.
>>>
>>> Am I right?
>>>
>>> Patch, that was proposed here could be represented as such partition
>>> refinement then:
>>>
>>> On each refinement stage, you need to introduce S somehow, the algorithm
>>> that answers the question: "what sort of difference we want to check in
>>> this bucket?"
>>>
>>> In existing implementation we check traits already, but pair-wise
>>> (attributes, calling convention, number of arguments, number of basic
>>> blocks, etc.). How to to use the same traits in partition refinement?
>>>
>>> Consider next example. Suppose we want to split some bucket X:
>>>
>>> 0. Take some trait from *pair-wise* comparison. For example: number of
>>> arguments.
>>> 1. Take some function from bucket X, let it be F1. Let F1 has N1 number
>>> of arguments.
>>> 2. Then S are functions with number of arguments *less* than N1.
>>> 3. Do refinement by S. As result we got X`[0] and X`[1].
>>> 4. If both of X`[0] and X`[1] are not empty, then repeat #1-#3 for them.
>>> 5. If some of X`[0] of X`[1] is empty set, then all functions in X are
>>> the same in context of trait S. We need another trait, go to #0. And
>>> repeat #0-#5 for all X` buckets we have.
>>>
>>> Finally we got the groups of equivalent functions. And actually these
>>> groups would be organized in binary tree.
>>> Most of stuff is implemented in std::map. I showed it as #0-#5 just to
>>> put it into context of your idea. The only problem was to introduce way
>>> of how to define order-relation (or how to define S). But this problem
>>> has been solved.
>>> And on output we got groups of equal functions.
>>>
>>> Fix me if I'm wrong. But it looks like every advantage given by
>>> partition refinement is present here.
>>>
>>> As I said in beginning, it is way of how to commit things you proposed
>>> with minimum of changes. We will use old comparison routines, only thing
>>> we need to change is return type (from bool to -1,0,1).
>>> In this way, partition refinement could be committed as series of small
>>> changes: one comparison routine per commit.
>>>
>>> Every change in MergeFunctions from now could be well checked by this
>>> builder:
>>> Now we have special builder for MergeFunctions. It runs test-suite with
>>> forced MergeFunctions pass:
>>> http://lab.llvm.org:8011/builders/clang-mergefunc-x86_64-freeBSD9.2
>>> Currently, MergeFunctions pass is stable, builder is green and passes
>>> all test-suite programs.
>>>
>>> -Stepan.
>>>
>>> Stepan Dyatkovskiy wrote:
>>>> Hi all,
>>>> Please find the updated patch in attachment:
>>>> * Added some comments.
>>>> * Fixed some typos.
>>>>
>>>> -Stepan
>>>>
>>>> Nick Lewycky wrote:
>>>>> On 30 January 2014 01:27, Stepan Dyatkovskiy <stpworld at narod.ru
>>>>> <mailto:stpworld at narod.ru>> wrote:
>>>>>
>>>>>     Hello Sean and Tobias,
>>>>>
>>>>>     Sean,
>>>>>     Thank you. Could you describe Nick's ideas in few words or give me
>>>>>     links to your discussion, so I could adapt my ideas to it.
>>>>>
>>>>>
>>>>> Sorry I haven't had time to write it up. The essential idea is that we
>>>>> use partition-refinement to determine the differences between a
>>>>> group of
>>>>> functions instead of using pair-wise comparison.
>>>>>
>>>>> Imagine you want to compare long sequences of numbers. You can grab a
>>>>> pair of sequences and run down them until they differ or you reach the
>>>>> end, and move on to another pair of sequences. With partition
>>>>> refinement
>>>>> you start with the first number in all the sequences and bucket them.
>>>>> Say, three 1's, two 2's, and an 8. You've just formed 3 buckets. For
>>>>> each bucket (with more than 1 member), do it again with the next
>>>>> number
>>>>> in the sequence. This gets you an equivalence comparison at less than
>>>>> the complexity of pairwise comparisons. (The buckets are partitions,
>>>>> and
>>>>> you're refining them as you read more data. Partition refinement is
>>>>> the
>>>>> dual of union-find, which we use quite heavily in the compiler.)
>>>>>
>>>>> I haven't figured out how this stacks up against Stepan's sorting with
>>>>> <=> comparison. I can probably do that, I just haven't spent the time
>>>>> thinking about it.
>>>>>
>>>>> I also haven't thought through how to integrate it with "near miss"
>>>>> comparison. My current leading idea is that you do the same thing, but
>>>>> make a record of what makes two buckets similar. (Note that the wrong
>>>>> solution is to store a per-function-pair list of
>>>>> similarities/differences.)
>>>>>
>>>>> Nick
>>>>>
>>>>>     Tobias,
>>>>>     Your patch fails on several modules in my benchmark (73 of ~1800
>>>>>     tests). I have sent one as attachment.
>>>>>
>>>>>     See statistics files for more details, all the .ll files you could
>>>>>     simply find in test-suite object directory (after "make
>>>>> TEST=nightly
>>>>>     report").
>>>>>     I have attached script that launches benchmark. Example of usage
>>>>>     (you need Linux OS):
>>>>>     bash test-scipt.sh report.txt opt-no-patch opt-with-patch
>>>>>
>>>>>     Some numbers.
>>>>>
>>>>>     Total number of functions: 52715
>>>>>
>>>>>     Below, by default, I have excluded modules where your patch fails.
>>>>>
>>>>>     Functions merged
>>>>>     Original version: 1113
>>>>>     Order relation patch: 1112
>>>>>     Similar merging patch: 541
>>>>>
>>>>>     Functions merged (including failed tests)
>>>>>     Original version: 8398
>>>>>     Order relation patch: 8395
>>>>>
>>>>>     Summary files size
>>>>>     Initial:                                 163595634 bytes.
>>>>>     After merging with order relation patch: 163147731 bytes.
>>>>>     After similar merging patch:             162491905 bytes.
>>>>>
>>>>>     Summary files size (including failed tests)
>>>>>     Initial:          250242266 bytes.
>>>>>     Original version: 247168198 bytes.
>>>>>     Order relation:   247175650 bytes.
>>>>>
>>>>>     Time. I measured with "time" utility, and used "real (wall clock)
>>>>>     time used by the process".
>>>>>
>>>>>     Summary time spent
>>>>>     With existing version:     28.05 secs.
>>>>>     With order relation patch: 28.19 secs.
>>>>>     Similar merging patch:     28.61 secs.
>>>>>
>>>>>     Summary time spent (including failed tests)
>>>>>     With existing version:     41.74 secs.
>>>>>     With order relation patch: 36.35 secs.
>>>>>
>>>>>     -Stepan
>>>>>
>>>>>     Sean Silva wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>         On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch
>>>>>         <tobias.von.koch at gmail.com <mailto:tobias.von.koch at gmail.com>
>>>>>         <mailto:tobias.von.koch at gmail.__com
>>>>>         <mailto:tobias.von.koch at gmail.com>>> wrote:
>>>>>
>>>>>              Hi Stepan,
>>>>>
>>>>>              Sorry for the delay. It's great that you are working on
>>>>>              MergeFunctions as well and I agree, we should definitely
>>>>> try to
>>>>>              combine our efforts to improve MergeFunctions.
>>>>>
>>>>>              Just to give you some context, the pass (with the similar
>>>>>         function
>>>>>              merging patch) is already being used in a production
>>>>>         setting. From
>>>>>              my point of view, it would be better if we focus on
>>>>>         improving its
>>>>>              capability of merging functions at this stage rather
>>>>> than on
>>>>>              reduction of compilation time.
>>>>>
>>>>>              I'll give a brief overview of how our similar function
>>>>> merging
>>>>>              algorithm works (you can also watch the presentation from
>>>>>         the US
>>>>>              LLVM conference to hear me explain it using an
>>>>> animation).
>>>>>
>>>>>              1. Hash all functions into buckets
>>>>>
>>>>>              In each bucket, separately:
>>>>>
>>>>>                2. Compare functions pair-wise and determine a
>>>>>                   similarity metric for each pair (%age of equivalent
>>>>>         instructions)
>>>>>
>>>>>                3. Merge identical functions (with similarity = 100%),
>>>>>         update call
>>>>>                   sites for those functions.
>>>>>
>>>>>                4. If the updates of call sites have touched other
>>>>> functions,
>>>>>                   go back to step 2 and re-compare *only* those
>>>>>         functions to all
>>>>>                   others in their buckets.
>>>>>
>>>>>                Finally,
>>>>>
>>>>>                5. Form groups of similar functions for merging:
>>>>>                   a) Pick most similar pair (A,B)
>>>>>                   b) Find all functions C that are also similar to B
>>>>> but
>>>>>         are not
>>>>>                      more similar to some other function D.
>>>>>                   c) Merge A with B and all the C's.
>>>>>                   Repeat until there are no more functions to merge.
>>>>>
>>>>>              As you can see, we need to compute a similarity measure
>>>>> for
>>>>>         each
>>>>>              pair of functions in a bucket. If I understand correctly,
>>>>>         your patch
>>>>>              reduces compile-time by avoiding comparisons of functions
>>>>> that
>>>>>              cannot be identical. That's exactly what you want to
>>>>> do if
>>>>>         you only
>>>>>              want to merge identical functions. However, because we
>>>>> also
>>>>>         need to
>>>>>              determine whether functions are merely *similar*, I can't
>>>>>         see how
>>>>>              your idea could be applied in that case.
>>>>>
>>>>>
>>>>>         Yeah, the existing pass only tries to merge identical
>>>>> functions
>>>>>         (identical in machine code). I'm wondering if the "similar"
>>>>> function
>>>>>         merging would be better as a separate pass (I'm genuinely
>>>>>         wondering; I
>>>>>         have no clue). I would wait for Nick's feedback.
>>>>>
>>>>>         -- Sean Silva
>>>>>
>>>>>
>>>>>              Looking at your statistics, I see that you are
>>>>> considering
>>>>>         a very
>>>>>              large number of functions. I don't know which benchmarks
>>>>>         you are
>>>>>              using, but I doubt that many of these are worth merging.
>>>>> For
>>>>>              instance, you rarely gain a code size benefit from
>>>>> merging
>>>>>         functions
>>>>>              with just a few instructions. Our patch takes care of
>>>>> this
>>>>>         using
>>>>>              thresholds. It's worth looking at actual code size
>>>>> reductions,
>>>>>              rather than just numbers of functions merged/ compilation
>>>>>         time spent
>>>>>              comparing functions. It turns out that the number of
>>>>>         functions in
>>>>>              each bucket is really quite small once you apply some
>>>>>         heuristics
>>>>>              (and could still be further reduced!).
>>>>>
>>>>>              Given your experience with MergeFunctions, it would be
>>>>>         really great
>>>>>              if you could review our patch and also try it out on your
>>>>>         benchmarks.
>>>>>
>>>>>              Tobias
>>>>>
>>>>>
>>>>>              On 24/01/2014 19:11, Stepan Dyatkovskiy wrote:
>>>>>
>>>>>                  Hi Tobias.
>>>>>
>>>>>                  So, what do you think?
>>>>>
>>>>>                  If it means to much extra-job for your team, may be I
>>>>>         can help you
>>>>>                  somehow? I really would like to.
>>>>>
>>>>>                  -Stepan
>>>>>
>>>>>                  Stepan Dyatkovskiy wrote:
>>>>>
>>>>>                      Hi Tobias,
>>>>>
>>>>>                          I can't really see a way to combine our
>>>>>         approach with
>>>>>                          your patch. What
>>>>>                          are your thoughts?
>>>>>
>>>>>
>>>>>                      I think it is possible. Even more - we need to
>>>>>         combine our
>>>>>                      efforts, in
>>>>>                      order to bring this pass into real live.
>>>>>                      I'have read your presentation file, and
>>>>>         unfortunately read
>>>>>                      your patch
>>>>>                      only a little.
>>>>>                      How exactly you scan functions on 2nd stage?
>>>>> Could you
>>>>>                      explain the
>>>>>                      algorithm in few words, how you compare workflow?
>>>>> Is it
>>>>>                      possible to
>>>>>                      scan binary tree instead of hash table? ...OK.
>>>>>
>>>>>                      That's how I see the modification. Now its
>>>>> important to
>>>>>                      explain idea,
>>>>>                      so consider the simplest case: you need to catch
>>>>> two
>>>>>                      functions that
>>>>>                      are differs with single instruction somewhere.
>>>>>
>>>>>                      1. Imagine, that IR *module* contents is
>>>>> represented as
>>>>>                      binary tree:
>>>>>                      Each line (trace) from root to terminal node is a
>>>>>         function.
>>>>>                      Each node - is function's primitive (instruction
>>>>>         opcode, for
>>>>>                      example).
>>>>>                      Down - is direction to terminal nodes, up - is
>>>>>         direction to
>>>>>                      the root.
>>>>>
>>>>>                      2. Now you are standing on some node. And you
>>>>> have two
>>>>>                      directions
>>>>>                      down-left and down-right.
>>>>>
>>>>>                      3. If you are able to find two equal sub-traces
>>>>>         down (at
>>>>>                      left and at
>>>>>                      right), then the only difference lies in this
>>>>> node.
>>>>>         Then we
>>>>>                      catch that
>>>>>                      case.
>>>>>
>>>>>                      4. Even more, if two subtrees at left and at
>>>>> right are
>>>>>                      equal, than you
>>>>>                      catch at once all the cases that are
>>>>> represented by
>>>>>         these
>>>>>                      subtrees.
>>>>>
>>>>>                      I still didn't look at you patch carefully.
>>>>> Sorry..
>>>>>         But I
>>>>>                      hope that
>>>>>                      helps, and I'll try look at it in nearest time
>>>>> and
>>>>>         perhaps
>>>>>                      its not the
>>>>>                      best solution I gave in this post.
>>>>>
>>>>>                      -Stepan
>>>>>
>>>>>                      22.01.2014, 20:53, "Tobias von Koch"
>>>>>                      <tobias.von.koch at gmail.com
>>>>>         <mailto:tobias.von.koch at gmail.com>
>>>>>         <mailto:tobias.von.koch at gmail.__com
>>>>>         <mailto:tobias.von.koch at gmail.com>>>:
>>>>>
>>>>>
>>>>>                          Hi Stepan,
>>>>>
>>>>>                          As you've seen we have recently implemented a
>>>>>                          significant enhancement to
>>>>>                          the MergeFunctions pass that also allows
>>>>> merging of
>>>>>                          functions that are
>>>>>                          only similar but not identical
>>>>>                          (http://llvm-reviews.__chandle__rc.com/D2591
>>>>>         <http://chandlerc.com/D2591>
>>>>>                          <http://llvm-reviews.__chandlerc.com/D2591
>>>>>         <http://llvm-reviews.chandlerc.com/D2591>>).
>>>>>
>>>>>
>>>>>                          Our patch also changes the way that functions
>>>>> are
>>>>>                          compared quite
>>>>>                          significantly. This is necessary because we
>>>>> need to
>>>>>                          compare all
>>>>>                          functions in a bucket in order to get a
>>>>> similarity
>>>>>                          measure for each
>>>>>                          pair, so we can then decide which 'groups' of
>>>>>         functions
>>>>>                          to merge.
>>>>>
>>>>>                          I can't really see a way to combine our
>>>>>         approach with
>>>>>                          your patch. What
>>>>>                          are your thoughts?
>>>>>
>>>>>                          Another way to improve the performance of
>>>>>         MergeFunctions
>>>>>                          might be to
>>>>>                          make the hash function better. I've already
>>>>>         added the
>>>>>                          size of the first
>>>>>                          BB to it, and perhaps there are other factors
>>>>>         we could
>>>>>                          try... if we
>>>>>                          don't have to compare functions in the first
>>>>> place
>>>>>                          (because they're in
>>>>>                          different buckets) then that's obviously a
>>>>> much
>>>>>         bigger win.
>>>>>
>>>>>                          Thanks,
>>>>>                          Tobias
>>>>>
>>>>>                          On 17/01/2014 20:25, Stepan Dyatkovskiy
>>>>> wrote:
>>>>>
>>>>>                                 Hi all,
>>>>>
>>>>>                                 I propose simple improvement for
>>>>>         MergeFunctions
>>>>>                              pass, that reduced
>>>>>                              its
>>>>>                                 complexity from O(N^2) to O(log(N)),
>>>>>         where N is
>>>>>                              number of
>>>>>                              functions in
>>>>>                                 module.
>>>>>
>>>>>                                 The idea, is to replace the result of
>>>>>         comparison
>>>>>                              from "bool" to
>>>>>                                 "-1,0,1". In another words: define
>>>>> order
>>>>>         relation
>>>>>                              on functions set.
>>>>>                                 To be sure, that functions could be
>>>>>         comparable
>>>>>                              that way, we have to
>>>>>                                 prove that order relation is possible
>>>>> on all
>>>>>                              comparison stage.
>>>>>
>>>>>                                 The last one is possible, if for each
>>>>>         comparison
>>>>>                              stage we
>>>>>                              implement has
>>>>>                                 next properties:
>>>>>                                 * reflexivity (a <= a, a == a, a >=
>>>>> a),
>>>>>                                 * antisymmetry (if a <= b and b <= a
>>>>>         then a  == b),
>>>>>                                 * transitivity (a <= b and b <= c,
>>>>> then
>>>>>         a <= c)
>>>>>                                 * asymmetry (if a < b, then a > b or a
>>>>>         == b).
>>>>>
>>>>>                                 Once we have defined order relation we
>>>>>         can store
>>>>>                              all the functions in
>>>>>                                 binary tree and perform lookup in
>>>>>         O(log(N)) time.
>>>>>
>>>>>                                 This post has two attachments:
>>>>>                                 1. The patch, that has
>>>>> implementation of
>>>>>         this idea.
>>>>>                                 2. The MergeFunctions pass detailed
>>>>>         description,
>>>>>                              with explanation how
>>>>>                                 order relation could be possible.
>>>>>
>>>>>                                 Hope it helps to make things better!
>>>>>
>>>>>                                 -Stepan.
>>>>>
>>>>>
>>>>>           ___________________________________________________
>>>>>                                 LLVM Developers mailing list
>>>>>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>>>         <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>>>>         http://llvm.cs.uiuc.edu
>>>>>         http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>>>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>
>>>>>
>>>>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>>>>         <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>
>>>>>
>>>>>
>>>>>
>>>>>              ___________________________________________________
>>>>>              LLVM Developers mailing list
>>>>>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>>>         <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>>>>         http://llvm.cs.uiuc.edu
>>>>>         http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>>>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>
>>>>>              <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>>>>         <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     _______________________________________________
>>>>>     LLVM Developers mailing list
>>>>>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>>>> http://llvm.cs.uiuc.edu
>>>>>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>