[LLVMdev] MergeFunctions: reduce complexity to O(log(N))

Fri Jan 31 06:32:30 PST 2014

Hi Nick,
About your partition refinement.

It looks like patch proposed here allows to apply your idea with 
painless way (as series of small changes actually).

If I got it right, you would like to introduce the next 
partition-refinement.

Initial partition P consists of only one functions class X0 (whole 
module, single bucket). Suppose S is subset of X0, so S represents 
functions with some trait that is absent in rest part of X0.

Then refinement looks like below:

P`=refine(P, S):
   X`[0] = intersection(X[0], S)
   X`[1] = difference(X[0], S)  // X0 without S0

so P`={X`[0], X`[1]}

And actually X`[0] and X`[1] are classes of functions that differs with 
trait defined by S.

Then (suppose, we need to introduce new trait, so it would be S`):

P``=refine(P`, S`), so each X`` could be defined like below:
for each X`[i] in P`:
   X``[i*2] = intersection(X`[i], S`)
   X``[i*2+1] = difference(X`[i], S`)

and so on, until all the traits are over.

Finally we got the classes (set of Xs) that consists of equivalent 
functions. So, we can merge functions inside the class bounds.

Am I right?

Patch, that was proposed here could be represented as such partition 
refinement then:

On each refinement stage, you need to introduce S somehow, the algorithm 
that answers the question: "what sort of difference we want to check in 
this bucket?"

In existing implementation we check traits already, but pair-wise 
(attributes, calling convention, number of arguments, number of basic 
blocks, etc.). How to to use the same traits in partition refinement?

Consider next example. Suppose we want to split some bucket X:

0. Take some trait from *pair-wise* comparison. For example: number of 
arguments.
1. Take some function from bucket X, let it be F1. Let F1 has N1 number 
of arguments.
2. Then S are functions with number of arguments *less* than N1.
3. Do refinement by S. As result we got X`[0] and X`[1].
4. If both of X`[0] and X`[1] are not empty, then repeat #1-#3 for them.
5. If some of X`[0] of X`[1] is empty set, then all functions in X are 
the same in context of trait S. We need another trait, go to #0. And 
repeat #0-#5 for all X` buckets we have.

Finally we got the groups of equivalent functions. And actually these 
groups would be organized in binary tree.
Most of stuff is implemented in std::map. I showed it as #0-#5 just to 
put it into context of your idea. The only problem was to introduce way 
of how to define order-relation (or how to define S). But this problem 
has been solved.
And on output we got groups of equal functions.

Fix me if I'm wrong. But it looks like every advantage given by 
partition refinement is present here.

As I said in beginning, it is way of how to commit things you proposed 
with minimum of changes. We will use old comparison routines, only thing 
we need to change is return type (from bool to -1,0,1).
In this way, partition refinement could be committed as series of small 
changes: one comparison routine per commit.

Every change in MergeFunctions from now could be well checked by this 
builder:
Now we have special builder for MergeFunctions. It runs test-suite with 
forced MergeFunctions pass:
http://lab.llvm.org:8011/builders/clang-mergefunc-x86_64-freeBSD9.2
Currently, MergeFunctions pass is stable, builder is green and passes 
all test-suite programs.

-Stepan.

Stepan Dyatkovskiy wrote:
> Hi all,
> Please find the updated patch in attachment:
> * Added some comments.
> * Fixed some typos.
>
> -Stepan
>
> Nick Lewycky wrote:
>> On 30 January 2014 01:27, Stepan Dyatkovskiy <stpworld at narod.ru
>> <mailto:stpworld at narod.ru>> wrote:
>>
>>     Hello Sean and Tobias,
>>
>>     Sean,
>>     Thank you. Could you describe Nick's ideas in few words or give me
>>     links to your discussion, so I could adapt my ideas to it.
>>
>>
>> Sorry I haven't had time to write it up. The essential idea is that we
>> use partition-refinement to determine the differences between a group of
>> functions instead of using pair-wise comparison.
>>
>> Imagine you want to compare long sequences of numbers. You can grab a
>> pair of sequences and run down them until they differ or you reach the
>> end, and move on to another pair of sequences. With partition refinement
>> you start with the first number in all the sequences and bucket them.
>> Say, three 1's, two 2's, and an 8. You've just formed 3 buckets. For
>> each bucket (with more than 1 member), do it again with the next number
>> in the sequence. This gets you an equivalence comparison at less than
>> the complexity of pairwise comparisons. (The buckets are partitions, and
>> you're refining them as you read more data. Partition refinement is the
>> dual of union-find, which we use quite heavily in the compiler.)
>>
>> I haven't figured out how this stacks up against Stepan's sorting with
>> <=> comparison. I can probably do that, I just haven't spent the time
>> thinking about it.
>>
>> I also haven't thought through how to integrate it with "near miss"
>> comparison. My current leading idea is that you do the same thing, but
>> make a record of what makes two buckets similar. (Note that the wrong
>> solution is to store a per-function-pair list of
>> similarities/differences.)
>>
>> Nick
>>
>>     Tobias,
>>     Your patch fails on several modules in my benchmark (73 of ~1800
>>     tests). I have sent one as attachment.
>>
>>     See statistics files for more details, all the .ll files you could
>>     simply find in test-suite object directory (after "make TEST=nightly
>>     report").
>>     I have attached script that launches benchmark. Example of usage
>>     (you need Linux OS):
>>     bash test-scipt.sh report.txt opt-no-patch opt-with-patch
>>
>>     Some numbers.
>>
>>     Total number of functions: 52715
>>
>>     Below, by default, I have excluded modules where your patch fails.
>>
>>     Functions merged
>>     Original version: 1113
>>     Order relation patch: 1112
>>     Similar merging patch: 541
>>
>>     Functions merged (including failed tests)
>>     Original version: 8398
>>     Order relation patch: 8395
>>
>>     Summary files size
>>     Initial:                                 163595634 bytes.
>>     After merging with order relation patch: 163147731 bytes.
>>     After similar merging patch:             162491905 bytes.
>>
>>     Summary files size (including failed tests)
>>     Initial:          250242266 bytes.
>>     Original version: 247168198 bytes.
>>     Order relation:   247175650 bytes.
>>
>>     Time. I measured with "time" utility, and used "real (wall clock)
>>     time used by the process".
>>
>>     Summary time spent
>>     With existing version:     28.05 secs.
>>     With order relation patch: 28.19 secs.
>>     Similar merging patch:     28.61 secs.
>>
>>     Summary time spent (including failed tests)
>>     With existing version:     41.74 secs.
>>     With order relation patch: 36.35 secs.
>>
>>     -Stepan
>>
>>     Sean Silva wrote:
>>
>>
>>
>>
>>         On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch
>>         <tobias.von.koch at gmail.com <mailto:tobias.von.koch at gmail.com>
>>         <mailto:tobias.von.koch at gmail.__com
>>         <mailto:tobias.von.koch at gmail.com>>> wrote:
>>
>>              Hi Stepan,
>>
>>              Sorry for the delay. It's great that you are working on
>>              MergeFunctions as well and I agree, we should definitely
>> try to
>>              combine our efforts to improve MergeFunctions.
>>
>>              Just to give you some context, the pass (with the similar
>>         function
>>              merging patch) is already being used in a production
>>         setting. From
>>              my point of view, it would be better if we focus on
>>         improving its
>>              capability of merging functions at this stage rather than on
>>              reduction of compilation time.
>>
>>              I'll give a brief overview of how our similar function
>> merging
>>              algorithm works (you can also watch the presentation from
>>         the US
>>              LLVM conference to hear me explain it using an animation).
>>
>>              1. Hash all functions into buckets
>>
>>              In each bucket, separately:
>>
>>                2. Compare functions pair-wise and determine a
>>                   similarity metric for each pair (%age of equivalent
>>         instructions)
>>
>>                3. Merge identical functions (with similarity = 100%),
>>         update call
>>                   sites for those functions.
>>
>>                4. If the updates of call sites have touched other
>> functions,
>>                   go back to step 2 and re-compare *only* those
>>         functions to all
>>                   others in their buckets.
>>
>>                Finally,
>>
>>                5. Form groups of similar functions for merging:
>>                   a) Pick most similar pair (A,B)
>>                   b) Find all functions C that are also similar to B but
>>         are not
>>                      more similar to some other function D.
>>                   c) Merge A with B and all the C's.
>>                   Repeat until there are no more functions to merge.
>>
>>              As you can see, we need to compute a similarity measure for
>>         each
>>              pair of functions in a bucket. If I understand correctly,
>>         your patch
>>              reduces compile-time by avoiding comparisons of functions
>> that
>>              cannot be identical. That's exactly what you want to do if
>>         you only
>>              want to merge identical functions. However, because we also
>>         need to
>>              determine whether functions are merely *similar*, I can't
>>         see how
>>              your idea could be applied in that case.
>>
>>
>>         Yeah, the existing pass only tries to merge identical functions
>>         (identical in machine code). I'm wondering if the "similar"
>> function
>>         merging would be better as a separate pass (I'm genuinely
>>         wondering; I
>>         have no clue). I would wait for Nick's feedback.
>>
>>         -- Sean Silva
>>
>>
>>              Looking at your statistics, I see that you are considering
>>         a very
>>              large number of functions. I don't know which benchmarks
>>         you are
>>              using, but I doubt that many of these are worth merging. For
>>              instance, you rarely gain a code size benefit from merging
>>         functions
>>              with just a few instructions. Our patch takes care of this
>>         using
>>              thresholds. It's worth looking at actual code size
>> reductions,
>>              rather than just numbers of functions merged/ compilation
>>         time spent
>>              comparing functions. It turns out that the number of
>>         functions in
>>              each bucket is really quite small once you apply some
>>         heuristics
>>              (and could still be further reduced!).
>>
>>              Given your experience with MergeFunctions, it would be
>>         really great
>>              if you could review our patch and also try it out on your
>>         benchmarks.
>>
>>              Tobias
>>
>>
>>              On 24/01/2014 19:11, Stepan Dyatkovskiy wrote:
>>
>>                  Hi Tobias.
>>
>>                  So, what do you think?
>>
>>                  If it means to much extra-job for your team, may be I
>>         can help you
>>                  somehow? I really would like to.
>>
>>                  -Stepan
>>
>>                  Stepan Dyatkovskiy wrote:
>>
>>                      Hi Tobias,
>>
>>                          I can't really see a way to combine our
>>         approach with
>>                          your patch. What
>>                          are your thoughts?
>>
>>
>>                      I think it is possible. Even more - we need to
>>         combine our
>>                      efforts, in
>>                      order to bring this pass into real live.
>>                      I'have read your presentation file, and
>>         unfortunately read
>>                      your patch
>>                      only a little.
>>                      How exactly you scan functions on 2nd stage?
>> Could you
>>                      explain the
>>                      algorithm in few words, how you compare workflow?
>> Is it
>>                      possible to
>>                      scan binary tree instead of hash table? ...OK.
>>
>>                      That's how I see the modification. Now its
>> important to
>>                      explain idea,
>>                      so consider the simplest case: you need to catch two
>>                      functions that
>>                      are differs with single instruction somewhere.
>>
>>                      1. Imagine, that IR *module* contents is
>> represented as
>>                      binary tree:
>>                      Each line (trace) from root to terminal node is a
>>         function.
>>                      Each node - is function's primitive (instruction
>>         opcode, for
>>                      example).
>>                      Down - is direction to terminal nodes, up - is
>>         direction to
>>                      the root.
>>
>>                      2. Now you are standing on some node. And you
>> have two
>>                      directions
>>                      down-left and down-right.
>>
>>                      3. If you are able to find two equal sub-traces
>>         down (at
>>                      left and at
>>                      right), then the only difference lies in this node.
>>         Then we
>>                      catch that
>>                      case.
>>
>>                      4. Even more, if two subtrees at left and at
>> right are
>>                      equal, than you
>>                      catch at once all the cases that are represented by
>>         these
>>                      subtrees.
>>
>>                      I still didn't look at you patch carefully. Sorry..
>>         But I
>>                      hope that
>>                      helps, and I'll try look at it in nearest time and
>>         perhaps
>>                      its not the
>>                      best solution I gave in this post.
>>
>>                      -Stepan
>>
>>                      22.01.2014, 20:53, "Tobias von Koch"
>>                      <tobias.von.koch at gmail.com
>>         <mailto:tobias.von.koch at gmail.com>
>>         <mailto:tobias.von.koch at gmail.__com
>>         <mailto:tobias.von.koch at gmail.com>>>:
>>
>>
>>                          Hi Stepan,
>>
>>                          As you've seen we have recently implemented a
>>                          significant enhancement to
>>                          the MergeFunctions pass that also allows
>> merging of
>>                          functions that are
>>                          only similar but not identical
>>                          (http://llvm-reviews.__chandle__rc.com/D2591
>>         <http://chandlerc.com/D2591>
>>                          <http://llvm-reviews.__chandlerc.com/D2591
>>         <http://llvm-reviews.chandlerc.com/D2591>>).
>>
>>
>>                          Our patch also changes the way that functions
>> are
>>                          compared quite
>>                          significantly. This is necessary because we
>> need to
>>                          compare all
>>                          functions in a bucket in order to get a
>> similarity
>>                          measure for each
>>                          pair, so we can then decide which 'groups' of
>>         functions
>>                          to merge.
>>
>>                          I can't really see a way to combine our
>>         approach with
>>                          your patch. What
>>                          are your thoughts?
>>
>>                          Another way to improve the performance of
>>         MergeFunctions
>>                          might be to
>>                          make the hash function better. I've already
>>         added the
>>                          size of the first
>>                          BB to it, and perhaps there are other factors
>>         we could
>>                          try... if we
>>                          don't have to compare functions in the first
>> place
>>                          (because they're in
>>                          different buckets) then that's obviously a much
>>         bigger win.
>>
>>                          Thanks,
>>                          Tobias
>>
>>                          On 17/01/2014 20:25, Stepan Dyatkovskiy wrote:
>>
>>                                 Hi all,
>>
>>                                 I propose simple improvement for
>>         MergeFunctions
>>                              pass, that reduced
>>                              its
>>                                 complexity from O(N^2) to O(log(N)),
>>         where N is
>>                              number of
>>                              functions in
>>                                 module.
>>
>>                                 The idea, is to replace the result of
>>         comparison
>>                              from "bool" to
>>                                 "-1,0,1". In another words: define order
>>         relation
>>                              on functions set.
>>                                 To be sure, that functions could be
>>         comparable
>>                              that way, we have to
>>                                 prove that order relation is possible
>> on all
>>                              comparison stage.
>>
>>                                 The last one is possible, if for each
>>         comparison
>>                              stage we
>>                              implement has
>>                                 next properties:
>>                                 * reflexivity (a <= a, a == a, a >= a),
>>                                 * antisymmetry (if a <= b and b <= a
>>         then a  == b),
>>                                 * transitivity (a <= b and b <= c, then
>>         a <= c)
>>                                 * asymmetry (if a < b, then a > b or a
>>         == b).
>>
>>                                 Once we have defined order relation we
>>         can store
>>                              all the functions in
>>                                 binary tree and perform lookup in
>>         O(log(N)) time.
>>
>>                                 This post has two attachments:
>>                                 1. The patch, that has implementation of
>>         this idea.
>>                                 2. The MergeFunctions pass detailed
>>         description,
>>                              with explanation how
>>                                 order relation could be possible.
>>
>>                                 Hope it helps to make things better!
>>
>>                                 -Stepan.
>>
>>
>>           ___________________________________________________
>>                                 LLVM Developers mailing list
>>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>         <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>         http://llvm.cs.uiuc.edu
>>         http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>
>>
>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>         <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>
>>
>>
>>
>>              ___________________________________________________
>>              LLVM Developers mailing list
>>         LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>         <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>         http://llvm.cs.uiuc.edu
>>         http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>         <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>
>>              <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>         <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>
>>
>>
>>
>>
>>     _______________________________________________
>>     LLVM Developers mailing list
>>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>> http://llvm.cs.uiuc.edu
>>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>