[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Stepan Dyatkovskiy
stpworld at narod.ru
Mon Feb 10 09:40:44 PST 2014
ping
Stepan Dyatkovskiy wrote:
> Hi all,
>
> The previous patch has been split into a series of small changes.
> At each stage (after each patch) the MergeFunctions pass compiles and
> remains stable.
>
> Please find the patches attached for review.
>
> To apply all patches at once, use the "apply-patches.sh" script as follows:
> 0. Place "apply-patches.sh" in the same directory as the patches.
> 1. cd <llvm-sources-dir>
> 2. "bash <path-to-patches-dir>/apply-patches.sh"
>
> -Stepan
>
> Stepan Dyatkovskiy wrote:
>> Hi Nick,
>> About partition refinement.
>>
>> It looks like the patch proposed here allows applying your idea in a
>> painless way (as a series of small changes, actually).
>>
>> If I got it right, you would like to introduce the following partition
>> refinement.
>>
>> The initial partition P consists of only one function class X0 (the
>> whole module, a single bucket). Suppose S is a subset of X0, so S
>> represents the functions with some trait that is absent from the rest
>> of X0.
>>
>> Then the refinement looks like this:
>>
>> P` = refine(P, S):
>> X`[0] = intersection(X[0], S)
>> X`[1] = difference(X[0], S) // X0 without S
>>
>> so P` = {X`[0], X`[1]}
>>
>> And X`[0] and X`[1] are, in fact, the classes of functions that differ
>> in the trait defined by S.
>>
>> Then (suppose we need to introduce a new trait, let it be S`):
>>
>> P`` = refine(P`, S`), so each X`` can be defined like below:
>> for each X`[i] in P`:
>> X``[i*2] = intersection(X`[i], S`)
>> X``[i*2+1] = difference(X`[i], S`)
>>
>> and so on, until we run out of traits.
>>
>> Finally we get the classes (the set of Xs) that consist of equivalent
>> functions, so we can merge the functions inside each class.
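>>
>> To make the refine step concrete, here is a minimal C++ sketch under my
>> reading of your idea; all type and function names here are made up for
>> illustration (only Function itself is LLVM's):
>>
>> #include <set>
>> #include <vector>
>>
>> class Function;                          // LLVM function; pointers only
>>
>> typedef std::set<Function *> FnClass;    // one class X[i]
>> typedef std::vector<FnClass> Partition;  // P = {X[0], X[1], ...}
>>
>> // refine(P, S): every class X[i] is split into intersection(X[i], S)
>> // and difference(X[i], S); empty halves are dropped.
>> static Partition refine(const Partition &P, const FnClass &S) {
>>   Partition Refined;
>>   for (const FnClass &X : P) {
>>     FnClass In, Out;
>>     for (Function *F : X)
>>       (S.count(F) ? In : Out).insert(F);
>>     if (!In.empty())
>>       Refined.push_back(In);
>>     if (!Out.empty())
>>       Refined.push_back(Out);
>>   }
>>   return Refined;
>> }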
>>
>> Am I right?
>>
>> The patch that was proposed here can then be represented as such a
>> partition refinement:
>>
>> At each refinement stage you need to introduce S somehow, i.e. an
>> algorithm that answers the question: "what sort of difference do we
>> want to check in this bucket?"
>>
>> In the existing implementation we already check these traits, but
>> pair-wise (attributes, calling convention, number of arguments, number
>> of basic blocks, etc.). How can we use the same traits in partition
>> refinement?
>>
>> Consider the following example. Suppose we want to split some bucket X:
>>
>> 0. Take some trait from the *pair-wise* comparison. For example: the
>> number of arguments.
>> 1. Take some function from bucket X; let it be F1, with N1 arguments.
>> 2. Then S is the set of functions whose number of arguments is *less*
>> than N1.
>> 3. Do the refinement by S (sketched below). As a result we get X`[0]
>> and X`[1].
>> 4. If both X`[0] and X`[1] are non-empty, repeat #1-#3 for each of them.
>> 5. If either X`[0] or X`[1] is the empty set, then all functions in X
>> are the same in the context of trait S. We need another trait: go to
>> #0 and repeat #0-#5 for all the X` buckets we have.
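>>
>> A sketch of steps #2-#3 for this particular trait (arg_size() is the
>> real llvm::Function accessor; the rest of the names are made up for
>> illustration):
>>
>> #include "llvm/IR/Function.h"
>> #include <set>
>> using namespace llvm;
>>
>> typedef std::set<Function *> FnClass;
>>
>> // Split bucket X around pivot F1: S = "fewer arguments than F1 has".
>> static void splitByArgCount(const FnClass &X, const Function *F1,
>>                             FnClass &Less, FnClass &Rest) {
>>   size_t N1 = F1->arg_size();
>>   for (Function *F : X)
>>     (F->arg_size() < N1 ? Less : Rest).insert(F);
>>   // If Less or Rest ends up empty, the whole bucket agrees on this
>>   // trait and we move on to the next one (step #5).
>> }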
>>
>> Finally we get the groups of equivalent functions, and these groups are
>> in fact organized in a binary tree.
>> Most of this is implemented with std::map. I described it as #0-#5 just
>> to put it into the context of your idea. The only problem was to find a
>> way to define the order relation (i.e. how to define S), but this
>> problem has been solved.
>> On output we get groups of equal functions.
>>
>> Correct me if I'm wrong, but it looks like every advantage given by
>> partition refinement is present here.
>>
>> As I said at the beginning, this is a way to commit the things you
>> proposed with a minimum of changes. We will keep the old comparison
>> routines; the only thing we need to change is the return type (from
>> bool to -1,0,1).
>> In this way, partition refinement can be committed as a series of small
>> changes: one comparison routine per commit.
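>>
>> For illustration, the per-routine change is as small as this (a
>> hypothetical routine; the real ones also compare attributes, types,
>> operands and so on):
>>
>> #include "llvm/IR/Function.h"
>> using namespace llvm;
>>
>> // Before: equality only.
>> static bool isEquivalentCallingConv(const Function *L, const Function *R) {
>>   return L->getCallingConv() == R->getCallingConv();
>> }
>>
>> // After: a three-way result (-1,0,1) that can feed an order relation.
>> static int cmpCallingConv(const Function *L, const Function *R) {
>>   if (L->getCallingConv() < R->getCallingConv()) return -1;
>>   if (L->getCallingConv() > R->getCallingConv()) return 1;
>>   return 0;
>> }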
>>
>> Every change in MergeFunctions from now on can be thoroughly checked by
>> this builder:
>> We now have a dedicated builder for MergeFunctions. It runs the
>> test-suite with the MergeFunctions pass forced on:
>> http://lab.llvm.org:8011/builders/clang-mergefunc-x86_64-freeBSD9.2
>> Currently the MergeFunctions pass is stable: the builder is green and
>> passes all test-suite programs.
>>
>> -Stepan.
>>
>> Stepan Dyatkovskiy wrote:
>>> Hi all,
>>> Please find the updated patch attached:
>>> * Added some comments.
>>> * Fixed some typos.
>>>
>>> -Stepan
>>>
>>> Nick Lewycky wrote:
>>>> On 30 January 2014 01:27, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:
>>>>
>>>> Hello Sean and Tobias,
>>>>
>>>> Sean,
>>>> Thank you. Could you describe Nick's ideas in a few words, or give me
>>>> links to your discussion, so I can adapt my ideas to them?
>>>>
>>>>
>>>> Sorry I haven't had time to write it up. The essential idea is that we
>>>> use partition-refinement to determine the differences between a
>>>> group of
>>>> functions instead of using pair-wise comparison.
>>>>
>>>> Imagine you want to compare long sequences of numbers. You can grab a
>>>> pair of sequences and run down them until they differ or you reach the
>>>> end, and move on to another pair of sequences. With partition
>>>> refinement
>>>> you start with the first number in all the sequences and bucket them.
>>>> Say, three 1's, two 2's, and an 8. You've just formed 3 buckets. For
>>>> each bucket (with more than 1 member), do it again with the next number
>>>> in the sequence. This gets you an equivalence comparison at less than
>>>> the complexity of pairwise comparisons. (The buckets are partitions,
>>>> and
>>>> you're refining them as you read more data. Partition refinement is the
>>>> dual of union-find, which we use quite heavily in the compiler.)
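>>>>
>>>> (A toy sketch of that bucketing, on plain integer sequences rather
>>>> than real functions; the names are invented for illustration:)
>>>>
>>>> #include <cstddef>
>>>> #include <map>
>>>> #include <vector>
>>>>
>>>> typedef std::vector<int> Sequence;            // stand-in for a function
>>>> typedef std::vector<const Sequence *> Bucket;
>>>>
>>>> // Bucket by the element at position Pos, then refine each bucket of
>>>> // size > 1 on the next position. Assumes equal-length sequences.
>>>> static void refine(const Bucket &B, std::size_t Pos,
>>>>                    std::vector<Bucket> &Classes) {
>>>>   if (B.size() <= 1 || Pos >= B.front()->size()) {
>>>>     Classes.push_back(B);                     // fully refined
>>>>     return;
>>>>   }
>>>>   std::map<int, Bucket> ByValue;              // e.g. three 1's, two
>>>>   for (const Sequence *S : B)                 // 2's, and one 8
>>>>     ByValue[(*S)[Pos]].push_back(S);
>>>>   for (std::map<int, Bucket>::iterator I = ByValue.begin(),
>>>>                                        E = ByValue.end();
>>>>        I != E; ++I)
>>>>     refine(I->second, Pos + 1, Classes);
>>>> }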
>>>>
>>>> I haven't figured out how this stacks up against Stepan's sorting with
>>>> <=> comparison. I can probably do that, I just haven't spent the time
>>>> thinking about it.
>>>>
>>>> I also haven't thought through how to integrate it with "near miss"
>>>> comparison. My current leading idea is that you do the same thing, but
>>>> make a record of what makes two buckets similar. (Note that the wrong
>>>> solution is to store a per-function-pair list of
>>>> similarities/differences.)
>>>>
>>>> Nick
>>>>
>>>> Tobias,
>>>> Your patch fails on several modules in my benchmark (73 of ~1800
>>>> tests). I have attached one of them.
>>>>
>>>> See the statistics files for more details; all the .ll files can be
>>>> found in the test-suite object directory (after "make TEST=nightly
>>>> report").
>>>> I have attached the script that launches the benchmark. Example of
>>>> usage (you need a Linux OS):
>>>> bash test-scipt.sh report.txt opt-no-patch opt-with-patch
>>>>
>>>> Some numbers.
>>>>
>>>> Total number of functions: 52715
>>>>
>>>> Below, by default, I have excluded modules where your patch fails.
>>>>
>>>> Functions merged
>>>> Original version: 1113
>>>> Order relation patch: 1112
>>>> Similar merging patch: 541
>>>>
>>>> Functions merged (including failed tests)
>>>> Original version: 8398
>>>> Order relation patch: 8395
>>>>
>>>> Total file size
>>>> Initial: 163595634 bytes.
>>>> After merging with order relation patch: 163147731 bytes.
>>>> After similar merging patch: 162491905 bytes.
>>>>
>>>> Total file size (including failed tests)
>>>> Initial: 250242266 bytes.
>>>> Original version: 247168198 bytes.
>>>> Order relation: 247175650 bytes.
>>>>
>>>> Time: measured with the "time" utility, using the "real (wall clock)
>>>> time used by the process".
>>>>
>>>> Total time spent
>>>> With existing version: 28.05 secs.
>>>> With order relation patch: 28.19 secs.
>>>> Similar merging patch: 28.61 secs.
>>>>
>>>> Total time spent (including failed tests)
>>>> With existing version: 41.74 secs.
>>>> With order relation patch: 36.35 secs.
>>>>
>>>> -Stepan
>>>>
>>>> Sean Silva wrote:
>>>>
>>>> On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch
>>>> <tobias.von.koch at gmail.com> wrote:
>>>>
>>>> Hi Stepan,
>>>>
>>>> Sorry for the delay. It's great that you are working on
>>>> MergeFunctions as well and I agree, we should definitely try to
>>>> combine our efforts to improve MergeFunctions.
>>>>
>>>> Just to give you some context, the pass (with the similar function
>>>> merging patch) is already being used in a production setting. From
>>>> my point of view, it would be better if we focus on improving its
>>>> capability of merging functions at this stage rather than on
>>>> reduction of compilation time.
>>>>
>>>> I'll give a brief overview of how our similar function merging
>>>> algorithm works (you can also watch the presentation from the US
>>>> LLVM conference to hear me explain it using an animation).
>>>>
>>>> 1. Hash all functions into buckets
>>>>
>>>> In each bucket, separately:
>>>>
>>>> 2. Compare functions pair-wise and determine a similarity metric
>>>> for each pair (%age of equivalent instructions)
>>>>
>>>> 3. Merge identical functions (with similarity = 100%), update call
>>>> sites for those functions.
>>>>
>>>> 4. If the updates of call sites have touched other functions, go
>>>> back to step 2 and re-compare *only* those functions to all others
>>>> in their buckets.
>>>>
>>>> Finally,
>>>>
>>>> 5. Form groups of similar functions for merging:
>>>>    a) Pick the most similar pair (A,B)
>>>>    b) Find all functions C that are also similar to B but are not
>>>>       more similar to some other function D.
>>>>    c) Merge A with B and all the C's.
>>>> Repeat until there are no more functions to merge.
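>>>>
>>>> (A rough sketch of the pair-wise part of step 2; similarity() is
>>>> only a placeholder for the real metric:)
>>>>
>>>> #include <cstddef>
>>>> #include <map>
>>>> #include <utility>
>>>> #include <vector>
>>>>
>>>> class Function;   // LLVM function; pointers only
>>>>
>>>> // Placeholder for the real metric: percentage of equivalent
>>>> // instructions for one pair of functions.
>>>> double similarity(Function *A, Function *B);
>>>>
>>>> typedef std::vector<Function *> Bucket;   // one bucket from step 1
>>>> typedef std::map<std::pair<Function *, Function *>, double> SimMap;
>>>>
>>>> // Step 2: every pair inside a bucket gets a similarity value. This
>>>> // is the part that needs all pairs, so pairs cannot simply be
>>>> // skipped as "cannot be identical".
>>>> SimMap compareBucket(const Bucket &B) {
>>>>   SimMap Sim;
>>>>   for (std::size_t I = 0; I < B.size(); ++I)
>>>>     for (std::size_t J = I + 1; J < B.size(); ++J)
>>>>       Sim[std::make_pair(B[I], B[J])] = similarity(B[I], B[J]);
>>>>   return Sim;
>>>> }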
>>>>
>>>> As you can see, we need to compute a similarity measure for each
>>>> pair of functions in a bucket. If I understand correctly, your patch
>>>> reduces compile-time by avoiding comparisons of functions that
>>>> cannot be identical. That's exactly what you want to do if you only
>>>> want to merge identical functions. However, because we also need to
>>>> determine whether functions are merely *similar*, I can't see how
>>>> your idea could be applied in that case.
>>>>
>>>>
>>>> Yeah, the existing pass only tries to merge identical functions
>>>> (identical in machine code). I'm wondering if the "similar"
>>>> function
>>>> merging would be better as a separate pass (I'm genuinely
>>>> wondering; I
>>>> have no clue). I would wait for Nick's feedback.
>>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>> Looking at your statistics, I see that you are considering a very
>>>> large number of functions. I don't know which benchmarks you are
>>>> using, but I doubt that many of these are worth merging. For
>>>> instance, you rarely gain a code size benefit from merging functions
>>>> with just a few instructions. Our patch takes care of this using
>>>> thresholds. It's worth looking at actual code size reductions,
>>>> rather than just numbers of functions merged / compilation time
>>>> spent comparing functions. It turns out that the number of functions
>>>> in each bucket is really quite small once you apply some heuristics
>>>> (and could still be further reduced!).
>>>>
>>>> Given your experience with MergeFunctions, it would be
>>>> really great
>>>> if you could review our patch and also try it out on your
>>>> benchmarks.
>>>>
>>>> Tobias
>>>>
>>>>
>>>> On 24/01/2014 19:11, Stepan Dyatkovskiy wrote:
>>>>
>>>> Hi Tobias.
>>>>
>>>> So, what do you think?
>>>>
>>>> If it means too much extra work for your team, maybe I can help you
>>>> somehow? I really would like to.
>>>>
>>>> -Stepan
>>>>
>>>> Stepan Dyatkovskiy wrote:
>>>>
>>>> Hi Tobias,
>>>>
>>>> I can't really see a way to combine our approach with your patch.
>>>> What are your thoughts?
>>>>
>>>>
>>>> I think it is possible. Even more, we need to combine our efforts
>>>> in order to bring this pass into real life.
>>>> I have read your presentation file, but unfortunately have read your
>>>> patch only a little.
>>>> How exactly do you scan functions in the 2nd stage? Could you
>>>> explain the algorithm in a few words: what does the comparison
>>>> workflow look like? Is it possible to scan a binary tree instead of
>>>> a hash table? ...OK.
>>>>
>>>> That's how I see the modification. Right now it's important to
>>>> explain the idea, so consider the simplest case: you need to catch
>>>> two functions that differ by a single instruction somewhere.
>>>>
>>>> 1. Imagine that the IR *module* contents are represented as a
>>>> binary tree:
>>>> Each line (trace) from the root to a terminal node is a function.
>>>> Each node is a function's primitive (an instruction opcode, for
>>>> example).
>>>> Down is the direction towards the terminal nodes, up is the
>>>> direction towards the root.
>>>>
>>>> 2. Now you are standing on some node, and you have two directions:
>>>> down-left and down-right.
>>>>
>>>> 3. If you are able to find two equal sub-traces down (at left and
>>>> at right), then the only difference lies in this node, and we catch
>>>> that case.
>>>>
>>>> 4. Even more, if the two subtrees at left and at right are equal,
>>>> then you catch at once all the cases represented by these subtrees.
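>>>>
>>>> (A sketch of such a tree, with the "primitives" reduced to plain
>>>> opcode numbers; this only illustrates the idea, it is not working
>>>> pass code:)
>>>>
>>>> #include <cstddef>
>>>> #include <map>
>>>> #include <vector>
>>>>
>>>> class Function;   // LLVM function; pointers only
>>>>
>>>> // One node of the module tree: children keyed by the primitive
>>>> // (e.g. instruction opcode) that follows at this position.
>>>> struct TraceNode {
>>>>   std::map<unsigned, TraceNode *> Children;
>>>>   std::vector<Function *> Terminals;  // functions whose trace ends here
>>>> };
>>>>
>>>> // Insert one function as a root-to-leaf trace of its primitives.
>>>> // A node whose two subtrees are equal then marks exactly the
>>>> // single-primitive difference described in #3 and #4.
>>>> static void insertTrace(TraceNode &Root, Function *F,
>>>>                         const std::vector<unsigned> &Primitives) {
>>>>   TraceNode *N = &Root;
>>>>   for (std::size_t I = 0, E = Primitives.size(); I != E; ++I) {
>>>>     TraceNode *&Child = N->Children[Primitives[I]];
>>>>     if (!Child)
>>>>       Child = new TraceNode();
>>>>     N = Child;
>>>>   }
>>>>   N->Terminals.push_back(F);
>>>> }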
>>>>
>>>> I still haven't looked at your patch carefully, sorry. But I hope
>>>> that helps; I'll try to look at it in the nearest time, and perhaps
>>>> the solution I gave in this post is not the best one.
>>>>
>>>> -Stepan
>>>>
>>>> 22.01.2014, 20:53, "Tobias von Koch" <tobias.von.koch at gmail.com>:
>>>>
>>>>
>>>> Hi Stepan,
>>>>
>>>> As you've seen we have recently implemented a significant
>>>> enhancement to the MergeFunctions pass that also allows merging of
>>>> functions that are only similar but not identical
>>>> (http://llvm-reviews.chandlerc.com/D2591).
>>>>
>>>>
>>>> Our patch also changes the way that functions are compared quite
>>>> significantly. This is necessary because we need to compare all
>>>> functions in a bucket in order to get a similarity measure for each
>>>> pair, so we can then decide which 'groups' of functions to merge.
>>>>
>>>> I can't really see a way to combine our approach with your patch.
>>>> What are your thoughts?
>>>>
>>>> Another way to improve the performance of MergeFunctions might be
>>>> to make the hash function better. I've already added the size of
>>>> the first BB to it, and perhaps there are other factors we could
>>>> try... if we don't have to compare functions in the first place
>>>> (because they're in different buckets) then that's obviously a much
>>>> bigger win.
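>>>>
>>>> (For illustration only, a cheap bucketing hash along those lines;
>>>> hash_combine is LLVM's real helper, but the particular mix of traits
>>>> below is just an example:)
>>>>
>>>> #include "llvm/ADT/Hashing.h"
>>>> #include "llvm/IR/BasicBlock.h"
>>>> #include "llvm/IR/Function.h"
>>>> using namespace llvm;
>>>>
>>>> // Only traits that must match for two mergeable functions may go
>>>> // into the hash, otherwise candidates land in different buckets.
>>>> static hash_code bucketHash(const Function &F) {
>>>>   size_t FirstBBSize = F.empty() ? 0 : F.getEntryBlock().size();
>>>>   return hash_combine(F.arg_size(),   // number of arguments
>>>>                       F.size(),       // number of basic blocks
>>>>                       FirstBBSize);   // size of the first BB
>>>> }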
>>>>
>>>> Thanks,
>>>> Tobias
>>>>
>>>> On 17/01/2014 20:25, Stepan Dyatkovskiy wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I propose a simple improvement for the MergeFunctions pass that
>>>> reduces its complexity from O(N^2) to O(log(N)), where N is the
>>>> number of functions in the module.
>>>>
>>>> The idea is to change the result of comparison from "bool" to
>>>> "-1,0,1". In other words: define an order relation on the set of
>>>> functions. To be sure that functions can be compared that way, we
>>>> have to prove that an order relation is possible at every comparison
>>>> stage.
>>>>
>>>> The latter is possible if the relation we implement at each
>>>> comparison stage has the following properties:
>>>> * reflexivity (a <= a, a == a, a >= a),
>>>> * antisymmetry (if a <= b and b <= a, then a == b),
>>>> * transitivity (if a <= b and b <= c, then a <= c),
>>>> * asymmetry (if a < b, then neither a > b nor a == b).
>>>>
>>>> Once we have defined the order relation, we can store all the
>>>> functions in a binary tree and perform lookups in O(log(N)) time.
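>>>>
>>>> (A minimal sketch of what that looks like in code; the names are
>>>> illustrative, not the ones used in the attached patch:)
>>>>
>>>> #include <map>
>>>> #include <vector>
>>>>
>>>> class Function;   // LLVM function; pointers only
>>>>
>>>> // Total three-way comparison built from the staged routines:
>>>> // returns -1, 0 or 1.
>>>> int cmpFunctions(const Function *L, const Function *R);
>>>>
>>>> struct FnLess {
>>>>   bool operator()(const Function *L, const Function *R) const {
>>>>     return cmpFunctions(L, R) < 0;   // strict weak ordering
>>>>   }
>>>> };
>>>>
>>>> // Each key is the representative of one class of equal functions.
>>>> // Finding the class of a new function takes O(log(N)) comparisons
>>>> // instead of comparing it against every function seen so far.
>>>> typedef std::map<const Function *, std::vector<Function *>, FnLess>
>>>>     FnTree;
>>>>
>>>> The properties above are exactly what std::map needs from FnLess for
>>>> this lookup to be well defined.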
>>>>
>>>> This post has two attachments:
>>>> 1. The patch that implements this idea.
>>>> 2. A detailed description of the MergeFunctions pass, with an
>>>> explanation of how the order relation is possible.
>>>>
>>>> Hope it helps to make things better!
>>>>
>>>> -Stepan.
>>>>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>