[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
    Stepan Dyatkovskiy 
    stpworld at narod.ru
       
    Fri Mar  7 09:47:10 PST 2014
    
    
  
Ping
> ping
> Stepan Dyatkovskiy wrote:
> 
>> Hi Nick,
>>
>> I tried to rework changes as you requested. One of patches (0004 with
>> extra assertions) has been removed.
>>
>>> + bool isEquivalentType(Type *Ty1, Type *Ty2) const {
>>> + return cmpType(Ty1, Ty2) == 0;
>>> + }
>>>
>>> Why do we still need isEquivalentType? Can we nuke this?
>>
>> Yup. After applying all the patches isEquivalentType will be totally
>> replaced with cmpType. All isEqXXXX and friends will be removed in 0011
>> (old 0012). No traces left.
>> Old function wasn't removed in 0001 just for keeping patches without
>> extra noise like:
>>
>> - something that uses isEquivalentType
>> + something that uses cmpType
>>
>> The point is, that "something" that uses isEquivalentType, will be also
>> replaced with one of next patches in this series.
>>
>>> +static int cmpNumbers(uint64_t L, uint64_t R) {
>>> + if (L < R) return -1;
>>> + if (L > R) return 1;
>>> + return 0;
>>> +}
>>>
>>> At a high level, you don't actually need a <=> operator to use a sort. A
>>> strict ordering ( < operator) is sufficient. (Note that for mergefunc, a
>>> strict weak ordering is not sufficient, it must be a total ordering.)
>>
>> That could be done with int FunctionComparator::compare(). We can
>> replace it with bool FunctionComparator::less(). Though for all other
>> cmp methods need at least 2 bits of information as result:
>> 1. Whether things are equal.
>> 2. Whether left less than right.
>>
>> As for FunctionComparator::compare(), conversion into less() will
>> increase time of sanity check (patch #0010).
>> Sanity check is just a sweet bonus. It checks that ordering implemented
>> properly (checks order relation properties).
>> Turning compare() into less() mean, that we'll have to run comparison
>> two times: L.less(R) and R.less(L). But may be sanity check is not a
>> thing to be published at all.
>>
>>> Consider hoisting this inside the FunctionComparator class? That class
>>> should have a bunch of implementations of comparisons between various
>>> different things, which can pass down to other methods in the same
>>
>> class.
>> In new patch series attached to this post, I have moved all static
>> methods into FunctionComparator.
>>
>>> + // Replacement for type::canLosslesslyBitCastTo, that
>>> + // establish order relation on this kind of properties.
>>> + int checkForLosslessBitcast(const Type *L, const Type *R);
>>>
>>> Type:: not type:: . Please make this comment more descriptive.
>>
>> Done.
>> [new comment]
>> Replacement for Type::canLosslesslyBitCastTo, that
>> establish order relation on this kind of properties
>> Returns 0, if L and R types could be converted to each other without
>> reinterpretation of bits.
>> Otherwise method returns -1 or 1, defining total ordering between
>> types in context of lossless bitcastability trait.
>> E.g.: if L is less than R (result is -1), than every type that could be
>> losslessly bitcasted to L is less than R.
>> [/new comment]
>>
>>> + /// Replacement for C1 == getBitCast(C2, C1Ty)
>>> + /// Its more controllable, and supposed to be simpler and more
>>> predictionable.
>>> + /// As very important advantage: it allows to introduce order
>>
>> relation on
>>
>>> + /// constants set, and thus use it as trait in refinement routines.
>>>
>>> "Its" --> "It's". "predictionable" --> "predictable". And how is it more
>>> predictable? I think this comment would be better if it described the
>>> function instead of making comparisons between it and other functions.
>>> Something like, "Compare constants under a system where pointer to X and
>>> pointer to Y are considered equal" or whatever is actually true here.
>>
>> Done.
>> [new comment]
>> Replacement for C1 == getBitCast(C2, C1Ty)
>> Parses constants contents, assuming that types are losslessly
>> bitcasted between each other. So actually it ignores types and only
>> compares bits from L and R.
>> Returns 0, if L and R has equivalent content.
>> -1 or 1 if values are different. Maintaining total ordering requires
>> two values that indicates non-equivalence (L less R, L greater R).
>> [/new comment]
>>
>>> +enum ConstantType {
>>> I'm not sure that this buys you much. All the "isa" tests can be broken
>>> down into a switch on getValueID() with the one exception of
>>
>> isNullValue().
>> Done.
>>
>>> + assert(
>>> + C1->getType()->canLosslesslyBitCastTo(C2->getType()) &&
>>> + "Pass is healthless. checkForLosslessBitcast should be twin of "
>>> + "canLosslesslyBitCastTo method, except the case when the last
>>
>> one "
>>
>>> + "returns false, the first one should return -1 or 1");
>>
>> ...
>>
>>> I think we can skip the asserts here. They aren't detecting a specific
>>> bug, they're checking whether the new code does a certain task relative
>>> to the old code. Drop the old code, your new code is the new sheriff in
>>> town.
>>
>> Asserts has been removed.
>>
>>> + DenseMap<const Value*, int> sn_map1, sn_map2;
>>>
>>> What is sn short for ("seen")? Why are there two of these?
>>
>> Serial number :-)
>>
>>> + std::pair<DenseMap<const Value *, int>::iterator, bool>
>>> + LeftSN = sn_map1.insert(std::make_pair(V1, sn_map1.size())),
>>> + RightSN = sn_map2.insert(std::make_pair(V2, sn_map2.size()));
>>>
>>> So I think I get it, this is easy to reason about. You number each value
>>> going down both functions. But I think you can eliminate one of these
>>> maps because you know that if the left and right were ever different
>>> then we terminate the comparison immediately. You can at least share one
>>> map with both V1 and V2, but I think you can reduce the number of map
>>> insertions.
>>
>> Not sure. Consider that in left you have:
>> %A = alloca i32
>> %B = alloca i32
>> store i32 1, i32* %A
>>
>> And in right:
>> %A = alloca i32
>> %B = alloca i32
>> store i32 1, i32* %B
>>
>> When we meet 'store' instruction we need to check in which order %A was
>> allocated at left and in which order %B was allocated at right.
>> So for each value in function we have to assign serial number. Then
>> comparing to local values from different functions we can just check
>> their serial numbers.
>> Though, may be I missed something? May be there is another way to
>> determine "who was first".
>>
>>> High-level question, are attributes ordered? Your comparison is ordered.
>>> So if I have function F with string attributes "sse2", "gc=hemisphere"
>>> and another function G with string attributes "gc=hemisphere", "sse2"
>>> then they would be considered different. I don't think that's right.
>>
>> I didn't check it. But if it is not ordered, we could order it
>> lexicographically.
>>
>>> + int cmpOperation(const Instruction *I1, const Instruction *I2) const;
>>> +
>>> bool isEquivalentOperation(const Instruction *I1,
>>> - const Instruction *I2) const;
>>> + const Instruction *I2) const {
>>> + return cmpOperation(I1, I2) == 0;
>>> + }
>>>
>>> Hopefully this patch series ultimately eliminates calls of
>>> isEquivalentOperation?
>>
>> Of course. Just like isEquivalentType.
>>
>>> By the way, this is an interesting one. Should "add x, y" and "add nsw
>>> x, y" be equal or not?
>>
>> Yeah. Those are interesting questions. We could even treat 'mul a, 2'
>> and 'shl a,1' as equal in some cases. I think it's matter of time and
>> further optimization.
>>
>> Please find first reworked patches in attachment.
>>
>> -Stepan
>>
>> Nick Lewycky wrote:
>>
>>> /// Compare two Types, treating all pointer types as equal.
>>> - bool isEquivalentType(Type *Ty1, Type *Ty2) const;
>>> + int cmpType(Type *Ty1, Type *Ty2) const;
>>> +
>>> + /// Compare two Types, treating all pointer types as equal.
>>> + bool isEquivalentType(Type *Ty1, Type *Ty2) const {
>>> + return cmpType(Ty1, Ty2) == 0;
>>> + }
>>>
>>> Why do we still need isEquivalentType? Can we nuke this?
>>>
>>> +static int cmpNumbers(uint64_t L, uint64_t R) {
>>> + if (L < R) return -1;
>>> + if (L > R) return 1;
>>> + return 0;
>>> +}
>>>
>>> At a high level, you don't actually need a <=> operator to use a sort. A
>>> strict ordering ( < operator) is sufficient. (Note that for mergefunc, a
>>> strict weak ordering is not sufficient, it must be a total ordering.)
>>>
>>> Consider hoisting this inside the FunctionComparator class? That class
>>> should have a bunch of implementations of comparisons between various
>>> different things, which can pass down to other methods in the same class.
>>>
>>> + // Replacement for type::canLosslesslyBitCastTo, that
>>> + // establish order relation on this kind of properties.
>>> + int checkForLosslessBitcast(const Type *L, const Type *R);
>>>
>>> Type:: not type:: . Please make this comment more descriptive.
>>>
>>> + /// Replacement for C1 == getBitCast(C2, C1Ty)
>>> + /// Its more controllable, and supposed to be simpler and more
>>> predictionable.
>>> + /// As very important advantage: it allows to introduce order
>>> relation on
>>> + /// constants set, and thus use it as trait in refinement routines.
>>>
>>> "Its" --> "It's". "predictionable" --> "predictable". And how is it more
>>> predictable? I think this comment would be better if it described the
>>> function instead of making comparisons between it and other functions.
>>> Something like, "Compare constants under a system where pointer to X and
>>> pointer to Y are considered equal" or whatever is actually true here.
>>>
>>> +enum ConstantType {
>>>
>>> I'm not sure that this buys you much. All the "isa" tests can be broken
>>> down into a switch on getValueID() with the one exception of
>>> isNullValue().
>>>
>>> + // Check whether C2 and C1 has equal bit patterns.
>>> + if (checkForLosslessBitcast(C1->getType(), C2->getType()) != 0)
>>> + return false;
>>> +
>>> + assert(
>>> + C1->getType()->canLosslesslyBitCastTo(C2->getType()) &&
>>> + "Pass is healthless. checkForLosslessBitcast should be twin of "
>>> + "canLosslesslyBitCastTo method, except the case when the last
>>> one "
>>> + "returns false, the first one should return -1 or 1");
>>> +
>>> + // Compare constant contents
>>> + if (cmpConstants(C1, C2) != 0)
>>> + return false;
>>> +
>>> + assert(
>>> + C1 ==
>>> + ConstantExpr::getBitCast(const_cast<Constant*>(C2),
>>> C1->getType()) &&
>>> + "Pass is healthless. cmpConstants should detect *subset* of "
>>> + "equivalent constants that could be detected by "
>>> + "getBitCast method. So whereas cmpConstants returns 0 "
>>> + "getBitCast also must return true. And wheres getBitCast "
>>> + "returns false, cmpConstants also must return -1 or 1.");
>>>
>>> I think we can skip the asserts here. They aren't detecting a specific
>>> bug, they're checking whether the new code does a certain task relative
>>> to the old code. Drop the old code, your new code is the new sheriff in
>>> town.
>>>
>>> + DenseMap<const Value*, int> sn_map1, sn_map2;
>>>
>>> What is sn short for ("seen")? Why are there two of these?
>>>
>>> + std::pair<DenseMap<const Value *, int>::iterator, bool>
>>> + LeftSN = sn_map1.insert(std::make_pair(V1, sn_map1.size())),
>>> + RightSN = sn_map2.insert(std::make_pair(V2, sn_map2.size()));
>>>
>>> So I think I get it, this is easy to reason about. You number each value
>>> going down both functions. But I think you can eliminate one of these
>>> maps because you know that if the left and right were ever different
>>> then we terminate the comparison immediately. You can at least share one
>>> map with both V1 and V2, but I think you can reduce the number of map
>>> insertions.
>>>
>>> +// Mostly cloned from BitcodeWriter, but simplified a bit.
>>>
>>> Missing indent.
>>>
>>> +static int cmpAttrs(const AttributeSet L, const AttributeSet R) {
>>>
>>> High-level question, are attributes ordered? Your comparison is ordered.
>>> So if I have function F with string attributes "sse2", "gc=hemisphere"
>>> and another function G with string attributes "gc=hemisphere", "sse2"
>>> then they would be considered different. I don't think that's right.
>>>
>>> + int cmpOperation(const Instruction *I1, const Instruction *I2) const;
>>> +
>>> bool isEquivalentOperation(const Instruction *I1,
>>> - const Instruction *I2) const;
>>> + const Instruction *I2) const {
>>> + return cmpOperation(I1, I2) == 0;
>>> + }
>>>
>>> Hopefully this patch series ultimately eliminates calls of
>>> isEquivalentOperation?
>>>
>>> + if (int Res = cmpNumbers(I1->getRawSubclassOptionalData(),
>>> + I2->getRawSubclassOptionalData()))
>>>
>>> By the way, this is an interesting one. Should "add x, y" and "add nsw
>>> x, y" be equal or not? Depends whether the backend uses it, and whether
>>> we're going to do more optimization afterwards (ie., this is the compile
>>> step of an LTO build). Nothing for you to change here, just want you to
>>> be aware that this is a point where we might want to have a flag and
>>> make mergefunc behave differently in different compile strategies.
>>>
>>> (I stopped before getting to 0008.)
>>>
>>> Nick
>>>
>>> On 3 February 2014 11:11, Stepan Dyatkovskiy <stpworld at narod.ru
>>> <mailto:stpworld at narod.ru>> wrote:
>>>
>>> Hi all,
>>>
>>> Previous patch has been split onto series of small changes.
>>> On each stage (after each patch) MergeFunctions pass is compilable
>>> and stable.
>>>
>>> Please find patches in attachment for review.
>>>
>>> To apply all patches at once, use "apply-patches.sh" script as
>>> follows:
>>> 0. Place "apply-patches.sh" in same directory with patches.
>>> 1. cd <llvm-sources-dir>
>>> 2. "bash <path-to-patches-dir>/apply-__patches.sh"
>>>
>>> -Stepan
>>>
>>> Stepan Dyatkovskiy wrote:
>>>
>>> Hi Nick,
>>> About partition refinement.
>>>
>>> It looks like patch proposed here allows to apply your idea with
>>> painless way (as series of small changes actually).
>>>
>>> If I got it right, you would like to introduce the next
>>> partition-refinement.
>>>
>>> Initial partition P consists of only one functions class X0
>>> (whole
>>> module, single bucket). Suppose S is subset of X0, so S
>>> represents
>>> functions with some trait that is absent in rest part of X0.
>>>
>>> Then refinement looks like below:
>>>
>>> P`=refine(P, S):
>>> X`[0] = intersection(X[0], S)
>>> X`[1] = difference(X[0], S) // X0 without S0
>>>
>>> so P`={X`[0], X`[1]}
>>>
>>> And actually X`[0] and X`[1] are classes of functions that
>>> differs with
>>> trait defined by S.
>>>
>>> Then (suppose, we need to introduce new trait, so it would be
>>> S`):
>>>
>>> P``=refine(P`, S`), so each X`` could be defined like below:
>>> for each X`[i] in P`:
>>> X``[i*2] = intersection(X`[i], S`)
>>> X``[i*2+1] = difference(X`[i], S`)
>>>
>>> and so on, until all the traits are over.
>>>
>>> Finally we got the classes (set of Xs) that consists of
>>> equivalent
>>> functions. So, we can merge functions inside the class bounds.
>>>
>>> Am I right?
>>>
>>> Patch, that was proposed here could be represented as such
>>> partition
>>> refinement then:
>>>
>>> On each refinement stage, you need to introduce S somehow, the
>>> algorithm
>>> that answers the question: "what sort of difference we want to
>>> check in
>>> this bucket?"
>>>
>>> In existing implementation we check traits already, but pair-wise
>>> (attributes, calling convention, number of arguments, number of
>>> basic
>>> blocks, etc.). How to to use the same traits in partition
>>> refinement?
>>>
>>> Consider next example. Suppose we want to split some bucket X:
>>>
>>> 0. Take some trait from *pair-wise* comparison. For example:
>>> number of
>>> arguments.
>>> 1. Take some function from bucket X, let it be F1. Let F1 has N1
>>> number
>>> of arguments.
>>> 2. Then S are functions with number of arguments *less* than N1.
>>> 3. Do refinement by S. As result we got X`[0] and X`[1].
>>> 4. If both of X`[0] and X`[1] are not empty, then repeat #1-#3
>>> for them.
>>> 5. If some of X`[0] of X`[1] is empty set, then all functions in
>>> X are
>>> the same in context of trait S. We need another trait, go to
>>> #0. And
>>> repeat #0-#5 for all X` buckets we have.
>>>
>>> Finally we got the groups of equivalent functions. And actually
>>> these
>>> groups would be organized in binary tree.
>>> Most of stuff is implemented in std::map. I showed it as #0-#5
>>> just to
>>> put it into context of your idea. The only problem was to
>>> introduce way
>>> of how to define order-relation (or how to define S). But this
>>> problem
>>> has been solved.
>>> And on output we got groups of equal functions.
>>>
>>> Fix me if I'm wrong. But it looks like every advantage given by
>>> partition refinement is present here.
>>>
>>> As I said in beginning, it is way of how to commit things you
>>> proposed
>>> with minimum of changes. We will use old comparison routines,
>>> only thing
>>> we need to change is return type (from bool to -1,0,1).
>>> In this way, partition refinement could be committed as series
>>> of small
>>> changes: one comparison routine per commit.
>>>
>>> Every change in MergeFunctions from now could be well checked by
>>> this
>>> builder:
>>> Now we have special builder for MergeFunctions. It runs
>>> test-suite with
>>> forced MergeFunctions pass:
>>>
>>> http://lab.llvm.org:8011/__builders/clang-mergefunc-x86___64-freeBSD9.2
>>>
>>> <http://lab.llvm.org:8011/builders/clang-mergefunc-x86_64-freeBSD9.2>
>>> Currently, MergeFunctions pass is stable, builder is green and
>>> passes
>>> all test-suite programs.
>>>
>>> -Stepan.
>>>
>>> Stepan Dyatkovskiy wrote:
>>>
>>> Hi all,
>>> Please find the updated patch in attachment:
>>> * Added some comments.
>>> * Fixed some typos.
>>>
>>> -Stepan
>>>
>>> Nick Lewycky wrote:
>>>
>>> On 30 January 2014 01:27, Stepan Dyatkovskiy
>>> <stpworld at narod.ru <mailto:stpworld at narod.ru>
>>> <mailto:stpworld at narod.ru <mailto:stpworld at narod.ru>>>
>>> wrote:
>>>
>>> Hello Sean and Tobias,
>>>
>>> Sean,
>>> Thank you. Could you describe Nick's ideas in few
>>> words or give me
>>> links to your discussion, so I could adapt my ideas
>>> to it.
>>>
>>> Sorry I haven't had time to write it up. The essential
>>> idea is that we
>>> use partition-refinement to determine the differences
>>> between a group of
>>> functions instead of using pair-wise comparison.
>>>
>>> Imagine you want to compare long sequences of numbers.
>>> You can grab a
>>> pair of sequences and run down them until they differ or
>>> you reach the
>>> end, and move on to another pair of sequences. With
>>> partition refinement
>>> you start with the first number in all the sequences and
>>> bucket them.
>>> Say, three 1's, two 2's, and an 8. You've just formed 3
>>> buckets. For
>>> each bucket (with more than 1 member), do it again with
>>> the next number
>>> in the sequence. This gets you an equivalence comparison
>>> at less than
>>> the complexity of pairwise comparisons. (The buckets are
>>> partitions, and
>>> you're refining them as you read more data. Partition
>>> refinement is the
>>> dual of union-find, which we use quite heavily in the
>>> compiler.)
>>>
>>> I haven't figured out how this stacks up against
>>> Stepan's sorting with
>>> <=> comparison. I can probably do that, I just haven't
>>> spent the time
>>> thinking about it.
>>>
>>> I also haven't thought through how to integrate it with
>>> "near miss"
>>> comparison. My current leading idea is that you do the
>>> same thing, but
>>> make a record of what makes two buckets similar. (Note
>>> that the wrong
>>> solution is to store a per-function-pair list of
>>> similarities/differences.)
>>>
>>> Nick
>>>
>>> Tobias,
>>> Your patch fails on several modules in my benchmark
>>> (73 of ~1800
>>> tests). I have sent one as attachment.
>>>
>>> See statistics files for more details, all the .ll
>>> files you could
>>> simply find in test-suite object directory (after
>>> "make TEST=nightly
>>> report").
>>> I have attached script that launches benchmark.
>>> Example of usage
>>> (you need Linux OS):
>>> bash test-scipt.sh report.txt opt-no-patch
>>> opt-with-patch
>>>
>>> Some numbers.
>>>
>>> Total number of functions: 52715
>>>
>>> Below, by default, I have excluded modules where
>>> your patch fails.
>>>
>>> Functions merged
>>> Original version: 1113
>>> Order relation patch: 1112
>>> Similar merging patch: 541
>>>
>>> Functions merged (including failed tests)
>>> Original version: 8398
>>> Order relation patch: 8395
>>>
>>> Summary files size
>>> Initial: 163595634
>>> bytes.
>>> After merging with order relation patch: 163147731
>>> bytes.
>>> After similar merging patch: 162491905
>>> bytes.
>>>
>>> Summary files size (including failed tests)
>>> Initial: 250242266 bytes.
>>> Original version: 247168198 bytes.
>>> Order relation: 247175650 bytes.
>>>
>>> Time. I measured with "time" utility, and used
>>> "real (wall clock)
>>> time used by the process".
>>>
>>> Summary time spent
>>> With existing version: 28.05 secs.
>>> With order relation patch: 28.19 secs.
>>> Similar merging patch: 28.61 secs.
>>>
>>> Summary time spent (including failed tests)
>>> With existing version: 41.74 secs.
>>> With order relation patch: 36.35 secs.
>>>
>>> -Stepan
>>>
>>> Sean Silva wrote:
>>>
>>> On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch
>>> <tobias.von.koch at gmail.com
>>> <mailto:tobias.von.koch at gmail.com>
>>> <mailto:tobias.von.koch at gmail.__com
>>> <mailto:tobias.von.koch at gmail.com>>
>>> <mailto:tobias.von.koch at gmail.
>>> <mailto:tobias.von.koch at gmail.>____com
>>> <mailto:tobias.von.koch at gmail.__com
>>> <mailto:tobias.von.koch at gmail.com>>>> wrote:
>>>
>>> Hi Stepan,
>>>
>>> Sorry for the delay. It's great that you
>>> are working on
>>> MergeFunctions as well and I agree, we
>>> should definitely
>>> try to
>>> combine our efforts to improve
>>> MergeFunctions.
>>>
>>> Just to give you some context, the pass
>>> (with the similar
>>> function
>>> merging patch) is already being used in a
>>> production
>>> setting. From
>>> my point of view, it would be better if we
>>> focus on
>>> improving its
>>> capability of merging functions at this
>>> stage rather
>>> than on
>>> reduction of compilation time.
>>>
>>> I'll give a brief overview of how our
>>> similar function
>>> merging
>>> algorithm works (you can also watch the
>>> presentation from
>>> the US
>>> LLVM conference to hear me explain it
>>> using an animation).
>>>
>>> 1. Hash all functions into buckets
>>>
>>> In each bucket, separately:
>>>
>>> 2. Compare functions pair-wise and
>>> determine a
>>> similarity metric for each pair (%age
>>> of equivalent
>>> instructions)
>>>
>>> 3. Merge identical functions (with
>>> similarity = 100%),
>>> update call
>>> sites for those functions.
>>>
>>> 4. If the updates of call sites have
>>> touched other
>>> functions,
>>> go back to step 2 and re-compare
>>> *only* those
>>> functions to all
>>> others in their buckets.
>>>
>>> Finally,
>>>
>>> 5. Form groups of similar functions for
>>> merging:
>>> a) Pick most similar pair (A,B)
>>> b) Find all functions C that are also
>>> similar to B but
>>> are not
>>> more similar to some other
>>> function D.
>>> c) Merge A with B and all the C's.
>>> Repeat until there are no more
>>> functions to merge.
>>>
>>> As you can see, we need to compute a
>>> similarity measure for
>>> each
>>> pair of functions in a bucket. If I
>>> understand correctly,
>>> your patch
>>> reduces compile-time by avoiding
>>> comparisons of functions
>>> that
>>> cannot be identical. That's exactly what
>>> you want to do if
>>> you only
>>> want to merge identical functions.
>>> However, because we also
>>> need to
>>> determine whether functions are merely
>>> *similar*, I can't
>>> see how
>>> your idea could be applied in that case.
>>>
>>> Yeah, the existing pass only tries to merge
>>> identical functions
>>> (identical in machine code). I'm wondering if
>>> the "similar"
>>> function
>>> merging would be better as a separate pass (I'm
>>> genuinely
>>> wondering; I
>>> have no clue). I would wait for Nick's feedback.
>>>
>>> -- Sean Silva
>>>
>>> Looking at your statistics, I see that you
>>> are considering
>>> a very
>>> large number of functions. I don't know
>>> which benchmarks
>>> you are
>>> using, but I doubt that many of these are
>>> worth merging.
>>> For
>>> instance, you rarely gain a code size
>>> benefit from merging
>>> functions
>>> with just a few instructions. Our patch
>>> takes care of this
>>> using
>>> thresholds. It's worth looking at actual
>>> code size
>>> reductions,
>>> rather than just numbers of functions
>>> merged/ compilation
>>> time spent
>>> comparing functions. It turns out that the
>>> number of
>>> functions in
>>> each bucket is really quite small once you
>>> apply some
>>> heuristics
>>> (and could still be further reduced!).
>>>
>>> Given your experience with MergeFunctions,
>>> it would be
>>> really great
>>> if you could review our patch and also try
>>> it out on your
>>> benchmarks.
>>>
>>> Tobias
>>>
>>> On 24/01/2014 19:11, Stepan Dyatkovskiy
>>> wrote:
>>>
>>> Hi Tobias.
>>>
>>> So, what do you think?
>>>
>>> If it means to much extra-job for your
>>> team, may be I
>>> can help you
>>> somehow? I really would like to.
>>>
>>> -Stepan
>>>
>>> Stepan Dyatkovskiy wrote:
>>>
>>> Hi Tobias,
>>>
>>> I can't really see a way to
>>> combine our
>>> approach with
>>> your patch. What
>>> are your thoughts?
>>>
>>> I think it is possible. Even more
>>> - we need to
>>> combine our
>>> efforts, in
>>> order to bring this pass into real
>>> live.
>>> I'have read your presentation
>>> file, and
>>> unfortunately read
>>> your patch
>>> only a little.
>>> How exactly you scan functions on
>>> 2nd stage?
>>> Could you
>>> explain the
>>> algorithm in few words, how you
>>> compare workflow?
>>> Is it
>>> possible to
>>> scan binary tree instead of hash
>>> table? ...OK.
>>>
>>> That's how I see the modification.
>>> Now its
>>> important to
>>> explain idea,
>>> so consider the simplest case: you
>>> need to catch
>>> two
>>> functions that
>>> are differs with single
>>> instruction somewhere.
>>>
>>> 1. Imagine, that IR *module*
>>> contents is
>>> represented as
>>> binary tree:
>>> Each line (trace) from root to
>>> terminal node is a
>>> function.
>>> Each node - is function's
>>> primitive (instruction
>>> opcode, for
>>> example).
>>> Down - is direction to terminal
>>> nodes, up - is
>>> direction to
>>> the root.
>>>
>>> 2. Now you are standing on some
>>> node. And you
>>> have two
>>> directions
>>> down-left and down-right.
>>>
>>> 3. If you are able to find two
>>> equal sub-traces
>>> down (at
>>> left and at
>>> right), then the only difference
>>> lies in this node.
>>> Then we
>>> catch that
>>> case.
>>>
>>> 4. Even more, if two subtrees at
>>> left and at
>>> right are
>>> equal, than you
>>> catch at once all the cases that
>>> are represented by
>>> these
>>> subtrees.
>>>
>>> I still didn't look at you patch
>>> carefully. Sorry..
>>> But I
>>> hope that
>>> helps, and I'll try look at it in
>>> nearest time and
>>> perhaps
>>> its not the
>>> best solution I gave in this post.
>>>
>>> -Stepan
>>>
>>> 22.01.2014, 20:53, "Tobias von
>>> Koch"
>>> <tobias.von.koch at gmail.com
>>> <mailto:tobias.von.koch at gmail.com>
>>> <mailto:tobias.von.koch at gmail.__com
>>> <mailto:tobias.von.koch at gmail.com>>
>>> <mailto:tobias.von.koch at gmail.
>>> <mailto:tobias.von.koch at gmail.>____com
>>> <mailto:tobias.von.koch at gmail.__com
>>> <mailto:tobias.von.koch at gmail.com>>>>:
>>>
>>> Hi Stepan,
>>>
>>> As you've seen we have
>>> recently implemented a
>>> significant enhancement to
>>> the MergeFunctions pass that
>>> also allows
>>> merging of
>>> functions that are
>>> only similar but not identical
>>>
>>> (http://llvm-reviews.__chandle____rc.com/D2591
>>> <http://chandle__rc.com/D2591>
>>> <http://chandlerc.com/D2591>
>>>
>>> <http://llvm-reviews.__chandle__rc.com/D2591
>>> <http://chandlerc.com/D2591>
>>> <http://llvm-reviews.__chandlerc.com/D2591
>>> <http://llvm-reviews.chandlerc.com/D2591>>>).
>>>
>>> Our patch also changes the way
>>> that functions
>>> are
>>> compared quite
>>> significantly. This is
>>> necessary because we
>>> need to
>>> compare all
>>> functions in a bucket in order
>>> to get a
>>> similarity
>>> measure for each
>>> pair, so we can then decide
>>> which 'groups' of
>>> functions
>>> to merge.
>>>
>>> I can't really see a way to
>>> combine our
>>> approach with
>>> your patch. What
>>> are your thoughts?
>>>
>>> Another way to improve the
>>> performance of
>>> MergeFunctions
>>> might be to
>>> make the hash function better.
>>> I've already
>>> added the
>>> size of the first
>>> BB to it, and perhaps there
>>> are other factors
>>> we could
>>> try... if we
>>> don't have to compare
>>> functions in the first
>>> place
>>> (because they're in
>>> different buckets) then that's
>>> obviously a much
>>> bigger win.
>>>
>>> Thanks,
>>> Tobias
>>>
>>> On 17/01/2014 20:25, Stepan
>>> Dyatkovskiy wrote:
>>>
>>> Hi all,
>>>
>>> I propose simple
>>> improvement for
>>> MergeFunctions
>>> pass, that reduced
>>> its
>>> complexity from O(N^2)
>>> to O(log(N)),
>>> where N is
>>> number of
>>> functions in
>>> module.
>>>
>>> The idea, is to replace
>>> the result of
>>> comparison
>>> from "bool" to
>>> "-1,0,1". In another
>>> words: define order
>>> relation
>>> on functions set.
>>> To be sure, that
>>> functions could be
>>> comparable
>>> that way, we have to
>>> prove that order
>>> relation is possible
>>> on all
>>> comparison stage.
>>>
>>> The last one is
>>> possible, if for each
>>> comparison
>>> stage we
>>> implement has
>>> next properties:
>>> * reflexivity (a <= a,
>>> a == a, a >= a),
>>> * antisymmetry (if a <=
>>> b and b <= a
>>> then a == b),
>>> * transitivity (a <= b
>>> and b <= c, then
>>> a <= c)
>>> * asymmetry (if a < b,
>>> then a > b or a
>>> == b).
>>>
>>> Once we have defined
>>> order relation we
>>> can store
>>> all the functions in
>>> binary tree and perform
>>> lookup in
>>> O(log(N)) time.
>>>
>>> This post has two
>>> attachments:
>>> 1. The patch, that has
>>> implementation of
>>> this idea.
>>> 2. The MergeFunctions
>>> pass detailed
>>> description,
>>> with explanation how
>>> order relation could be
>>> possible.
>>>
>>> Hope it helps to make
>>> things better!
>>>
>>> -Stepan.
>>>
>>> _____________________________________________________
>>> LLVM Developers mailing
>>> list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>> <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>> <mailto:LLVMdev at cs.uiuc.edu
>>> <mailto:LLVMdev at cs.uiuc.edu> <mailto:LLVMdev at cs.uiuc.edu
>>> <mailto:LLVMdev at cs.uiuc.edu>>>
>>> http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/______mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev>
>>>
>>> <http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>>
>>>
>>> <http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>
>>>
>>> <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>>
>>>
>>> _____________________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>> <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>> <mailto:LLVMdev at cs.uiuc.edu
>>> <mailto:LLVMdev at cs.uiuc.edu> <mailto:LLVMdev at cs.uiuc.edu
>>> <mailto:LLVMdev at cs.uiuc.edu>>>
>>> http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/______mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev>
>>>
>>> <http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>>
>>>
>>> <http://lists.cs.uiuc.edu/____mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev>
>>>
>>> <http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>>>
>>>
>>> _________________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>> <mailto:LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>>
>>> http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>>
>>> _________________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>> http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>>>
>>> _________________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>> http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
>>> <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
 -- 
Truly yours,
Stepan Dyatkovskiy
    
    
More information about the llvm-dev
mailing list