[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Stepan Dyatkovskiy
stpworld at narod.ru
Wed Mar 5 23:51:28 PST 2014
ping
Stepan Dyatkovskiy wrote:
> Hi Nick,
>
> I tried to rework changes as you requested. One of patches (0004 with
> extra assertions) has been removed.
>
> > + bool isEquivalentType(Type *Ty1, Type *Ty2) const {
> > + return cmpType(Ty1, Ty2) == 0;
> > + }
> >
> > Why do we still need isEquivalentType? Can we nuke this?
> Yup. After applying all the patches isEquivalentType will be totally
> replaced with cmpType. All isEqXXXX and friends will be removed in 0011
> (old 0012). No traces left.
> The old function wasn't removed in 0001 simply to keep the patches free
> of extra noise like:
>
> - something that uses isEquivalentType
> + something that uses cmpType
>
> The point is that the "something" that uses isEquivalentType will also
> be replaced by one of the next patches in this series.
>
> >
> > +static int cmpNumbers(uint64_t L, uint64_t R) {
> > + if (L < R) return -1;
> > + if (L > R) return 1;
> > + return 0;
> > +}
> >
> > At a high level, you don't actually need a <=> operator to use a sort. A
> > strict ordering ( < operator) is sufficient. (Note that for mergefunc, a
> > strict weak ordering is not sufficient, it must be a total ordering.)
> That could be done with int FunctionComparator::compare(). We could
> replace it with bool FunctionComparator::less(). Though all the other
> cmp methods need at least two bits of information in their result:
> 1. Whether the things are equal.
> 2. Whether the left is less than the right.
>
> As for FunctionComparator::compare(), converting it into less() would
> increase the running time of the sanity check (patch #0010).
> The sanity check is just a sweet bonus: it verifies that the ordering
> is implemented properly (it checks the order-relation properties).
> Turning compare() into less() means that we would have to run the
> comparison twice: L.less(R) and R.less(L). But maybe the sanity check
> is not a thing to be published at all.
>
> >
> > Consider hoisting this inside the FunctionComparator class? That class
> > should have a bunch of implementations of comparisons between various
> > different things, which can pass down to other methods in the same
> class.
> In new patch series attached to this post, I have moved all static
> methods into FunctionComparator.
>
> > + // Replacement for type::canLosslesslyBitCastTo, that
> > + // establish order relation on this kind of properties.
> > + int checkForLosslessBitcast(const Type *L, const Type *R);
> >
> > Type:: not type:: . Please make this comment more descriptive.
> Done.
> [new comment]
> Replacement for Type::canLosslesslyBitCastTo, which establishes an
> order relation on this kind of property.
> Returns 0 if the L and R types could be converted to each other
> without reinterpretation of bits.
> Otherwise the method returns -1 or 1, defining a total ordering
> between types in the context of the lossless-bitcastability trait.
> E.g.: if L is less than R (the result is -1), then every type that
> could be losslessly bitcasted to L is less than R.
> [/new comment]
>
> >
> > + /// Replacement for C1 == getBitCast(C2, C1Ty)
> > + /// Its more controllable, and supposed to be simpler and more
> > predictionable.
> > + /// As very important advantage: it allows to introduce order
> relation on
> > + /// constants set, and thus use it as trait in refinement routines.
> >
> > "Its" --> "It's". "predictionable" --> "predictable". And how is it more
> > predictable? I think this comment would be better if it described the
> > function instead of making comparisons between it and other functions.
> > Something like, "Compare constants under a system where pointer to X and
> > pointer to Y are considered equal" or whatever is actually true here.
> Done.
> [new comment]
> Replacement for C1 == getBitCast(C2, C1Ty)
> Parses the constants' contents, assuming that the types can be
> losslessly bitcasted between each other. So it actually ignores the
> types and only compares the bits of L and R.
> Returns 0 if L and R have equivalent contents, and -1 or 1 if the
> values differ. Maintaining a total ordering requires two values that
> indicate non-equivalence (L less than R, L greater than R).
> [/new comment]
>
> >
> > +enum ConstantType {
> > I'm not sure that this buys you much. All the "isa" tests can be broken
> > down into a switch on getValueID() with the one exception of
> isNullValue().
> Done.
>
> > + assert(
> > + C1->getType()->canLosslesslyBitCastTo(C2->getType()) &&
> > + "Pass is healthless. checkForLosslessBitcast should be twin of "
> > + "canLosslesslyBitCastTo method, except the case when the last
> one "
> > + "returns false, the first one should return -1 or 1");
> ...
> > I think we can skip the asserts here. They aren't detecting a specific
> > bug, they're checking whether the new code does a certain task relative
> > to the old code. Drop the old code, your new code is the new sheriff in
> > town.
> The asserts have been removed.
>
> >
> > + DenseMap<const Value*, int> sn_map1, sn_map2;
> >
> > What is sn short for ("seen")? Why are there two of these?
> Serial number :-)
> >
> > + std::pair<DenseMap<const Value *, int>::iterator, bool>
> > + LeftSN = sn_map1.insert(std::make_pair(V1, sn_map1.size())),
> > + RightSN = sn_map2.insert(std::make_pair(V2, sn_map2.size()));
> >
> > So I think I get it, this is easy to reason about. You number each value
> > going down both functions. But I think you can eliminate one of these
> > maps because you know that if the left and right were ever different
> > then we terminate the comparison immediately. You can at least share one
> > map with both V1 and V2, but I think you can reduce the number of map
> > insertions.
> Not sure. Consider that on the left you have:
> %A = alloca i32
> %B = alloca i32
> store i32 1, i32* %A
>
> And on the right:
> %A = alloca i32
> %B = alloca i32
> store i32 1, i32* %B
>
> When we meet the 'store' instruction, we need to check in which order
> %A was allocated on the left and in which order %B was allocated on
> the right. So we have to assign a serial number to each value in the
> function. Then, when comparing local values from different functions,
> we can just check their serial numbers.
> Though maybe I missed something? Maybe there is another way to
> determine "who was first".
>
> > High-level question, are attributes ordered? Your comparison is ordered.
> > So if I have function F with string attributes "sse2", "gc=hemisphere"
> > and another function G with string attributes "gc=hemisphere", "sse2"
> > then they would be considered different. I don't think that's right.
> I didn't check it. But if they are not ordered, we could order them
> lexicographically.
>
> >
> > + int cmpOperation(const Instruction *I1, const Instruction *I2) const;
> > +
> > bool isEquivalentOperation(const Instruction *I1,
> > - const Instruction *I2) const;
> > + const Instruction *I2) const {
> > + return cmpOperation(I1, I2) == 0;
> > + }
> >
> > Hopefully this patch series ultimately eliminates calls of
> > isEquivalentOperation?
> Of course. Just like isEquivalentType.
>
> > By the way, this is an interesting one. Should "add x, y" and "add nsw
> > x, y" be equal or not?
> Yeah. Those are interesting questions. We could even treat 'mul a, 2'
> and 'shl a, 1' as equal in some cases. I think it's a matter of time
> and further optimization.
>
> Please find the first reworked patches attached.
>
> -Stepan
>
> Nick Lewycky wrote:
>> /// Compare two Types, treating all pointer types as equal.
>> - bool isEquivalentType(Type *Ty1, Type *Ty2) const;
>> + int cmpType(Type *Ty1, Type *Ty2) const;
>> +
>> + /// Compare two Types, treating all pointer types as equal.
>> + bool isEquivalentType(Type *Ty1, Type *Ty2) const {
>> + return cmpType(Ty1, Ty2) == 0;
>> + }
>>
>> Why do we still need isEquivalentType? Can we nuke this?
>>
>> +static int cmpNumbers(uint64_t L, uint64_t R) {
>> + if (L < R) return -1;
>> + if (L > R) return 1;
>> + return 0;
>> +}
>>
>> At a high level, you don't actually need a <=> operator to use a sort. A
>> strict ordering ( < operator) is sufficient. (Note that for mergefunc, a
>> strict weak ordering is not sufficient, it must be a total ordering.)
>>
>> Consider hoisting this inside the FunctionComparator class? That class
>> should have a bunch of implementations of comparisons between various
>> different things, which can pass down to other methods in the same class.
>>
>> + // Replacement for type::canLosslesslyBitCastTo, that
>> + // establish order relation on this kind of properties.
>> + int checkForLosslessBitcast(const Type *L, const Type *R);
>>
>> Type:: not type:: . Please make this comment more descriptive.
>>
>> + /// Replacement for C1 == getBitCast(C2, C1Ty)
>> + /// Its more controllable, and supposed to be simpler and more
>> predictionable.
>> + /// As very important advantage: it allows to introduce order
>> relation on
>> + /// constants set, and thus use it as trait in refinement routines.
>>
>> "Its" --> "It's". "predictionable" --> "predictable". And how is it more
>> predictable? I think this comment would be better if it described the
>> function instead of making comparisons between it and other functions.
>> Something like, "Compare constants under a system where pointer to X and
>> pointer to Y are considered equal" or whatever is actually true here.
>>
>> +enum ConstantType {
>>
>> I'm not sure that this buys you much. All the "isa" tests can be broken
>> down into a switch on getValueID() with the one exception of
>> isNullValue().
>>
>> + // Check whether C2 and C1 has equal bit patterns.
>> + if (checkForLosslessBitcast(C1->getType(), C2->getType()) != 0)
>> + return false;
>> +
>> + assert(
>> + C1->getType()->canLosslesslyBitCastTo(C2->getType()) &&
>> + "Pass is healthless. checkForLosslessBitcast should be twin of "
>> + "canLosslesslyBitCastTo method, except the case when the last
>> one "
>> + "returns false, the first one should return -1 or 1");
>> +
>> + // Compare constant contents
>> + if (cmpConstants(C1, C2) != 0)
>> + return false;
>> +
>> + assert(
>> + C1 ==
>> + ConstantExpr::getBitCast(const_cast<Constant*>(C2),
>> C1->getType()) &&
>> + "Pass is healthless. cmpConstants should detect *subset* of "
>> + "equivalent constants that could be detected by "
>> + "getBitCast method. So whereas cmpConstants returns 0 "
>> + "getBitCast also must return true. And wheres getBitCast "
>> + "returns false, cmpConstants also must return -1 or 1.");
>>
>> I think we can skip the asserts here. They aren't detecting a specific
>> bug, they're checking whether the new code does a certain task relative
>> to the old code. Drop the old code, your new code is the new sheriff in
>> town.
>>
>> + DenseMap<const Value*, int> sn_map1, sn_map2;
>>
>> What is sn short for ("seen")? Why are there two of these?
>>
>> + std::pair<DenseMap<const Value *, int>::iterator, bool>
>> + LeftSN = sn_map1.insert(std::make_pair(V1, sn_map1.size())),
>> + RightSN = sn_map2.insert(std::make_pair(V2, sn_map2.size()));
>>
>> So I think I get it, this is easy to reason about. You number each value
>> going down both functions. But I think you can eliminate one of these
>> maps because you know that if the left and right were ever different
>> then we terminate the comparison immediately. You can at least share one
>> map with both V1 and V2, but I think you can reduce the number of map
>> insertions.
>>
>> +// Mostly cloned from BitcodeWriter, but simplified a bit.
>>
>> Missing indent.
>>
>> +static int cmpAttrs(const AttributeSet L, const AttributeSet R) {
>>
>> High-level question, are attributes ordered? Your comparison is ordered.
>> So if I have function F with string attributes "sse2", "gc=hemisphere"
>> and another function G with string attributes "gc=hemisphere", "sse2"
>> then they would be considered different. I don't think that's right.
>>
>> + int cmpOperation(const Instruction *I1, const Instruction *I2) const;
>> +
>> bool isEquivalentOperation(const Instruction *I1,
>> - const Instruction *I2) const;
>> + const Instruction *I2) const {
>> + return cmpOperation(I1, I2) == 0;
>> + }
>>
>> Hopefully this patch series ultimately eliminates calls of
>> isEquivalentOperation?
>>
>> + if (int Res = cmpNumbers(I1->getRawSubclassOptionalData(),
>> + I2->getRawSubclassOptionalData()))
>>
>> By the way, this is an interesting one. Should "add x, y" and "add nsw
>> x, y" be equal or not? Depends whether the backend uses it, and whether
>> we're going to do more optimization afterwards (ie., this is the compile
>> step of an LTO build). Nothing for you to change here, just want you to
>> be aware that this is a point where we might want to have a flag and
>> make mergefunc behave differently in different compile strategies.
>>
>> (I stopped before getting to 0008.)
>>
>> Nick
>>
>>
>> On 3 February 2014 11:11, Stepan Dyatkovskiy <stpworld at narod.ru
>> <mailto:stpworld at narod.ru>> wrote:
>>
>> Hi all,
>>
>> Previous patch has been split onto series of small changes.
>> On each stage (after each patch) MergeFunctions pass is compilable
>> and stable.
>>
>> Please find patches in attachment for review.
>>
>>     To apply all patches at once, use the "apply-patches.sh" script
>>     as follows:
>>     0. Place "apply-patches.sh" in the same directory as the patches.
>>     1. cd <llvm-sources-dir>
>>     2. "bash <path-to-patches-dir>/apply-patches.sh"
>>
>> -Stepan
>>
>> Stepan Dyatkovskiy wrote:
>>
>> Hi Nick,
>> About partition refinement.
>>
>>
>>            It looks like the patch proposed here allows your idea to
>>            be applied in a painless way (actually, as a series of
>>            small changes).
>>
>>            If I got it right, you would like to introduce the
>>            following partition refinement.
>>
>>            The initial partition P consists of only one function
>>            class X0 (the whole module, a single bucket). Suppose S is
>>            a subset of X0, so S represents the functions with some
>>            trait that is absent from the rest of X0.
>>
>> Then refinement looks like below:
>>
>> P`=refine(P, S):
>> X`[0] = intersection(X[0], S)
>> X`[1] = difference(X[0], S) // X0 without S0
>>
>> so P`={X`[0], X`[1]}
>>
>>            And actually X`[0] and X`[1] are classes of functions
>>            that differ in the trait defined by S.
>>
>>            Then (suppose we need to introduce a new trait; call it
>>            S`):
>>
>> P``=refine(P`, S`), so each X`` could be defined like below:
>> for each X`[i] in P`:
>> X``[i*2] = intersection(X`[i], S`)
>> X``[i*2+1] = difference(X`[i], S`)
>>
>> and so on, until all the traits are over.
>>
>>            Finally we get the classes (the set of Xs) that consist
>>            of equivalent functions, so we can merge the functions
>>            within each class's bounds.
>>
>>            Am I right?
>>
>>            The patch proposed here could then be represented as such
>>            a partition refinement:
>>
>>            On each refinement stage, you need to introduce S somehow:
>>            the algorithm that answers the question "what sort of
>>            difference do we want to check in this bucket?"
>>
>>            In the existing implementation we already check traits,
>>            but pair-wise (attributes, calling convention, number of
>>            arguments, number of basic blocks, etc.). How do we use
>>            the same traits in partition refinement?
>>
>>            Consider the next example. Suppose we want to split some
>>            bucket X:
>>
>>            0. Take some trait from the *pair-wise* comparison, for
>>            example: number of arguments.
>>            1. Take some function from bucket X; let it be F1, with N1
>>            arguments.
>>            2. Then S is the set of functions with *fewer* than N1
>>            arguments.
>>            3. Do the refinement by S. As a result we get X`[0] and
>>            X`[1].
>>            4. If both X`[0] and X`[1] are non-empty, repeat #1-#3 for
>>            them.
>>            5. If either X`[0] or X`[1] is the empty set, then all
>>            functions in X are the same in the context of trait S. We
>>            need another trait: go to #0, and repeat #0-#5 for all the
>>            X` buckets we have.
>>
>>            Finally we get the groups of equivalent functions, and
>>            these groups end up organized in a binary tree.
>>            Most of this is implemented by std::map. I presented it as
>>            #0-#5 just to put it into the context of your idea. The
>>            only problem was how to define the order relation (i.e.,
>>            how to define S), but this problem has been solved.
>>            And on output we get groups of equal functions.
>>
>>            Correct me if I'm wrong, but it looks like every advantage
>>            given by partition refinement is present here.
>>
>>            As I said at the beginning, this is a way to commit the
>>            things you proposed with a minimum of changes. We will use
>>            the old comparison routines; the only thing we need to
>>            change is the return type (from bool to -1,0,1).
>>            This way, partition refinement could be committed as a
>>            series of small changes: one comparison routine per
>>            commit.
>>
>>            Every change in MergeFunctions from now on can be checked
>>            by this builder: we now have a special builder for
>>            MergeFunctions that runs the test-suite with the
>>            MergeFunctions pass forced on:
>>
>>            http://lab.llvm.org:8011/builders/clang-mergefunc-x86_64-freeBSD9.2
>>
>>            Currently the MergeFunctions pass is stable, the builder
>>            is green, and it passes all the test-suite programs.
>>
>> -Stepan.
>>
>> Stepan Dyatkovskiy wrote:
>>
>> Hi all,
>> Please find the updated patch in attachment:
>> * Added some comments.
>> * Fixed some typos.
>>
>> -Stepan
>>
>> Nick Lewycky wrote:
>>
>>            On 30 January 2014 01:27, Stepan Dyatkovskiy
>>            <stpworld at narod.ru> wrote:
>>
>>                Hello Sean and Tobias,
>>
>>                Sean,
>>                Thank you. Could you describe Nick's ideas in a few
>>                words or give me links to your discussion, so I could
>>                adapt my ideas to them.
>>
>>
>>            Sorry I haven't had time to write it up. The essential
>>            idea is that we use partition-refinement to determine the
>>            differences between a group of functions instead of using
>>            pair-wise comparison.
>>
>>            Imagine you want to compare long sequences of numbers.
>>            You can grab a pair of sequences and run down them until
>>            they differ or you reach the end, and move on to another
>>            pair of sequences. With partition refinement you start
>>            with the first number in all the sequences and bucket
>>            them. Say, three 1's, two 2's, and an 8. You've just
>>            formed 3 buckets. For each bucket (with more than 1
>>            member), do it again with the next number in the
>>            sequence. This gets you an equivalence comparison at less
>>            than the complexity of pairwise comparisons. (The buckets
>>            are partitions, and you're refining them as you read more
>>            data. Partition refinement is the dual of union-find,
>>            which we use quite heavily in the compiler.)
>>
>>            I haven't figured out how this stacks up against Stepan's
>>            sorting with <=> comparison. I can probably do that, I
>>            just haven't spent the time thinking about it.
>>
>>            I also haven't thought through how to integrate it with
>>            "near miss" comparison. My current leading idea is that
>>            you do the same thing, but make a record of what makes
>>            two buckets similar. (Note that the wrong solution is to
>>            store a per-function-pair list of
>>            similarities/differences.)
>>
>> Nick
>>
>>            Tobias,
>>            Your patch fails on several modules in my benchmark (73
>>            of ~1800 tests). I have sent one as an attachment.
>>
>>            See the statistics files for more details; you can simply
>>            find all the .ll files in the test-suite object directory
>>            (after "make TEST=nightly report").
>>            I have attached the script that launches the benchmark.
>>            Example of usage (you need a Linux OS):
>>            bash test-scipt.sh report.txt opt-no-patch opt-with-patch
>>
>>            Some numbers.
>>
>>            Total number of functions: 52715
>>
>>            Below, by default, I have excluded modules where your
>>            patch fails.
>>
>>            Functions merged
>>            Original version: 1113
>>            Order relation patch: 1112
>>            Similar merging patch: 541
>>
>>            Functions merged (including failed tests)
>>            Original version: 8398
>>            Order relation patch: 8395
>>
>>            Summary files size
>>            Initial: 163595634 bytes.
>>            After merging with order relation patch: 163147731 bytes.
>>            After similar merging patch: 162491905 bytes.
>>
>>            Summary files size (including failed tests)
>>            Initial: 250242266 bytes.
>>            Original version: 247168198 bytes.
>>            Order relation: 247175650 bytes.
>>
>>            Time. I measured with the "time" utility, and used the
>>            "real (wall clock) time used by the process".
>>
>>            Summary time spent
>>            With existing version: 28.05 secs.
>>            With order relation patch: 28.19 secs.
>>            Similar merging patch: 28.61 secs.
>>
>>            Summary time spent (including failed tests)
>>            With existing version: 41.74 secs.
>>            With order relation patch: 36.35 secs.
>>
>> -Stepan
>>
>>            Sean Silva wrote:
>>
>>            On Tue, Jan 28, 2014 at 2:47 PM, Tobias von Koch
>>            <tobias.von.koch at gmail.com> wrote:
>>
>>                Hi Stepan,
>>
>>                Sorry for the delay. It's great that you are working
>>                on MergeFunctions as well and I agree, we should
>>                definitely try to combine our efforts to improve
>>                MergeFunctions.
>>
>>                Just to give you some context, the pass (with the
>>                similar function merging patch) is already being used
>>                in a production setting. From my point of view, it
>>                would be better if we focus on improving its
>>                capability of merging functions at this stage rather
>>                than on reduction of compilation time.
>>
>>                I'll give a brief overview of how our similar
>>                function merging algorithm works (you can also watch
>>                the presentation from the US LLVM conference to hear
>>                me explain it using an animation).
>>
>>                1. Hash all functions into buckets
>>
>>                In each bucket, separately:
>>
>>                2. Compare functions pair-wise and determine a
>>                   similarity metric for each pair (%age of
>>                   equivalent instructions)
>>
>>                3. Merge identical functions (with similarity =
>>                   100%), update call sites for those functions.
>>
>>                4. If the updates of call sites have touched other
>>                   functions, go back to step 2 and re-compare *only*
>>                   those functions to all others in their buckets.
>>
>>                Finally,
>>
>>                5. Form groups of similar functions for merging:
>>                   a) Pick most similar pair (A,B)
>>                   b) Find all functions C that are also similar to B
>>                      but are not more similar to some other
>>                      function D.
>>                   c) Merge A with B and all the C's.
>>                   Repeat until there are no more functions to merge.
>>
>>                As you can see, we need to compute a similarity
>>                measure for each pair of functions in a bucket. If I
>>                understand correctly, your patch reduces compile-time
>>                by avoiding comparisons of functions that cannot be
>>                identical. That's exactly what you want to do if you
>>                only want to merge identical functions. However,
>>                because we also need to determine whether functions
>>                are merely *similar*, I can't see how your idea could
>>                be applied in that case.
>>
>>            Yeah, the existing pass only tries to merge identical
>>            functions (identical in machine code). I'm wondering if
>>            the "similar" function merging would be better as a
>>            separate pass (I'm genuinely wondering; I have no clue).
>>            I would wait for Nick's feedback.
>>
>>            -- Sean Silva
>>
>>                Looking at your statistics, I see that you are
>>                considering a very large number of functions. I don't
>>                know which benchmarks you are using, but I doubt that
>>                many of these are worth merging. For instance, you
>>                rarely gain a code size benefit from merging
>>                functions with just a few instructions. Our patch
>>                takes care of this using thresholds. It's worth
>>                looking at actual code size reductions, rather than
>>                just numbers of functions merged / compilation time
>>                spent comparing functions. It turns out that the
>>                number of functions in each bucket is really quite
>>                small once you apply some heuristics (and could still
>>                be further reduced!).
>>
>>                Given your experience with MergeFunctions, it would
>>                be really great if you could review our patch and
>>                also try it out on your benchmarks.
>>
>>                Tobias
>>
>>
>>                On 24/01/2014 19:11, Stepan Dyatkovskiy wrote:
>>
>>                    Hi Tobias.
>>
>>                    So, what do you think?
>>
>>                    If it means too much extra work for your team,
>>                    maybe I can help you somehow? I really would like
>>                    to.
>>
>>                    -Stepan
>>
>>                    Stepan Dyatkovskiy wrote:
>>
>>                        Hi Tobias,
>>
>>                            I can't really see a way to combine our
>>                            approach with your patch. What are your
>>                            thoughts?
>>
>>                        I think it is possible. Even more: we need to
>>                        combine our efforts in order to bring this
>>                        pass into real life.
>>                        I have read your presentation file, and
>>                        unfortunately read your patch only a little.
>>                        How exactly do you scan functions on the 2nd
>>                        stage? Could you explain the algorithm in a
>>                        few words: how do you compare the workflow?
>>                        Is it possible to scan a binary tree instead
>>                        of a hash table? ...OK.
>>
>>                        That's how I see the modification. For now it
>>                        is important to explain the idea, so consider
>>                        the simplest case: you need to catch two
>>                        functions that differ by a single instruction
>>                        somewhere.
>>
>>                        1. Imagine that the IR *module* contents are
>>                        represented as a binary tree: each line
>>                        (trace) from the root to a terminal node is a
>>                        function. Each node is a function's primitive
>>                        (instruction opcode, for example). Down is
>>                        the direction towards the terminal nodes, up
>>                        is the direction towards the root.
>>
>>                        2. Now you are standing on some node. And you
>>                        have two directions: down-left and
>>                        down-right.
>>
>>                        3. If you are able to find two equal
>>                        sub-traces down (at left and at right), then
>>                        the only difference lies in this node. Then
>>                        we catch that case.
>>
>>                        4. Even more: if the two subtrees at left and
>>                        at right are equal, then you catch at once
>>                        all the cases that are represented by these
>>                        subtrees.
>>
>>                        I still didn't look at your patch carefully.
>>                        Sorry. But I hope that helps, and I'll try to
>>                        look at it in the nearest time; perhaps it's
>>                        not the best solution I gave in this post.
>>
>>                        -Stepan
>>
>>                        22.01.2014, 20:53, "Tobias von Koch"
>>                        <tobias.von.koch at gmail.com>:
>>
>>                            Hi Stepan,
>>
>>                            As you've seen we have recently
>>                            implemented a significant enhancement to
>>                            the MergeFunctions pass that also allows
>>                            merging of functions that are only
>>                            similar but not identical
>>                            (http://llvm-reviews.chandlerc.com/D2591).
>>
>>
>>                            Our patch also changes the way that
>>                            functions are compared quite
>>                            significantly. This is necessary because
>>                            we need to compare all functions in a
>>                            bucket in order to get a similarity
>>                            measure for each pair, so we can then
>>                            decide which 'groups' of functions to
>>                            merge.
>>
>>                            I can't really see a way to combine our
>>                            approach with your patch. What are your
>>                            thoughts?
>>
>>                            Another way to improve the performance of
>>                            MergeFunctions might be to make the hash
>>                            function better. I've already added the
>>                            size of the first BB to it, and perhaps
>>                            there are other factors we could try...
>>                            if we don't have to compare functions in
>>                            the first place (because they're in
>>                            different buckets) then that's obviously
>>                            a much bigger win.
>>
>>                            Thanks,
>>                            Tobias
>>
>>                            On 17/01/2014 20:25, Stepan Dyatkovskiy
>>                            wrote:
>>
>>                                Hi all,
>>
>>                                I propose a simple improvement for
>>                                the MergeFunctions pass that reduces
>>                                its complexity from O(N^2) to
>>                                O(log(N)), where N is the number of
>>                                functions in the module.
>>
>>                                The idea is to change the result of
>>                                comparison from "bool" to "-1,0,1".
>>                                In other words: define an order
>>                                relation on the set of functions.
>>                                To be sure that functions can be
>>                                compared that way, we have to prove
>>                                that the order relation is possible
>>                                on every comparison stage.
>>
>>                                The latter is possible if each
>>                                comparison stage we implement has the
>>                                following properties:
>>                                * reflexivity (a <= a, a == a,
>>                                  a >= a),
>>                                * antisymmetry (if a <= b and b <= a
>>                                  then a == b),
>>                                * transitivity (a <= b and b <= c,
>>                                  then a <= c),
>>                                * totality (for any a and b, either
>>                                  a < b, a > b, or a == b).
>>
>>                                Once we have defined the order
>>                                relation we can store all the
>>                                functions in a binary tree and
>>                                perform lookup in O(log(N)) time.
>>
>>                                This post has two attachments:
>>                                1. The patch, which has the
>>                                implementation of this idea.
>>                                2. The MergeFunctions pass detailed
>>                                description, with an explanation of
>>                                how the order relation is possible.
>>
>>                                Hope it helps to make things better!
>>
>>                                -Stepan.
>>
>>
>>
>>                                _______________________________________________
>>                                LLVM Developers mailing list
>>                                LLVMdev at cs.uiuc.edu
>>                                http://llvm.cs.uiuc.edu
>>                                http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
More information about the llvm-dev mailing list