[LLVMdev] MergeFunctions: reduce complexity to O(log(N))

Stepan Dyatkovskiy stpworld at narod.ru
Wed Jan 22 07:35:10 PST 2014


Hi Raul and Duncan,

Duncan,
Thank you for review. I hope to present fixed patch tomorrow.

First, I would like to show a few performance results:

command: "time opt -mergefunc <test>"

File: tramp3d-v4.ll, 12963 functions
                   Current : Patched
Time:              9.8 s  : 2.0 s   ; ~4.9x speedup!
Functions merged:  3503   : 3507

File: spirit.ll, 2885 functions
                   Current : Patched
Time:              2.2 s  : 0.5 s
Functions merged:  1503   : 1505

File: k.ll, 2788 functions
                   Current : Patched
Time:              1.5 s  : 0.7 s
Functions merged:  1626   : 1626

File: sqlite3.ll, 1057 functions
                   Current : Patched
Time:              0.4 s  : 0.4 s
Functions merged:  7      : 7

All the tests were taken from the test-suite object directory. On average 
it shows more than a 2x performance improvement. You can also see that the 
patched version detects slightly more functions for merging, since it has 
improved constant-equivalence detection.

Raul,

Yes, you're right, it is O(N*log(N)). And yes, I have some ideas about hashing.
Though even at the end of the hashing improvement I still see a tree. I'll explain.

Once you have calculated a hash (even a really good one), you still have 
some probability of N>1 functions sharing the same hash. Of course it could 
be really low. But then, you have to spend time just computing such a 
complex hash (you definitely have to scan the whole function body).

I think there should instead be a sequence of short hash numbers. In other 
words, we can try to represent a function as a chain of hash codes:
F0: hashF0-0, hashF0-1, hashF0-2, ...
F1: hashF1-0, hashF1-1, ...

In this case you can build a kind of hash-tree. Suppose we have F0, F1 
and F2.

Imagine that hashF0-0 == hashF1-0 == hashF2-0, and hashF0-1 == hashF1-1:

Then we can build the following tree (requires a fixed-width font):

              [hashF0-0]
               /      \
      [hashF0-1]      [hashF2-1]
       /      \             \
[hashF0-2]  [hashF1-2]   [hashF2-2]

In this case, as you can see, the whole process has complexity 
O(N*n), where:

N - number of functions
n - function size.

Note that hashF[i]-[j] can be generated on demand. It is not necessary to 
generate the whole hash chain at once.

I have also implemented this approach. Though it looks more complex, and 
is not as fast as I had hoped. Perhaps I can make it simpler. But it could 
not be simpler than just replacing "bool" with -1,0,1. Actually I really 
wanted to propose the hash-tree idea, since it combines the advantages of 
hashing and tree-based lookup routines. But then I discovered the idea I 
finally presented, and saw that it shows almost the same improvement and 
looks much simpler.

-Stepan

Duncan P. N. Exon Smith wrote:
> Hi Stepan,
>
> This looks interesting!  Some high-level comments:
>
> - Please post the patch untarred next time.  Also, I'm not sure if it's typical
>    to provide supporting documents in .doc format; while many of us probably
>    have access to Word, a more portable and email-friendly format (like text,
>    markdown or rst) is probably better.
> - Have you profiled this?  What's the speedup?  I second Raul's question about
>    whether improving the hashing (or doing secondary hashing) could result in a
>    larger benefit (maybe even with simpler code).
> - At the beginning of FunctionComparator::cmpEnumerate, you have a comment that
>    you can't introduce an order relation for cross-references, and the following
>    code:
>
>          // Unfortunately we can't introduce order relation for
>          // cross reference case. It is possible for self-reference only.
>          if (V1 == F1) {
>            if (V2 == F2)
>              return 0;
>            return -1;
>          }
>          if (V2 == F2) {
>            if (V1 == F1)
>              return 0;
>            return 1;
>          }
>
>      Can't this break asymmetry?  Doesn't breaking the ordering relation cause a
>      correctness problem?  I.e., what am I missing here?
>
> I also have a few low-level comments (and nitpicks):
>
> - The comments at the top of the file don't match the new implementation!
> - Lots of strange tabbing after your changes.  E.g.:
>
>          +  int cmpGEP(const GetElementPtrInst *GEP1,
>                                  const GetElementPtrInst *GEP2) {
>
> - After the FunctionPtr class declaration, you have an extra blank line.  There
>    are a few of these scattered through (e.g., at the beginning of
>    FunctionComparator::cmpConstants).
> - Your new helper functions (such as cmpNumbers) that are local to the file
>    should be declared 'static'.
> - You use the following pattern throughout:
>
>          int Res;
>          // ...
>          Res = cmpNumbers(I1->getNumOperands(), I2->getNumOperands());
>          if (Res != 0)
>            return Res;
>
>      whereas, since you never use Res unless it's non-zero, I think the
>      following is more clear and concise:
>
>          if (int Res = cmpNumbers(I1->getNumOperands(), I2->getNumOperands()))
>            return Res;
>
> - I'm not sure if your MERGEFUNC_SANITY should be included the way it is:
>
>          +//#define MERGEFUNC_SANITY 900
>          +#ifdef MERGEFUNC_SANITY
>
> - I’m also unsure about the following:
>
>          +  // If we have some problematic functions, perhaps we would like
>          +  // to keep declarations only, and those definitions fires the problem.
>          +  // We could insert manualy last ones (notepad, nano, vi, whatever else :-) )
>          +#if 0
>          +  for (Module::iterator I = M.begin(), E = M.end(); I != E; ++I) {
>          +    Function *F = I;
>          +      F->deleteBody();
>          +  }
>          +#endif
>
> Cheers,
> Duncan
>
> On Jan 21, 2014, at 9:58 AM, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:
>
>> ping
>> Stepan Dyatkovskiy wrote:
>>> Hi all,
>>>
>>> I propose simple improvement for MergeFunctions pass, that reduced its
>>> complexity from O(N^2) to O(log(N)), where N is number of functions in
>>> module.
>>>
>>> The idea, is to replace the result of comparison from "bool" to
>>> "-1,0,1". In another words: define order relation on functions set.
>>> To be sure that functions can be compared that way, we have to
>>> prove that an order relation is possible at every comparison stage.
>>>
>>> The latter is possible if each comparison stage we implement has the
>>> following properties:
>>> * reflexivity (a <= a, a == a, a >= a),
>>> * antisymmetry (if a <= b and b <= a then a  == b),
>>> * transitivity (a <= b and b <= c, then a <= c)
>>> * asymmetry (if a < b, then a > b or a == b).
>
> For asymmetry, I think it’s: if a < b, then not a > b and not a == b.
>
>>>
>>> Once we have defined order relation we can store all the functions in
>>> binary tree and perform lookup in O(log(N)) time.
>>>
>>> This post has two attachments:
>>> 1. The patch, that has implementation of this idea.
>>> 2. The MergeFunctions pass detailed description, with explanation how
>>> order relation could be possible.
>>>
>>> Hope it helps to make things better!
>>>
>>> -Stepan.
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>
