[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Stepan Dyatkovskiy
stpworld at narod.ru
Wed Jan 22 07:35:10 PST 2014
Hi Raul and Duncan,
Duncan,
Thank you for review. I hope to present fixed patch tomorrow.
First, I would like to show few performance results:
command: "time opt -mergefunc <test>"
File: tramp3d-v4.ll, 12963 functions
Current : Patched
Time: 9.8 s : 2.0 secs ; >400%!!
Functions merged: 3503 : 3507
File: spirit.ll, 2885 functions
Current : Patched
Time: 2.2 s : 0.5 s
Functions merged: 1503 : 1505
File: k.ll, 2788 functions
Current : Patched
Time: 1.5 s : 0.7 s
Functions merged: 1626 : 1626
File: sqlite3.ll, 1057 functions
Current : Patched
Time: 0.4 s : 0.4 s
Functions merged: 7 : 7
All the tests were token from test-suite object directory. In average it
shows >200% performance improvement. You could also see that patched
version detects a bit more functions for merging, since it has improved
constant-equivalence detection.
Raul,
Yes. you right it is O(N*log(N)). And yes, I have some ideas about hashing.
Though in the end of hashing improvement I still see the tree. I'll explain.
Once you calculated hash (even really good one), you still have
probability of N>1 functions with the same hash. Of course it could be
really low. But then, you have to spend some time just being creating
such a complex hash (definitely, you have to scan whole function body).
I think there should be sequence of short hash numbers. In another words
we can try to represent function as a chain of hash codes:
F0: hashF0-0, hashF0-1, hashF0-2, ...
F1: hashF1-0, hashF1-1, ...
In this case you can create kind of hash-tree. Consider we have F0, F1
and F2.
Imagine, hashF0-0 == hashF1-0 == hashF2-0, hashF0-1 == hashF1-1:
Then we can build the next tree (requires fixed-width fonts):
[hashF0-0]
/ \
[hashF0-1] [hashF2-1]
/ \ \
[hashF0-2] [hashF1-2] [hashF2-2]
In this case as you can see, whole process would be of complexity
O(N*n), where:
N - number of functions
n - function size.
Note, hashF[i]-[j] could be generated on demand. It is not obvious to
generate whole hash at once.
I have also implemented this approach. Though it is looks more complex,
and not so fast, as I hoped. Perhaps I can make it simplier. But it
could not be simpler that just to replace "bool" with -1,0,1. Actually I
really wanted to propose idea with hash-trees, since it combines
advantages of hashing and tree-based lookup routines. But then I
discovered idea I presented finally, and saw that it shows almost same
improvement and looks much simpler.
-Stepan
Duncan P. N. Exon Smith wrote:
> Hi Stepan,
>
> This looks interesting! Some high-level comments:
>
> - Please post the patch untarred next time. Also, I'm not sure if it's typical
> to provide supporting documents in .doc format; while many of us probably
> have access to Word, a more portable and email-friendly format (like text,
> markdown or rst) is probably better.
> - Have you profiled this? What's the speedup? I second Raul's question about
> whether improving the hashing (or doing secondary hashing) could result in a
> larger benefit (maybe even with simpler code).
> - At the beginning of FunctionComparator::cmpEnumerate, you have a comment that
> you can't introduce an order relation for cross-references, and the following
> code:
>
> // Unfortunately we can't introduce order relation for
> // cross reference case. It is possible for self-reference only.
> if (V1 == F1) {
> if (V2 == F2)
> return 0;
> return -1;
> }
> if (V2 == F2) {
> if (V1 == F1)
> return 0;
> return 1;
> }
>
> Can't this break asymmetry? Doesn't breaking the ordering relation cause a
> correctness problem? I.e., what am I missing here?
>
> I also have a few low-level comments (and nitpicks):
>
> - The comments at the top of the file don't match the new implementation!
> - Lots of strange tabbing after your changes. E.g.:
>
> + int cmpGEP(const GetElementPtrInst *GEP1,
> const GetElementPtrInst *GEP2) {
>
> - After the FunctionPtr class declaration, you have an extra blank line. There
> are a few of these scattered through (e.g., at the beginning of
> FunctionComparator::cmpConstants).
> - Your new helper functions (such as cmpNumbers) that are local to the file
> should be declared 'static'.
> - You use the following pattern throughout:
>
> int Res;
> // ...
> Res = cmpNumbers(I1->getNumOperands(), I2->getNumOperands());
> if (Res != 0)
> return Res;
>
> whereas, since you never use Res unless it's non-zero, I think the
> following is more clear and concise:
>
> if (int Res = cmpNumbers(I1->getNumOperands(), I2->getNumOperands()))
> return Res;
>
> - I'm not sure if your MERGEFUNC_SANITY should be included the way it is:
>
> +//#define MERGEFUNC_SANITY 900
> +#ifdef MERGEFUNC_SANITY
>
> - I’m also unsure about the following:
>
> + // If we have some problematic functions, perhaps we would like
> + // to keep declarations only, and those definitions fires the problem.
> + // We could insert manualy last ones (notepad, nano, vi, whatever else :-) )
> +#if 0
> + for (Module::iterator I = M.begin(), E = M.end(); I != E; ++I) {
> + Function *F = I;
> + F->deleteBody();
> + }
> +#endif
>
> Cheers,
> Duncan
>
> On Jan 21, 2014, at 9:58 AM, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:
>
>> ping
>> Stepan Dyatkovskiy wrote:
>>> Hi all,
>>>
>>> I propose simple improvement for MergeFunctions pass, that reduced its
>>> complexity from O(N^2) to O(log(N)), where N is number of functions in
>>> module.
>>>
>>> The idea, is to replace the result of comparison from "bool" to
>>> "-1,0,1". In another words: define order relation on functions set.
>>> To be sure, that functions could be comparable that way, we have to
>>> prove that order relation is possible on all comparison stage.
>>>
>>> The last one is possible, if for each comparison stage we implement has
>>> next properties:
>>> * reflexivity (a <= a, a == a, a >= a),
>>> * antisymmetry (if a <= b and b <= a then a == b),
>>> * transitivity (a <= b and b <= c, then a <= c)
>>> * asymmetry (if a < b, then a > b or a == b).
>
> For asymmetry, I think it’s: if a < b, then not a > b and not a == b.
>
>>>
>>> Once we have defined order relation we can store all the functions in
>>> binary tree and perform lookup in O(log(N)) time.
>>>
>>> This post has two attachments:
>>> 1. The patch, that has implementation of this idea.
>>> 2. The MergeFunctions pass detailed description, with explanation how
>>> order relation could be possible.
>>>
>>> Hope it helps to make things better!
>>>
>>> -Stepan.
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
More information about the llvm-dev
mailing list