[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Stepan Dyatkovskiy
stpworld at narod.ru
Mon Jan 27 10:32:48 PST 2014
Hello,
I'm glad to present a patch with fixes and clean-ups:
1. Comments at the top of the file were fixed. I also added some new
TODOs there and fixed the strange tabbing.
2. Done:
> whereas, since you never use Res unless it's non-zero, I think the
> following is more clear and concise:
>
> if (int Res = cmpNumbers(I1->getNumOperands(), I2->getNumOperands()))
>   return Res;
3. MERGEFUNC_SANITY was converted into an additional command-line option,
so you can now run:
opt -mergefunc -mergefunc-sanity 100 my-module.ll
and it will check the pass for sanity using the first 100 functions in
your module (see the sketch below).
4. Constants. I implemented cmpConstants in a conservative way, so it
now catches only a subset of what the old approach did, and yet that is
about 99.9% of all cases.
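As a rough sketch of that conservative shape (cmpAPInts and the exact
dispatch are my assumptions here; the real code is in the attached
patch):

#include "llvm/IR/Constants.h"
using namespace llvm;

int cmpNumbers(uint64_t L, uint64_t R);        // as quoted in item 2
int cmpAPInts(const APInt &L, const APInt &R); // hypothetical helper

int cmpConstants(const Constant *C1, const Constant *C2) {
  // Order by value kind first, so unequal kinds never reach the
  // content checks below.
  if (int Res = cmpNumbers(C1->getValueID(), C2->getValueID()))
    return Res;

  // Conservative part: only order contents we know how to order.
  if (const ConstantInt *CI1 = dyn_cast<ConstantInt>(C1))
    return cmpAPInts(CI1->getValue(), cast<ConstantInt>(C2)->getValue());
  if (const ConstantFP *CF1 = dyn_cast<ConstantFP>(C1))
    return cmpAPInts(CF1->getValueAPF().bitcastToAPInt(),
                     cast<ConstantFP>(C2)->getValueAPF().bitcastToAPInt());

  // Anything else is left to the existing equivalence checks.
  return 0;
}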
Please find the patch attached for review.
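Regarding the sanity option in item 3, here is a rough sketch of the
order-relation properties such a check can verify (cmpFunctions and
verifySanity are assumed names, not the actual implementation):

#include <vector>
namespace llvm { class Function; }
using llvm::Function;

// Hypothetical three-way comparator; the real one is in the patch.
int cmpFunctions(const Function *L, const Function *R);

// The three-way comparison must behave like a total order on the
// sampled functions: antisymmetric and transitive.
bool verifySanity(const std::vector<Function *> &Fns) {
  for (Function *A : Fns)
    for (Function *B : Fns) {
      int AB = cmpFunctions(A, B), BA = cmpFunctions(B, A);
      if ((AB < 0) != (BA > 0) || (AB == 0) != (BA == 0))
        return false; // ordering is not antisymmetric
      for (Function *C : Fns)
        if (AB < 0 && cmpFunctions(B, C) < 0 && cmpFunctions(A, C) >= 0)
          return false; // ordering is not transitive
    }
  return true;
}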
-Stepan
Stepan Dyatkovskiy wrote:
> Hi Duncan,
>
> Statistics. Is it OK if I send them in CSV format?
>
> Below, a few improvement numbers are presented. The following formula
> was used: Improvement = (OldTime/NewTime - 1) * 100%
> (for example, OldTime = 2.0 s and NewTime = 1.0 s gives an improvement
> of (2.0/1.0 - 1) * 100% = 100%):
>
> 1. Overall (even including cases where there is nothing to merge): 21%
>
> 2. When >100 functions were merged: 95%
> (this is what I meant in the previous post: when there is real work for
> -mergefunc, the new patch does it about 2 times faster)
>
> 3. When >500 functions were merged (though there are only 3 such
> modules): 127%.
>
> 4. Total number of functions: ~60000
> (currently this number is absent from the statistics; it is just based
> on previous analysis)
>
> 5. Functions merged with the patch: 8342
> 6. Functions merged without the patch: 8345
>
> 7. So, taking #5 and #6 into account, we can conclude that
> MergeFunctions catches about 10-14% of cases (though I need to recheck
> this number).
>
> Since it is not possible to establish an order relation on
> cross-referenced functions, we can see that the new patch can't merge 3
> functions. Though that is only 0.03%.
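> To illustrate (a sketch, not one of the measured cases): a pair of
> mutually recursive functions, where each body is identical except for
> the reference to the other function, cannot be ordered without already
> knowing how the two functions compare:
>
>   void mutualB(int);
>   void mutualA(int n) { if (n) mutualB(n - 1); }
>   void mutualB(int n) { if (n) mutualA(n - 1); }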
>
> All .ll files were taken from the test-suite object dir.
>
> I'm working on more detailed statistics. Once the statistics script is
> written, it will be easy to bring more details into this research.
>
> P.S.: New patches are also coming...
>
> -Stepan.
>
> Stepan Dyatkovskiy wrote:
>> Hi Duncan,
>>> These results look promising! When you say 200% average, do you mean
>>> 200% average for all files in test-suite, or just these four? Are
>>> there any negative results?
>> OK, now it's clear we need really full statistics. I have some, but
>> only for an old revision.
>> In general, this patch only updates the result type, and it really
>> doesn't do any extra calculations (except for constants, but below
>> I'll explain why that is necessary).
>>
>> A few words about accuracy and correctness.
>> Where the old algorithm says only "these two parts differ", the new
>> one says a bit more: which part is less. Two things follow from that
>> approach:
>>
>> 1. Each time the old algorithm determines a difference, the new one
>> determines a difference too. If the less-than comparison is
>> implemented properly, the result will be 100% the same.
>> So correctness is provided by the correctness of the "less"
>> comparison: all order-relation properties should be implemented and
>> proven.
>>
>> 2. The changes are so small (only a few "if" statements for each
>> comparison method) that it is almost impossible for it to work slower
>> than the old one. Even in the worst case, when the difference lies at
>> the very end of the functions, we spend almost the same CPU time.
>> Just as an example:
>> - if (Ty1->getTypeID() != Ty2->getTypeID())
>> -   return false;
>> + if (int Res = cmpNumbers(Ty1->getTypeID(), Ty2->getTypeID()))
>> +   return Res;
>>
>> So, when there is nothing to merge, we get the same performance
>> results as the old approach.
>> When there are equivalent functions that can be merged, the new
>> approach gives a performance improvement.
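>> To make the win concrete, here is a minimal sketch (cmpFunctions and
>> FnTree are assumed names, not the patch itself) of how the three-way
>> result enables an ordered container:
>>
>> #include <set>
>> namespace llvm { class Function; }
>> using llvm::Function;
>>
>> // Hypothetical three-way comparator built from the cmp* methods.
>> int cmpFunctions(const Function *L, const Function *R);
>>
>> struct FnLess {
>>   bool operator()(const Function *L, const Function *R) const {
>>     return cmpFunctions(L, R) < 0;
>>   }
>> };
>>
>> // Insert/lookup now costs O(log(N)) function comparisons instead of
>> // a linear scan over all previously seen functions.
>> std::set<Function *, FnLess> FnTree;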
>>
>> Now about constants.
>> I'll prepare a separate patch. In comparison with the old approach, it
>> compares constant contents. It is a replacement for the
>> bitcast-possibility check, and yes, it is the slowest place in the new
>> approach: here we need to understand which constant is actually
>> greater and which one is less. It could be implemented like this:
>>
>> if (C1 != ConstantExpr::getBitCast(const_cast<Constant*>(C2),
>>                                    C1->getType()))
>>   return cmpConstantContents(C1, C2);
>> // go on to the next checks
>>
>> The only thing to be verified here is whether getBitCast is transitive.
>>
>> About the hash: that's reasonable, I'll try it.
>>
>> Full statistics are coming soon...
>>
>> -Stepan
>>
>> 22.01.2014, 20:09, "Duncan P. N. Exon Smith" <dexonsmith at apple.com>:
>>> On 2014 Jan 22, at 07:35, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:
>>>
>>>> Hi Raul and Duncan,
>>>>
>>>> Duncan,
>>>> Thank you for the review. I hope to present a fixed patch tomorrow.
>>>>
>>>> First, I would like to show a few performance results:
>>>>
>>>> command: "time opt -mergefunc <test>"
>>>>
>>>> File: tramp3d-v4.ll, 12963 functions
>>>> Current : Patched
>>>> Time: 9.8 s : 2.0 s ; >400%!!
>>>> Functions merged: 3503 : 3507
>>>>
>>>> File: spirit.ll, 2885 functions
>>>> Current : Patched
>>>> Time: 2.2 s : 0.5 s
>>>> Functions merged: 1503 : 1505
>>>>
>>>> File: k.ll, 2788 functions
>>>> Current : Patched
>>>> Time: 1.5 s : 0.7 s
>>>> Functions merged: 1626 : 1626
>>>>
>>>> File: sqlite3.ll, 1057 functions
>>>> Current : Patched
>>>> Time: 0.4 s : 0.4 s
>>>> Functions merged: 7 : 7
>>>>
>>>> All the tests were taken from the test-suite object directory. On
>>>> average it shows a >200% performance improvement.
>>>
>>> These results look promising! When you say 200% average, do you mean
>>> 200% average for all files in test-suite, or just these four? Are
>>> there any negative results?
>>>
>>>> You can also see that the patched version detects a few more
>>>> functions for merging, since it has improved constant-equivalence
>>>> detection.
>>>
>>> 1. I’d prefer to see the improvement to constant-equivalence detection
>>> as a separate initial patch (to the old code). Is that possible?
>>> In particular, it would be reassuring to see your patch *not* change
>>> the number of functions merged.
>>>
>>> 2. I’m also still concerned about correctness. Previously, when the
>>> comparison result was unknown, returning “false” would conservatively
>>> consider the functions distinct, so we had no correctness issues.
>>>
>>> However, with your patch, *everything* must be accurately compared,
>>> right? Are there any cases where you can’t accurately order
>>> functions? (Is FunctionComparator::cmpEnumerate an example?)
>>>
>>> If these are problems, maybe you can use the previous ==-style
>>> comparison to confirm.
>>>
>>>> Raul,
>>>>
>>>> Yes, you are right, it is O(N*log(N)). And yes, I have some ideas
>>>> about hashing.
>>>> Though even at the end of the hashing improvement I still see the
>>>> tree. I'll explain.
>>>>
>>>> Once you have calculated a hash (even a really good one), you still
>>>> have some probability of N>1 functions with the same hash. Of course
>>>> it could be really low. But then, you have to spend some time just
>>>> creating such a complex hash (definitely, you have to scan the whole
>>>> function body).
>>>>
>>>> I think there should be a sequence of short hash numbers. In other
>>>> words, we can try to represent a function as a chain of hash codes:
>>>> F0: hashF0-0, hashF0-1, hashF0-2, ...
>>>> F1: hashF1-0, hashF1-1, ...
>>>>
>>>> In this case you can create a kind of hash-tree. Consider we have
>>>> F0, F1 and F2.
>>>>
>>>> Imagine hashF0-0 == hashF1-0 == hashF2-0, and hashF0-1 == hashF1-1:
>>>>
>>>> Then we can build the following tree (requires a fixed-width font):
>>>>
>>>>              [hashF0-0]
>>>>             /          \
>>>>     [hashF0-1]       [hashF2-1]
>>>>     /        \                \
>>>> [hashF0-2]  [hashF1-2]    [hashF2-2]
>>>>
>>>> In this case, as you can see, the whole process would be of
>>>> complexity O(N*n), where:
>>>>
>>>> N - number of functions
>>>> n - function size
>>>>
>>>> Note that hashF[i]-[j] could be generated on demand. There is no
>>>> need to generate the whole chain at once.
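>>>> A minimal sketch of the idea (assumed helper names, not the actual
>>>> implementation):
>>>>
>>>> #include <cstdint>
>>>> #include <map>
>>>> #include <memory>
>>>> #include <vector>
>>>> namespace llvm { class Function; }
>>>> using llvm::Function;
>>>>
>>>> // J-th short hash of F, computed lazily (hypothetical helper).
>>>> uint64_t getChainHash(const Function *F, unsigned J);
>>>>
>>>> struct HashTreeNode {
>>>>   std::map<uint64_t, std::unique_ptr<HashTreeNode>> Children;
>>>>   std::vector<const Function *> Leaves; // chains ending here
>>>> };
>>>>
>>>> // Walk/extend the tree along F's hash chain; functions sharing a
>>>> // full chain land in the same leaf and become merge candidates.
>>>> void insert(HashTreeNode &Root, const Function *F, unsigned Len) {
>>>>   HashTreeNode *N = &Root;
>>>>   for (unsigned J = 0; J < Len; ++J) {
>>>>     std::unique_ptr<HashTreeNode> &C = N->Children[getChainHash(F, J)];
>>>>     if (!C)
>>>>       C.reset(new HashTreeNode());
>>>>     N = C.get();
>>>>   }
>>>>   N->Leaves.push_back(F);
>>>> }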
>>>>
>>>> I have also implemented this approach, though it looks more complex
>>>> and is not as fast as I hoped. Perhaps I can make it simpler, but it
>>>> could never be simpler than just replacing "bool" with -1/0/1.
>>>> Actually, I really wanted to propose the hash-tree idea, since it
>>>> combines the advantages of hashing and tree-based lookup routines.
>>>> But then I discovered the idea I finally presented, and saw that it
>>>> shows almost the same improvement and looks much simpler.
>>>>
>>>> -Stepan
>>>
>>> I think you get diminishing returns for each level of the tree. I’d be
>>> interested in seeing numbers on a simpler 2-stage hash (see the sketch
>>> after this list).
>>>
>>> - Leave the first hash as it is now in the code.
>>> - Add a secondary hash that’s built from the quantities that the equals
>>> comparison (or your new less-than comparison) is based on.
>>> - Fall back on O(n^2) for the third stage when there are collisions.
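>>>
>>> For instance, the bucketing could look like this sketch (cheapHash
>>> and deepHash are assumed helpers):
>>>
>>> #include <cstdint>
>>> #include <map>
>>> #include <vector>
>>> namespace llvm { class Function; }
>>> using llvm::Function;
>>>
>>> uint64_t cheapHash(const Function *F); // stage 1: existing hash
>>> uint64_t deepHash(const Function *F);  // stage 2: built from the
>>>                                        // quantities the comparison uses
>>>
>>> // Only functions colliding in both hashes reach the O(n^2) stage.
>>> std::map<uint64_t, std::map<uint64_t, std::vector<Function *>>> Buckets;
>>>
>>> void addCandidate(Function *F) {
>>>   Buckets[cheapHash(F)][deepHash(F)].push_back(F);
>>> }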
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mergefunctions-less-comparison-2014-01-27.patch
Type: text/x-diff
Size: 50029 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140127/f5a07371/attachment.patch>