[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Stepan Dyatkovskiy
stpworld at narod.ru
Mon Jan 27 10:32:48 PST 2014
Hello,
I'm glad to present a patch with fixes and clean-ups:
1. Comments at the top of the file were fixed. I also added some new
TODOs there and fixed the strange tabbing.
2. Done:
> whereas, since you never use Res unless it's non-zero, I think the
> following is more clear and concise:
>
> if (int Res = cmpNumbers(I1->getNumOperands(), I2->getNumOperands()))
>   return Res;
3. MERGEFUNC_SANITY was converted into an additional command-line option,
so you can now run:
opt -mergefunc -mergefunc-sanity 100 my-module.ll
and it will check the pass for sanity using the first 100 functions in
your module (see the sketch below).
4. Constants. I implemented cmpConstants in a conservative way, so it
now catches only a subset of what the old approach did, and yet that is
about 99.9% of all cases.
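As a rough sketch of that conservative shape (cmpAPInts and the exact
dispatch are my assumptions here; the real code is in the attached
patch):

#include "llvm/IR/Constants.h"
using namespace llvm;

int cmpNumbers(uint64_t L, uint64_t R);        // as quoted in item 2
int cmpAPInts(const APInt &L, const APInt &R); // hypothetical helper

int cmpConstants(const Constant *C1, const Constant *C2) {
  // Order by value kind first, so unequal kinds never reach the
  // content checks below.
  if (int Res = cmpNumbers(C1->getValueID(), C2->getValueID()))
    return Res;

  // Conservative part: only order contents we know how to order.
  if (const ConstantInt *CI1 = dyn_cast<ConstantInt>(C1))
    return cmpAPInts(CI1->getValue(), cast<ConstantInt>(C2)->getValue());
  if (const ConstantFP *CF1 = dyn_cast<ConstantFP>(C1))
    return cmpAPInts(CF1->getValueAPF().bitcastToAPInt(),
                     cast<ConstantFP>(C2)->getValueAPF().bitcastToAPInt());

  // Anything else is left to the existing equivalence checks.
  return 0;
}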
Please find the patch attached for review.
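Regarding the sanity option in item 3, here is a rough sketch of the
order-relation properties such a check can verify (cmpFunctions and
verifySanity are assumed names, not the actual implementation):

#include <vector>
namespace llvm { class Function; }
using llvm::Function;

// Hypothetical three-way comparator; the real one is in the patch.
int cmpFunctions(const Function *L, const Function *R);

// The three-way comparison must behave like a total order on the
// sampled functions: antisymmetric and transitive.
bool verifySanity(const std::vector<Function *> &Fns) {
  for (Function *A : Fns)
    for (Function *B : Fns) {
      int AB = cmpFunctions(A, B), BA = cmpFunctions(B, A);
      if ((AB < 0) != (BA > 0) || (AB == 0) != (BA == 0))
        return false; // ordering is not antisymmetric
      for (Function *C : Fns)
        if (AB < 0 && cmpFunctions(B, C) < 0 && cmpFunctions(A, C) >= 0)
          return false; // ordering is not transitive
    }
  return true;
}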
-Stepan
Stepan Dyatkovskiy wrote:
> Hi Duncan,
>
> Statistics. Is it OK if I send them in CSV format?
>
> Below, a few improvement numbers are presented. The following formula
> was used: Improvement = (OldTime/NewTime - 1) * 100%
> (for example, OldTime = 2.0 s and NewTime = 1.0 s gives an improvement
> of (2.0/1.0 - 1) * 100% = 100%):
>
> 1. Overall (even including cases where there is nothing to merge): 21%
>
> 2. When >100 functions were merged: 95%
> (this is what I meant in the previous post: when there is real work for
> -mergefunc, the new patch does it about 2 times faster)
>
> 3. When >500 functions were merged (though there are only 3 such
> modules): 127%.
>
> 4. Total number of functions: ~60000
> (currently this number is absent from the statistics; it is just based
> on previous analysis)
>
> 5. Functions merged with the patch: 8342
> 6. Functions merged without the patch: 8345
>
> 7. So, taking #5 and #6 into account, we can conclude that
> MergeFunctions catches about 10-14% of cases (though I need to recheck
> this number).
>
> Since it is not possible to establish an order relation on
> cross-referenced functions, we can see that the new patch can't merge 3
> functions. Though that is only 0.03%.
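> To illustrate (a sketch, not one of the measured cases): a pair of
> mutually recursive functions, where each body is identical except for
> the reference to the other function, cannot be ordered without already
> knowing how the two functions compare:
>
>   void mutualB(int);
>   void mutualA(int n) { if (n) mutualB(n - 1); }
>   void mutualB(int n) { if (n) mutualA(n - 1); }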
>
> All .ll files were taken from the test-suite object dir.
>
> I'm working on more detailed statistics. Once the statistics script is
> written, it will be easy to bring more details into this research.
>
> P.S.: New patches are also coming...
>
> -Stepan.
>
> Stepan Dyatkovskiy wrote:
>> Hi Duncan,
>>> These results look promising! When you say 200% average, do you mean
>>> 200% average for all files in test-suite, or just these four? Are
>>> there any negative results?
>> OK, now it's clear we need really full statistics. I have some, but
>> only for an old revision.
>> In general, this patch only updates the result type, and it really
>> doesn't do any extra calculations (except for constants, but below
>> I'll explain why that is necessary).
>>
>> A few words about accuracy and correctness.
>> Where the old algorithm says only "these two parts differ", the new
>> one says a bit more: which part is less. Two things follow from that
>> approach:
>>
>> 1. Each time the old algorithm determines a difference, the new one
>> determines a difference too. If the less-than comparison is
>> implemented properly, the result will be 100% the same.
>> So correctness is provided by the correctness of the "less"
>> comparison: all order-relation properties should be implemented and
>> proven.
>>
>> 2. The changes are so small (only a few "if" statements for each
>> comparison method) that it is almost impossible for it to work slower
>> than the old one. Even in the worst case, when the difference lies at
>> the very end of the functions, we spend almost the same CPU time.
>> Just as an example:
>> - if (Ty1->getTypeID() != Ty2->getTypeID())
>> -   return false;
>> + if (int Res = cmpNumbers(Ty1->getTypeID(), Ty2->getTypeID()))
>> +   return Res;
>>
>> So, when there is nothing to merge, we get the same performance
>> results as the old approach.
>> When there are equivalent functions that can be merged, the new
>> approach gives a performance improvement.
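>> To make the win concrete, here is a minimal sketch (cmpFunctions and
>> FnTree are assumed names, not the patch itself) of how the three-way
>> result enables an ordered container:
>>
>> #include <set>
>> namespace llvm { class Function; }
>> using llvm::Function;
>>
>> // Hypothetical three-way comparator built from the cmp* methods.
>> int cmpFunctions(const Function *L, const Function *R);
>>
>> struct FnLess {
>>   bool operator()(const Function *L, const Function *R) const {
>>     return cmpFunctions(L, R) < 0;
>>   }
>> };
>>
>> // Insert/lookup now costs O(log(N)) function comparisons instead of
>> // a linear scan over all previously seen functions.
>> std::set<Function *, FnLess> FnTree;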
>>
>> Now about constants.
>> I'll prepare a separate patch. In comparison with the old approach, it
>> compares constant contents. It is a replacement for the
>> bitcast-possibility check, and yes, it is the slowest place in the new
>> approach: here we need to understand which constant is actually
>> greater and which one is less. It could be implemented like this:
>>
>> if (C1 != ConstantExpr::getBitCast(const_cast<Constant*>(C2),
>>                                    C1->getType()))
>>   return cmpConstantContents(C1, C2);
>> // go on to the next checks
>>
>> The only thing to be verified here is whether getBitCast is transitive.
>>
>> About the hash: that's reasonable, I'll try it.
>>
>> Full statistics are coming soon...
>>
>> -Stepan
>>
>> 22.01.2014, 20:09, "Duncan P. N. Exon Smith" <dexonsmith at apple.com>:
>>> On 2014 Jan 22, at 07:35, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:
>>>
>>>> Hi Raul and Duncan,
>>>>
>>>> Duncan,
>>>> Thank you for the review. I hope to present a fixed patch tomorrow.
>>>>
>>>> First, I would like to show a few performance results:
>>>>
>>>> command: "time opt -mergefunc <test>"
>>>>
>>>> File: tramp3d-v4.ll, 12963 functions
>>>> Current : Patched
>>>> Time: 9.8 s : 2.0 s ; >400%!!
>>>> Functions merged: 3503 : 3507
>>>>
>>>> File: spirit.ll, 2885 functions
>>>> Current : Patched
>>>> Time: 2.2 s : 0.5 s
>>>> Functions merged: 1503 : 1505
>>>>
>>>> File: k.ll, 2788 functions
>>>> Current : Patched
>>>> Time: 1.5 s : 0.7 s
>>>> Functions merged: 1626 : 1626
>>>>
>>>> File: sqlite3.ll, 1057 functions
>>>> Current : Patched
>>>> Time: 0.4 s : 0.4 s
>>>> Functions merged: 7 : 7
>>>>
>>>> All the tests were taken from the test-suite object directory. On
>>>> average it shows a >200% performance improvement.
>>>
>>> These results look promising! When you say 200% average, do you mean
>>> 200% average for all files in test-suite, or just these four? Are
>>> there any negative results?
>>>
>>>> You can also see that the patched version detects a few more
>>>> functions for merging, since it has improved constant-equivalence
>>>> detection.
>>>
>>> 1. I’d prefer to see the improvement to constant-equivalence detection
>>> as a separate initial patch (to the old code). Is that possible?
>>> In particular, it would be reassuring to see your patch *not* change
>>> the number of functions merged.
>>>
>>> 2. I’m also still concerned about correctness. Previously, when the
>>> comparison result was unknown, returning “false” would conservatively
>>> consider the functions distinct, so we had no correctness issues.
>>>
>>> However, with your patch, *everything* must be accurately compared,
>>> right? Are there any cases where you can’t accurately order
>>> functions? (Is FunctionComparator::cmpEnumerate an example?)
>>>
>>> If these are problems, maybe you can use the previous ==-style
>>> comparison to confirm.
>>>
>>>> Raul,
>>>>
>>>> Yes, you are right, it is O(N*log(N)). And yes, I have some ideas
>>>> about hashing.
>>>> Though even at the end of the hashing improvement I still see the
>>>> tree. I'll explain.
>>>>
>>>> Once you have calculated a hash (even a really good one), you still
>>>> have some probability of N>1 functions with the same hash. Of course
>>>> it could be really low. But then, you have to spend some time just
>>>> creating such a complex hash (definitely, you have to scan the whole
>>>> function body).
>>>>
>>>> I think there should be a sequence of short hash numbers. In other
>>>> words, we can try to represent a function as a chain of hash codes:
>>>> F0: hashF0-0, hashF0-1, hashF0-2, ...
>>>> F1: hashF1-0, hashF1-1, ...
>>>>
>>>> In this case you can create a kind of hash-tree. Consider we have
>>>> F0, F1 and F2.
>>>>
>>>> Imagine hashF0-0 == hashF1-0 == hashF2-0, and hashF0-1 == hashF1-1:
>>>>
>>>> Then we can build the following tree (requires a fixed-width font):
>>>>
>>>>              [hashF0-0]
>>>>             /          \
>>>>     [hashF0-1]       [hashF2-1]
>>>>     /        \                \
>>>> [hashF0-2]  [hashF1-2]    [hashF2-2]
>>>>
>>>> In this case, as you can see, the whole process would be of
>>>> complexity O(N*n), where:
>>>>
>>>> N - number of functions
>>>> n - function size
>>>>
>>>> Note that hashF[i]-[j] could be generated on demand. There is no
>>>> need to generate the whole chain at once.
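>>>> A minimal sketch of the idea (assumed helper names, not the actual
>>>> implementation):
>>>>
>>>> #include <cstdint>
>>>> #include <map>
>>>> #include <memory>
>>>> #include <vector>
>>>> namespace llvm { class Function; }
>>>> using llvm::Function;
>>>>
>>>> // J-th short hash of F, computed lazily (hypothetical helper).
>>>> uint64_t getChainHash(const Function *F, unsigned J);
>>>>
>>>> struct HashTreeNode {
>>>>   std::map<uint64_t, std::unique_ptr<HashTreeNode>> Children;
>>>>   std::vector<const Function *> Leaves; // chains ending here
>>>> };
>>>>
>>>> // Walk/extend the tree along F's hash chain; functions sharing a
>>>> // full chain land in the same leaf and become merge candidates.
>>>> void insert(HashTreeNode &Root, const Function *F, unsigned Len) {
>>>>   HashTreeNode *N = &Root;
>>>>   for (unsigned J = 0; J < Len; ++J) {
>>>>     std::unique_ptr<HashTreeNode> &C = N->Children[getChainHash(F, J)];
>>>>     if (!C)
>>>>       C.reset(new HashTreeNode());
>>>>     N = C.get();
>>>>   }
>>>>   N->Leaves.push_back(F);
>>>> }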
>>>>
>>>> I have also implemented this approach, though it looks more complex
>>>> and is not as fast as I hoped. Perhaps I can make it simpler, but it
>>>> could never be simpler than just replacing "bool" with -1/0/1.
>>>> Actually, I really wanted to propose the hash-tree idea, since it
>>>> combines the advantages of hashing and tree-based lookup routines.
>>>> But then I discovered the idea I finally presented, and saw that it
>>>> shows almost the same improvement and looks much simpler.
>>>>
>>>> -Stepan
>>>
>>> I think you get diminishing returns for each level of the tree. I’d be
>>> interested in seeing numbers on a simpler 2-stage hash (see the sketch
>>> after this list).
>>>
>>> - Leave the first hash as it is now in the code.
>>> - Add a secondary hash that’s built from the quantities that the equals
>>> comparison (or your new less-than comparison) is based on.
>>> - Fall back on O(n^2) for the third stage when there are collisions.
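>>>
>>> For instance, the bucketing could look like this sketch (cheapHash
>>> and deepHash are assumed helpers):
>>>
>>> #include <cstdint>
>>> #include <map>
>>> #include <vector>
>>> namespace llvm { class Function; }
>>> using llvm::Function;
>>>
>>> uint64_t cheapHash(const Function *F); // stage 1: existing hash
>>> uint64_t deepHash(const Function *F);  // stage 2: built from the
>>>                                        // quantities the comparison uses
>>>
>>> // Only functions colliding in both hashes reach the O(n^2) stage.
>>> std::map<uint64_t, std::map<uint64_t, std::vector<Function *>>> Buckets;
>>>
>>> void addCandidate(Function *F) {
>>>   Buckets[cheapHash(F)][deepHash(F)].push_back(F);
>>> }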
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mergefunctions-less-comparison-2014-01-27.patch
Type: text/x-diff
Size: 50029 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140127/f5a07371/attachment.patch>