LLVM Documentation: MergeFunctions pass
Stepan Dyatkovskiy
sdyatkovskiy at accesssoftek.com
Mon Dec 8 11:53:26 PST 2014
OK. Thanks!
-Stepan
On 08 Dec 2014, at 10:14, Nick Lewycky <nicholas at mxc.ca> wrote:
> llvm at dyatkovskiy.com wrote:
>> Hi Sean,
>> Maybe I could commit this documentation in post-commit review mode?
>
> It seems that Sean has already reviewed it. If Sean is fine with it,
> then it can go in.
>
> Nick
>
>> Thanks!
>> -Stepan
>> 03.12.2014, 23:27, "llvm at dyatkovskiy.com" <llvm at dyatkovskiy.com>:
>>> ping
>>> 21.11.2014, 22:44, "llvm at dyatkovskiy.com" <llvm at dyatkovskiy.com>:
>>>> Hi Nick,
>>>> Could you please look at the pass documentation?
>>>> Thanks!
>>>> -Stepan
>>>> 01.11.2014, 03:44, "Sean Silva" <chisophugis at gmail.com>:
>>>>> I'm okay with it. Nick?
>>>>>
>>>>> On Fri, Oct 31, 2014 at 1:14 PM, <llvm at dyatkovskiy.com> wrote:
>>>>>
>>>>> ping
>>>>>
>>>>> 20.10.2014, 13:36, "llvm at dyatkovskiy.com" <llvm at dyatkovskiy.com>:
>>>>>> Ping.
>>>>>> -Stepan
>>>>>>
>>>>>> 07.10.2014, 13:30, "Stepan Dyatkovskiy" <sdyatkovskiy at accesssoftek.com>:
>>>>>>> ping
>>>>>>> On 03 Oct 2014, at 11:42, Stepan Dyatkovskiy <sdyatkovskiy at accesssoftek.com> wrote:
>>>>>>>> Hi Sean,
>>>>>>>> Both issues you mentioned have been fixed. The final patch has been reattached.
>>>>>>>>
>>>>>>>> Thanks for the reviews!
>>>>>>>> -Stepan.
>>>>>>>>
>>>>>>>> On 03 Oct 2014, at 03:27, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Thu, Oct 2, 2014 at 12:40 AM, Stepan Dyatkovskiy <sdyatkovskiy at accesssoftek.com> wrote:
>>>>>>>> Hi Sean,
>>>>>>>>>> Sometimes code contains functions that do exactly the same thing even though
>>>>>>>>>> they are non-equal on the binary level.
>>>>>>>>> This confuses me; do you mean non-equal on the source level, but equal on the binary level?
>>>>>>>> I mean equal on output, as if you treat each function as a black box with only inputs and outputs visible. Functions can differ on the binary level yet be equal on output, e.g.:
>>>>>>>>
>>>>>>>> int foo_0(int a) {
>>>>>>>>   return a + a;
>>>>>>>> }
>>>>>>>>
>>>>>>>> int foo_1(int a) {
>>>>>>>>   return a * 2;
>>>>>>>> }
>>>>>>>>
>>>>>>>> int foo_2(int a) {
>>>>>>>>   return a << 1;
>>>>>>>> }
>>>>>>>>
>>>>>>>> It also happens that such functions differ at one stage and become equal after an optimisation pass.
>>>>>>>>
>>>>>>>> I have rephrased text you mentioned as follows:
>>>>>>>>
>>>>>>>> [quote]
>>>>>>>> Sometimes code contains equal functions, or functions that do exactly the same
>>>>>>>> thing even though they are non-equal on the IR level (e.g. multiplication by 2
>>>>>>>> and 'shl 2').
>>>>>>>> [/quote]
>>>>>>>>
>>>>>>>> Should be `shl 1`, but otherwise this fixes the issue I mentioned.
>>>>>>>>>> If we track every number and flag to be compared, we would be able to get a
>>>>>>>>>> chain of numbers and then create the hash number. So, once again, *total-ordering*
>>>>>>>>>> could be considered as a milestone for an even faster (in theory) random-access
>>>>>>>>>> approach.
>>>>>>>>> I'm not sure this makes sense. I imagine that part of the benefit of the comparison-based approach is that the comparisons can return early once they find a difference. Hashing always has to look at everything. Does the current comparison routine look at the entire function before actually doing any comparisons?
>>>>>>>> Nope, it behaves exactly as you imagined: the comparison returns a result as soon as it finds a difference.
>>>>>>>>
>>>>>>>> As I mentioned in the article, I tried the random-access approach, and it works a bit slower. But it has O(N) complexity, so one day somebody may work out how to create a fast random-access implementation. I think it's just important to explain briefly why logarithmic search is used now, and what the possible ways to improve the current implementation are. Taking your question into account, I have rephrased this text:
>>>>>>>>
>>>>>>>> [quote]
>>>>>>>> We can use the same comparison algorithm. During comparison we exit once we find
>>>>>>>> the difference, but here we have to scan the whole function body every time (note,
>>>>>>>> it could be slower). Like in "total-ordering", we will track every number and
>>>>>>>> flag, but instead of comparing them, we should get a number sequence and then
>>>>>>>> create the hash number. So, once again, *total-ordering* could be considered as
>>>>>>>> a milestone for an even faster (in theory) random-access approach.
>>>>>>>> [/quote]
>>>>>>>>
>>>>>>>> This sounds good, but please say "but here we might have to scan whole function body every time"; otherwise it sounds contradictory.
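
The trade-off discussed here (a comparison may stop at the first difference, while a hash has to read the whole body) can be illustrated with a minimal sketch. It assumes a function body has already been reduced to a plain array of integers standing in for the "numbers and flags"; the names `cmp_sequences` and `hash_sequence` are hypothetical and are not taken from the MergeFunctions sources, and the hash uses FNV-1a-style mixing applied per integer purely as an example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a function body: an array of integers.
   This is an illustration only, not the pass's real data structure. */

/* Three-way comparison that stops at the first difference.
   *visited reports how many positions were inspected. */
int cmp_sequences(const int *a, size_t na, const int *b, size_t nb,
                  size_t *visited)
{
    size_t n = na < nb ? na : nb;
    *visited = 0;
    for (size_t i = 0; i < n; ++i) {
        ++*visited;
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;   /* early exit */
    }
    if (na == nb)
        return 0;
    return na < nb ? -1 : 1;               /* shorter sequence sorts first */
}

/* FNV-1a-style hash over the sequence (mixed per integer, not per byte).
   It has no early exit: all n items are always read. */
uint64_t hash_sequence(const int *a, size_t n, size_t *visited)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    *visited = 0;
    for (size_t i = 0; i < n; ++i) {
        ++*visited;
        h ^= (uint64_t)(uint32_t)a[i];
        h *= 1099511628211ULL;             /* FNV-1a 64-bit prime */
    }
    return h;
}
```

With two four-element sequences that differ at the second position, `cmp_sequences` inspects two items while `hash_sequence` inspects all four, which is the cost the wording "might have to scan" is hedging about.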
>>>>>>>>
>>>>>>>> I have also updated Passes.rst (paragraph about
>>>>> MergeFunctions):
>>>>>>>>
>>>>>>>> [quote]
>>>>>>>>
>>>>>>>> This pass looks for equivalent functions that are mergeable and folds them.
>>>>>>>> A total ordering is introduced on the set of functions: we define a comparison that answers, for every two functions, which of them is greater. This allows functions to be arranged into a binary tree.
>>>>>>>> For every new function we check for an equivalent in the tree.
>>>>>>>> If an equivalent exists, we fold such functions. If both functions are overridable, we move the functionality into a new internal function and leave two overridable thunks to it.
>>>>>>>> If there is no equivalent, then we add this function to the tree.
>>>>>>>> The lookup routine has O(log(n)) complexity, while the whole merging process has complexity of O(n*log(n)).
>>>>>>>> Read this(link) article for more details.
>>>>>>>>
>>>>>>>> [/quote]
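
The lookup-or-insert loop described in the quoted paragraph can be sketched as follows. This is only an illustration under loud assumptions: each "function" is modelled as a single integer key, equivalence means equal keys, and `Node`, `insert_or_fold` and `free_tree` are hypothetical names rather than anything from the pass. The tree here is a plain unbalanced binary search tree, so the O(log(n)) lookup bound holds only if the tree stays balanced; a real implementation would want a self-balancing container.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: one node per distinct "function" key. */
typedef struct Node {
    int key;
    int merged;                 /* how many later equivalents were folded in */
    struct Node *left, *right;
} Node;

/* Total ordering on keys: returns -1, 0 or 1. */
static int cmp_keys(int a, int b) { return (a > b) - (a < b); }

/* Look for an equivalent of key in the tree.
   Returns 1 if one exists (the new function is "folded" into it),
   0 if key was inserted as a new node, -1 on allocation failure. */
int insert_or_fold(Node **root, int key)
{
    Node **slot = root;
    while (*slot) {
        int c = cmp_keys(key, (*slot)->key);
        if (c == 0) {
            (*slot)->merged++;
            return 1;
        }
        slot = c < 0 ? &(*slot)->left : &(*slot)->right;
    }
    Node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->key = key;
    n->merged = 0;
    n->left = n->right = NULL;
    *slot = n;
    return 0;
}

/* Release every node of the tree. */
void free_tree(Node *n)
{
    if (!n)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```

Calling `insert_or_fold` once per function gives the n lookups of the merging process; the thunk creation for overridable functions mentioned in the paragraph is deliberately left out of this sketch.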
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Stepan
>>>>>>>>
>>>>>>>> On 30 Sep 2014, at 02:03, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thanks for answering those questions; that really helps.
>>>>> Could you please address the "random comments" that I mentioned
>>>>> in my original reply?
>>>>>>>>
>>>>>>>> As it stands, I'm currently in favor of committing this
>>>>> (with the "random comments" fixed); Nick, what do you think?
>>>>>>>>
>>>>>>>> -- Sean Silva
>>>>>>>>
>>>>>>>> On Mon, Sep 29, 2014 at 2:26 AM, Stepan Dyatkovskiy <sdyatkovskiy at accesssoftek.com> wrote:
>>>>>>>> Hi Nick and Silva.
>>>>>>>> Sorry again for the delay.
>>>>>>>>
>>>>>>>> In the new version I have answered the three questions mentioned in
>>>>>>>> http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines
>>>>>>>>
>>>>>>>> This mostly answers Nick's questions as well. I would like to dwell on the following question in particular:
>>>>>>>>> What is the burden for updating this document as the
>>>>> implementation changes and why is that a good tradeoff?
>>>>>>>> I tried to describe common cases. I quoted a few of the comments and described the implementation of functions, but I tried to cut the places where we could potentially change the logic, referring the reader to the sources for more details. Anyway, if that happens, I'll try to cut such extra details from the documentation and replace them with a more generic form.
>>>>>>>>
>>>>>>>> This article is an extension to the source code and to the comments we've added there, and it is written at a higher level than the comments in the source code.
>>>>>>>> (Frankly, I started it as a proof of the total-ordering approach we used in MergeFunctions, but then just extended it and got a full-featured article :-) )
>>>>>>>>
>>>>>>>> Below are the answers quoted from article:
>>>>>>>>
>>>>>>>> [quote]
>>>>>>>>
>>>>>>>> 1. Why would I want to read this document?
>>>>>>>> The document is an extension to the pass comments and describes the pass logic. It describes the algorithm used to compare functions, and explains how we can then combine equal functions correctly while keeping the module valid.
>>>>>>>> The material is presented top-down, so the reader can start learning the pass from the ideas and end up with low-level algorithm details, which prepares them for reading the sources.
>>>>>>>> So the main goal is to describe the algorithm and logic here; the concept. This document is good for you if you don't want to read the source code but want to understand the pass algorithms. The author tried not to repeat the source code and to cover only common cases, and thus avoid situations where minor code changes require updating this document.
>>>>>>>>
>>>>>>>> 2. What should I know to be able to follow along with this document?
>>>>>>>> The reader should be familiar with common compiler-engineering principles and LLVM code fundamentals. In this article we assume the reader is familiar with Static Single Assignment concepts. An understanding of the IR structure is also important.
>>>>>>>> We will use such terms as “module”, “function”, “basic block”, “user”, “value”, “instruction”.
>>>>>>>> As a good starting point, the Kaleidoscope tutorial could be used (link).
>>>>>>>> It's especially important to understand chapter 3 of the tutorial (link).
>>>>>>>> The reader should also know how passes work in LLVM; the following article could be used as a reference and starting point (link).
>>>>>>>> What else? Well, perhaps the reader should also have some experience in LLVM pass debugging and bug-fixing.
>>>>>>>>
>>>>>>>> 3. What do I gain by reading this document?
>>>>>>>> The main purpose is to give the reader a comfortable form of algorithm description, namely human-readable text, since it could be hard to understand the algorithm straight from the source code: the pass uses some principles that have to be explained first.
>>>>>>>> The author wishes everybody to avoid the situation where you read the code from top to bottom again and again, and yet you don't understand why we implemented it that way.
>>>>>>>> We hope that after this article the reader can easily debug and improve the MergeFunctions pass and thus help the LLVM project.
>>>>>>>>
>>>>>>>> [/quote]
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> -Stepan
>>>>>>>>
>>>>>>>> On 16 Sep 2014, at 05:16, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Mon, Sep 15, 2014 at 3:07 PM, Nick Lewycky <nlewycky at google.com> wrote:
>>>>>>>> On 15 September 2014 15:02, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>>>> Wow, this is a really detailed document. Great work!
>>>>>>>>
>>>>>>>> I wouldn't typically recommend a document to go into this much detail, but I think that in this particular case, it is fine to have this detail since the document can double as an "in-depth walkthrough of a specific LLVM pass", which I'm sure will be useful for newbies to get a feel for things.
>>>>>>>>
>>>>>>>> Actually, I have questions on this point before I get into
>>>>> reviewing the contents. This is the first piece of pass
>>>>> documentation. Who is the intended audience? What is the desired
>>>>> level of detail and why?
>>>>>>>>
>>>>>>>> Hopefully this should get answered once Stepan updates the document to answer the three questions:
>>>>>>>> http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines
>>>>>>>>
>>>>>>>> At what point should implementation details be found by
>>>>> reading the code instead of being in the documentation? Or is
>>>>> this supposed to be a higher-level understanding of the
>>>>> algorithm like an academic paper but without the tone (or
>>>>> impenetrable writing)? What is the burden for updating this
>>>>> document as the implementation changes and why is that a good
>>>>> tradeoff?
>>>>>>>>
>>>>>>>> I really don't have a good answer to this. I sort of lean towards the "informal paper" interpretation. My gut right now is that this would be worth having as a hold-your-hand walkthrough for newbies, and would continue to be so even if details of the code changed underneath it. But I really don't have a good way to weigh that against the downsides, like the ongoing maintenance commitment, if any. Any ideas are welcome.
>>>>>>>>
>>>>>>>> -- Sean Silva
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>> In your first section please answer the three questions
>>>>> here: http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines
>>>>>>>>
>>>>>>>> I don't know that much about the pass (especially the new
>>>>> implementation), so Nick, could you skim over the content to
>>>>> make sure it is covering all the main bases?
>>>>>>>>
>>>>>>>> Some random comments:
>>>>>>>>> Sometimes code contains functions that do exactly the same thing even though
>>>>>>>>> they are non-equal on the binary level.
>>>>>>>> This confuses me; do you mean non-equal on the source level, but equal on the binary level?
>>>>>>>>> If we track every number and flag to be compared, we would be able to get a
>>>>>>>>> chain of numbers and then create the hash number. So, once again, *total-ordering*
>>>>>>>>> could be considered as a milestone for an even faster (in theory) random-access
>>>>>>>>> approach.
>>>>>>>> I'm not sure this makes sense. I imagine that part of the benefit of the comparison-based approach is that the comparisons can return early once they find a difference. Hashing always has to look at everything. Does the current comparison routine look at the entire function before actually doing any comparisons?
>>>>>>>>> #. For two trees *T1* and *T2* we perform *depth-first-trace* and have two
>>>>>>>>> chains as a product: "*T1Items*" and "*T2Items*".
>>>>>>>> I think most readers would be more comfortable with the terms "depth-first-traversal" instead of "depth-first-trace" and "sequences" instead of "chains".
>>>>>>>>> Consider modification of *cmpType* method.
>>>>>>>> What does this paragraph mean?
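
The depth-first traversal item quoted a few lines above (two trees flattened into the sequences *T1Items* and *T2Items*, which are then compared) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Tree` struct with an integer `tag`, the `-1` sentinel written for a missing child (tags are assumed non-negative), the fixed capacity of 64 items, and the names `flatten` and `cmp_trees`. It is not the code from the patch under review.

```c
#include <stddef.h>

/* Hypothetical binary tree whose nodes carry a non-negative tag. */
typedef struct Tree {
    int tag;
    const struct Tree *left, *right;
} Tree;

/* Pre-order depth-first traversal. Writes tags into out starting at
   index len (a missing child is recorded as -1) and returns the length
   the full sequence would have. Items past cap are counted, not stored. */
size_t flatten(const Tree *t, int *out, size_t cap, size_t len)
{
    if (!t) {
        if (len < cap)
            out[len] = -1;
        return len + 1;
    }
    if (len < cap)
        out[len] = t->tag;
    len = flatten(t->left, out, cap, len + 1);
    return flatten(t->right, out, cap, len);
}

/* Three-way comparison of two trees via their flattened sequences.
   Sketch limitation: only the first 64 items of each sequence are used. */
int cmp_trees(const Tree *t1, const Tree *t2)
{
    enum { CAP = 64 };
    int s1[CAP], s2[CAP];
    size_t n1 = flatten(t1, s1, CAP, 0);
    size_t n2 = flatten(t2, s2, CAP, 0);
    if (n1 > CAP) n1 = CAP;
    if (n2 > CAP) n2 = CAP;
    size_t n = n1 < n2 ? n1 : n2;
    for (size_t i = 0; i < n; ++i) {
        if (s1[i] != s2[i])
            return s1[i] < s2[i] ? -1 : 1;
    }
    return (n1 > n2) - (n1 < n2);
}
```

The sentinel is a design choice of this sketch: without it, two trees with the same tags but different shapes would flatten to the same sequence and compare as equal.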
>>>>>>>>
>>>>>>>> -- Sean Silva
>>>>>>>>
>>>>>>>> On Sun, Sep 14, 2014 at 11:02 PM, <llvm at dyatkovskiy.com> wrote:
>>>>>>>> ping
>>>>>>>>
>>>>>>>> 11.09.2014, 12:50, "Stepan Dyatkovskiy" <stpworld at narod.ru>:
>>>>>>>>> Reattached as patch.
>>>>>>>>>
>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>> Hello everyone,
>>>>>>>>>> Please review the MergeFunctions pass documentation in the attachment. Hope
>>>>>>>>>> the doc is clear enough :-)
>>>>>>>>>>
>>>>>>>>>> - Stepan
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>
>>>>>>>> <2014-10-03-mergefunc-doc.patch>
>>>>>>
>>>>
>>>
>