LLVM Documentation: MergeFunctions pass

Fri Oct 31 17:44:33 PDT 2014

I'm okay with it. Nick?

On Fri, Oct 31, 2014 at 1:14 PM, <llvm at dyatkovskiy.com> wrote:

> ping
>
> 20.10.2014, 13:36, "llvm at dyatkovskiy.com" <llvm at dyatkovskiy.com>:
> > Ping.
> > -Stepan
> >
> > 07.10.2014, 13:30, "Stepan Dyatkovskiy" <sdyatkovskiy at accesssoftek.com>:
> >>  ping
> >>  On 03 Oct 2014, at 11:42, Stepan Dyatkovskiy <
> sdyatkovskiy at accesssoftek.com> wrote:
> >>>   Hi Sean,
> >>>   Both issues you mentioned have been fixed. The final patch has been
> reattached.
> >>>
> >>>   Thanks for reviews!
> >>>   -Stepan.
> >>>
> >>>   On 03 Oct 2014, at 03:27, Sean Silva <chisophugis at gmail.com> wrote:
> >>>
> >>>   On Thu, Oct 2, 2014 at 12:40 AM, Stepan Dyatkovskiy <
> sdyatkovskiy at accesssoftek.com> wrote:
> >>>   Hi Sean,
> >>>>>   Sometimes code contains functions that does exactly the same thing
> even though
> >>>>>   they are non-equal on the binary level.
> >>>>   This confuses me; do you mean non-equal on the source level, but
> equal on the binary level?
> >>>   I mean equal on output, as if you treat each function as a black box
> with only its inputs and outputs visible. Functions can differ on the binary
> level but be equal on output, e.g.:
> >>>
> >>>   int foo_0(int a) {
> >>>    return a + a;
> >>>   }
> >>>
> >>>   int foo_1(int a) {
> >>>    return a * 2;
> >>>   }
> >>>
> >>>   int foo_2(int a) {
> >>>    return a << 1;
> >>>   }
> >>>
> >>>   It also happens that such functions differ at one stage and
> become equal after an optimisation pass.
> >>>
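A small sketch of what "equal on output" means for the three functions above. The helper `agree_on_range` is hypothetical and only spot-checks a bounded range of non-negative inputs, where none of the three expressions overflows; it illustrates the idea and is not a proof of equivalence, nor how the pass itself decides equality.

```cpp
#include <cassert>

// The three functions from the message above.
int foo_0(int a) { return a + a; }
int foo_1(int a) { return a * 2; }
int foo_2(int a) { return a << 1; }

// Hypothetical helper: returns true if all three functions produce the same
// result for every input in [lo, hi].
bool agree_on_range(int lo, int hi) {
  // Stay in a range where none of the expressions can overflow.
  assert(lo >= 0 && hi <= 1000000);
  for (int a = lo; a <= hi; ++a) {
    int expected = foo_0(a);
    if (foo_1(a) != expected || foo_2(a) != expected)
      return false;
  }
  return true;
}
```

Calling `agree_on_range(0, 1000)` exercises all three bodies on the same inputs, which is the black-box view described above.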
> >>>   I have rephrased the text you mentioned as follows:
> >>>
> >>>   [quote]
> >>>   Sometimes code contains equal functions, or functions that does
> exactly the same
> >>>   thing even though they are non-equal on the IR level (e.g.:
> multiplication on 2
> >>>   and 'shl 2’).
> >>>   [/quote]
> >>>
> >>>   Should be `shl 1`, but otherwise this fixes the issue I mentioned.
> >>>>>   If we will track every numbers and flags to be compared we would
> be able to get
> >>>>>   numbers chain and then create the hash number. So, once again,
> *total-ordering*
> >>>>>   could be considered as a milestone for even faster (in theory)
> random-access
> >>>>>   approach.
> >>>>   I'm not sure this makes sense. I imagine that part of the benefit
> of the comparison-based approach is that the comparisons can return early
> once they find a difference. Hashing always has to look at everything.
> Does the current comparison routine look at the entire function before
> actually doing any comparisons?
> >>>   Nope, it behaves exactly as you imagined: the comparison returns its
> result as soon as it finds a difference.
> >>>
> >>>   As I mentioned in the article, I tried a random-access approach, and it
> works a bit slower. But it has O(N) complexity, so one day somebody could
> decide that they know how to create a fast random-access implementation. I
> think it's just important to explain briefly why logarithmic search is used
> now, and what the possible ways to improve the current implementation are.
> Taking your question into account, I have rephrased this text:
> >>>
> >>>   [quote]
> >>>   We can use the same comparison algorithm. During comparison we exit
> once we find
> >>>   the difference, but here we have to scan whole function body every
> time (note,
> >>>   it could be slower). Like in "total-ordering", we will track every
> numbers and
> >>>   flags, but instead of comparison, we should get numbers sequence and
> then
> >>>   create the hash number. So, once again, *total-ordering* could be
> considered as
> >>>   a milestone for even faster (in theory) random-access approach.
> >>>   [/quote]
> >>>
> >>>   This sounds good, but please say "but here we might have to scan
> whole function body every time"; otherwise it sounds contradictory.
> >>>
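The trade-off discussed above (a comparison can stop at the first difference, while a hash reads everything) can be sketched as follows. This is a minimal illustration under stated assumptions: a function body is modelled as a plain sequence of numbers, `compareSequences` and `hashSequence` are hypothetical names, and the FNV-1a-style mixing is only a stand-in, not what MergeFunctions uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a function body flattened into a sequence of numbers.
using Sequence = std::vector<std::uint64_t>;

// Three-way comparison that stops at the first difference.
// Returns -1, 0 or 1; *visited receives how many positions were examined.
int compareSequences(const Sequence &L, const Sequence &R,
                     std::size_t *visited) {
  assert(visited != nullptr);
  *visited = 0;
  std::size_t n = L.size() < R.size() ? L.size() : R.size();
  for (std::size_t i = 0; i < n; ++i) {
    ++*visited;
    if (L[i] != R[i])
      return L[i] < R[i] ? -1 : 1;
  }
  if (L.size() == R.size())
    return 0;
  return L.size() < R.size() ? -1 : 1;
}

// FNV-1a-style mixing applied per item. Unlike the comparison, it always
// reads every item; *visited receives the number of items read.
std::uint64_t hashSequence(const Sequence &S, std::size_t *visited) {
  assert(visited != nullptr);
  std::uint64_t h = 14695981039346656037ULL;
  for (std::uint64_t item : S) {
    h ^= item;
    h *= 1099511628211ULL;
  }
  *visited = S.size();
  return h;
}
```

For two sequences that differ in their second item, the comparison examines two positions, whereas hashing either sequence reads all of its items.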
> >>>   I have also updated Passes.rst (paragraph about MergeFunctions):
> >>>
> >>>   [quote]
> >>>
> >>>   This pass looks for equivalent functions that are mergeable and folds
> them.
> >>>   A total ordering is introduced on the set of functions: we define a
> comparison that answers, for every two functions, which of them is greater.
> This allows the functions to be arranged in a binary tree.
> >>>   For every new function we check for an equivalent in the tree.
> >>>   If an equivalent exists, we fold the two functions. If both functions are
> overridable, we move the functionality into a new internal function and
> leave two overridable thunks to it.
> >>>   If there is no equivalent, we add the function to the tree.
> >>>   The lookup routine has O(log(n)) complexity, while the whole merging
> process has complexity of O(n*log(n)).
> >>>   Read this(link) article for more details.
> >>>
> >>>   [/quote]
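The lookup-then-fold loop described in the quoted paragraph can be sketched with an ordered container. This is only an illustration under assumptions: `Fn`, `BodyLess` and `foldEquivalent` are hypothetical names, a function is reduced to a name plus a flattened body, and the overridable-thunk case is not modelled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a function: a name plus a flattened body.
struct Fn {
  std::string name;
  std::vector<int> body;
};

// Total order on bodies; std::vector's operator< is lexicographic.
struct BodyLess {
  bool operator()(const std::vector<int> &L, const std::vector<int> &R) const {
    return L < R;
  }
};

// Returns, for each function name, the name of the function it was folded
// into (its own name if it was the first function seen with that body).
std::map<std::string, std::string> foldEquivalent(const std::vector<Fn> &fns) {
  // std::map is a balanced tree, so each lookup needs O(log n) comparisons.
  std::map<std::vector<int>, std::string, BodyLess> tree;
  std::map<std::string, std::string> foldedInto;
  for (const Fn &f : fns) {
    auto it = tree.find(f.body);
    if (it != tree.end()) {
      foldedInto[f.name] = it->second; // equivalent found: fold
    } else {
      tree.emplace(f.body, f.name);    // no equivalent: add to the tree
      foldedInto[f.name] = f.name;
    }
  }
  assert(foldedInto.size() <= fns.size());
  return foldedInto;
}
```

With n functions, the loop performs n lookups of O(log n) comparisons each, which matches the O(n*log(n)) figure quoted above when counted in comparisons.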
> >>>
> >>>   Thanks!
> >>>   Stepan
> >>>
> >>>   On 30 Sep 2014, at 02:03, Sean Silva <chisophugis at gmail.com> wrote:
> >>>
> >>>   Thanks for answering those questions; that really helps. Could you
> please address the "random comments" that I mentioned in my original reply?
> >>>
> >>>   As it stands, I'm currently in favor of committing this (with the
> "random comments" fixed); Nick, what do you think?
> >>>
> >>>   -- Sean Silva
> >>>
> >>>   On Mon, Sep 29, 2014 at 2:26 AM, Stepan Dyatkovskiy <
> sdyatkovskiy at accesssoftek.com> wrote:
> >>>   Hi Nick and Silva.
> >>>   Sorry again for the delay.
> >>>
> >>>   In the new version I have answered the three questions listed at
> >>>   http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines
> >>>
> >>>   It mostly answers Nick’s questions as well. I would like to dwell on the
> following question in particular:
> >>>>   What is the burden for updating this document as the implementation
> changes and why is that a good tradeoff?
> >>>   I tried to describe the common cases. I quoted a few of the comments and
> described how the functions are implemented, but I tried to cut the places
> where we could potentially change the logic, pointing the reader to the
> sources for more details. Anyway, if that does happen, I’ll try to cut such
> extra details from the documentation and replace them with a more generic form.
> >>>
> >>>   This article is an extension of the source code and of the comments we’ve
> added there, and it is written at a higher level than the comments in the
> source code.
> >>>   (Frankly, I started it as a proof of the total-ordering approach we used
> in MergeFunctions, but then just extended it and got a full-featured article
> :-) )
> >>>
> >>>   Below are the answers quoted from the article:
> >>>
> >>>   [quote]
> >>>
> >>>   1. Why would I want to read this document?
> >>>   The document is an extension of the pass comments and describes the pass
> logic. It describes the algorithm that is used to compare functions, and
> explains how we can then combine equal functions correctly, keeping the
> module valid.
> >>>   The material is presented top-down, so the reader can start learning the
> pass from the ideas and end up with the low-level algorithm details, which
> prepares them for reading the sources.
> >>>   So the main goal is to describe the algorithm and logic here; the concept.
> This document is good for you if you don’t want to read the source code
> but want to understand the pass algorithms. The author tried not to repeat
> the source code and to cover only the common cases, and thus avoid situations
> where minor code changes require updating this document.
> >>>
> >>>   2. What should I know to be able to follow along with this document?
> >>>   The reader should be familiar with common compiler-engineering principles
> and LLVM code fundamentals. In this article we assume the reader is familiar
> with Static Single Assignment concepts. An understanding of the IR structure
> is also important.
> >>>   We will use such terms as “module”, “function”, “basic block”,
> “user”, “value”, “instruction”.
> >>>   The Kaleidoscope tutorial can be used as a good starting point (link).
> >>>   It is especially important to understand chapter 3 of the tutorial (link).
> >>>   The reader should also know how passes work in LLVM; the following
> article can be used as a reference and starting point here (link).
> >>>   What else? Well, perhaps the reader should also have some experience in
> LLVM pass debugging and bug-fixing.
> >>>
> >>>   3. What do I gain by reading this document?
> >>>   The main purpose is to give the reader a comfortable form of algorithm
> description, namely human-readable text, since it can be hard to understand
> the algorithm straight from the source code: the pass uses some principles
> that have to be explained first.
> >>>   The author wishes everybody to avoid the situation where you read the
> code from top to bottom again and again, and yet you don’t understand why we
> implemented it that way.
> >>>   We hope that after this article the reader can easily debug and
> improve the MergeFunctions pass and thus help the LLVM project.
> >>>
> >>>   [/quote]
> >>>
> >>>   Thanks!
> >>>   -Stepan
> >>>
> >>>   On 16 Sep 2014, at 05:16, Sean Silva <chisophugis at gmail.com> wrote:
> >>>
> >>>   On Mon, Sep 15, 2014 at 3:07 PM, Nick Lewycky <nlewycky at google.com>
> wrote:
> >>>   On 15 September 2014 15:02, Sean Silva <chisophugis at gmail.com> wrote:
> >>>   Wow, this is a really detailed document. Great work!
> >>>
> >>>   I wouldn't typically recommend a document to go into this much
> detail, but I think that in this particular case, it is fine to have this
> detail since the document can double as an "in-depth walkthrough of a
> specific LLVM pass", which I'm sure will be useful for newbies to get a
> feel for things.
> >>>
> >>>   Actually, I have questions on this point before I get into reviewing
> the contents. This is the first piece of pass documentation. Who is the
> intended audience? What is the desired level of detail and why?
> >>>
> >>>   Hopefully this should get answered once Stepan updates the document to
> answer the three questions:
> http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines
> >>>
> >>>   At what point should implementation details be found by reading the
> code instead of being in the documentation? Or is this supposed to be a
> higher-level understanding of the algorithm like an academic paper but
> without the tone (or impenetrable writing)? What is the burden for updating
> this document as the implementation changes and why is that a good tradeoff?
> >>>
> >>>   I really don't have a good answer to this. I sort of lean towards
> the "informal paper" interpretation. My gut right now is that this would be
> worth having as a hold-your-hand walkthrough for newbies, and would
> continue to be so even if details of the code changed underneath it. But I
> really don't have a good way to weigh that against the downsides, like the
> ongoing maintenance commitment, if any. Any ideas are welcome.
> >>>
> >>>   -- Sean Silva
> >>>
> >>>   Nick
> >>>
> >>>   In your first section please answer the three questions here:
> http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines
> >>>
> >>>   I don't know that much about the pass (especially the new
> implementation), so Nick, could you skim over the content to make sure it
> is covering all the main bases?
> >>>
> >>>   Some random comments:
> >>>>   Sometimes code contains functions that does exactly the same thing
> even though
> >>>>   they are non-equal on the binary level.
> >>>   This confuses me; do you mean non-equal on the source level, but
> equal on the binary level?
> >>>>   If we will track every numbers and flags to be compared we would be
> able to get
> >>>>   numbers chain and then create the hash number. So, once again,
> *total-ordering*
> >>>>   could be considered as a milestone for even faster (in theory)
> random-access
> >>>>   approach.
> >>>   I'm not sure this makes sense. I imagine that part of the benefit of
> the comparison-based approach is that the comparisons can return early once
> they find a difference. Hashing always has to look at everything. Does the
> current comparison routine look at the entire function before actually
> doing any comparisons?
> >>>>   #. For two trees *T1* and *T2* we perform *depth-first-trace* and
> have two
> >>>>     chains as a product: "*T1Items*" and "*T2Items*".
> >>>   I think most readers would be more comfortable with the terms
> "depth-first-traversal" instead of "depth-first-trace" and "sequences"
> instead of "chains".
> >>>>   Consider modification of *cmpType* method.
> >>>   What does this paragraph mean?
> >>>
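The step quoted in the comments above (traverse two trees depth-first, then compare the resulting sequences) might look roughly like this. It is a sketch under assumptions, not the pass's code: `Node`, `flatten` and `compareTrees` are hypothetical names, and recording the child count during traversal is an added choice so that differently shaped trees cannot flatten to the same sequence.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tree node: an integer tag plus child subtrees.
struct Node {
  int tag;
  std::vector<Node> children;
};

// Pre-order depth-first traversal: append this node's tag and child count,
// then the items of each child subtree in order.
void flatten(const Node &n, std::vector<int> &items) {
  items.push_back(n.tag);
  items.push_back(static_cast<int>(n.children.size()));
  for (const Node &c : n.children)
    flatten(c, items);
}

// Compares two trees by their flattened sequences, first item first.
// Returns -1, 0 or 1.
int compareTrees(const Node &T1, const Node &T2) {
  std::vector<int> T1Items, T2Items;
  flatten(T1, T1Items);
  flatten(T2, T2Items);
  assert(!T1Items.empty() && !T2Items.empty());
  if (T1Items == T2Items)
    return 0;
  return T1Items < T2Items ? -1 : 1; // lexicographic order
}
```

This version builds both sequences in full before comparing; a variant that walks the two trees in lockstep could stop at the first differing item instead.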
> >>>   -- Sean Silva
> >>>
> >>>   On Sun, Sep 14, 2014 at 11:02 PM, <llvm at dyatkovskiy.com> wrote:
> >>>   ping
> >>>
> >>>   11.09.2014, 12:50, "Stepan Dyatkovskiy" <stpworld at narod.ru>:
> >>>>   Reattached as patch.
> >>>>
> >>>>   Stepan Dyatkovskiy wrote:
> >>>>>   Hello everyone,
> >>>>>   Please review the MergeFunctions pass documentation in attachment.
> Hope
> >>>>>   doc is clear enough :-)
> >>>>>
> >>>>>   - Stepan
> >>>   _______________________________________________
> >>>   llvm-commits mailing list
> >>>   llvm-commits at cs.uiuc.edu
> >>>   http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>
> >>>   <2014-10-03-mergefunc-doc.patch>