LLVM Documentation: MergeFunctions pass

Mon Sep 29 02:26:09 PDT 2014

Hi Nick and Sean,
Sorry again for the long delay.

In the new version I have answered the three questions mentioned in
http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines

This mostly answers Nick’s questions as well. I would like to dwell specifically on the following question:
> What is the burden for updating this document as the implementation changes and why is that a good tradeoff?
I tried to describe the common cases. I quoted a few of the source comments and described how some functions are implemented, but I tried to leave out the places where the logic could plausibly change, and instead refer the reader to the sources for more details. In any case, if that happens, I’ll cut such extra details from the documentation and replace them with a more generic description.

This article is an extension to the source code and to the comments we’ve added there, and it is written at a higher level than the comments in the source code.
(Frankly, I started it as a proof of the total-ordering approach we used in MergeFunctions, but then kept extending it and ended up with a full-featured article :-) )

Below are the answers, quoted from the article:

[quote]

1. Why would I want to read this document?
This document is an extension to the pass comments and describes the pass logic. It describes the algorithm used to compare functions, and explains how we can then combine equal functions correctly while keeping the module valid.
The material is presented top-down, so the reader can start learning the pass from its ideas and end up with the low-level algorithm details, which prepares them for reading the sources.
So the main goal here is to describe the algorithm and the logic: the concept. This document is for you if you don’t want to read the source code but do want to understand the pass algorithms. The author tried not to repeat the source code and to cover only the common cases, so that minor code changes should not require updating this document.

2. What should I know to be able to follow along with this document?
The reader should be familiar with common compiler-engineering principles and LLVM code fundamentals. In this article we assume the reader is familiar with Static Single Assignment (SSA) concepts. An understanding of the IR structure is also important.
We will use terms such as “module”, “function”, “basic block”, “user”, “value” and “instruction”.
As a good starting point, the Kaleidoscope tutorial could be used (link).
It’s especially important to understand chapter 3 of the tutorial (link).
The reader should also know how passes work in LLVM; the following article can serve as a reference and starting point (link).
What else? Well, the reader should perhaps also have some experience in LLVM pass debugging and bug-fixing.

3. What do I gain by reading this document?
The main purpose is to give the reader a comfortable description of the algorithms, namely human-readable text, since it can be hard to understand the algorithm straight from the source code: the pass relies on some principles that have to be explained first.
The author hopes to spare everybody the situation where you read the code from top to bottom again and again and still don’t understand why it was implemented that way.
We hope that after reading this article the reader can easily debug and improve the MergeFunctions pass, and thus help the LLVM project.

[/quote]

Thanks!
-Stepan

On 16 Sep 2014, at 05:16, Sean Silva <chisophugis at gmail.com> wrote:

On Mon, Sep 15, 2014 at 3:07 PM, Nick Lewycky <nlewycky at google.com> wrote:
On 15 September 2014 15:02, Sean Silva <chisophugis at gmail.com> wrote:
Wow, this is a really detailed document. Great work!

I wouldn't typically recommend a document to go into this much detail, but I think that in this particular case, it is fine to have this detail since the document can double as an "in-depth walkthrough of a specific LLVM pass", which I'm sure will be useful for newbies to get a feel for things.

Actually, I have questions on this point before I get into reviewing the contents. This is the first piece of pass documentation. Who is the intended audience? What is the desired level of detail and why?

Hopefully this should get answered once Stepan updates the document to answer the three questions: http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines

At what point should implementation details be found by reading the code instead of being in the documentation? Or is this supposed to be a higher-level understanding of the algorithm like an academic paper but without the tone (or impenetrable writing)? What is the burden for updating this document as the implementation changes and why is that a good tradeoff?

I really don't have a good answer to this. I sort of lean towards the "informal paper" interpretation. My gut right now is that this would be worth having as a hold-your-hand walkthrough for newbies, and would continue to be so even if details of the code changed underneath it. But I really don't have a good way to weight that against the downsides, like the ongoing maintenance commitment, if any. Any ideas are welcome.

-- Sean Silva

Nick

In your first section please answer the three questions here: http://llvm.org/docs/SphinxQuickstartTemplate.html#guidelines

I don't know that much about the pass (especially the new implementation), so Nick, could you skim over the content to make sure it is covering all the main bases?

Some random comments:

> Sometimes code contains functions that does exactly the same thing even though
> they are non-equal on the binary level.

This confuses me; do you mean non-equal on the source level, but equal on the binary level?

> If we will track every numbers and flags to be compared we would be able to get
> numbers chain and then create the hash number. So, once again, *total-ordering*
> could be considered as a milestone for even faster (in theory) random-access
> approach.

I'm not sure this makes sense. I imagine that part of the benefit of the comparison-based approach is that the comparisons can return early once they find a difference. Hashing always has to look at everything. Does the current comparison routine look at the entire function before actually doing any comparisons?

> #. For two trees *T1* and *T2* we perform *depth-first-trace* and have two
>    chains as a product: "*T1Items*" and "*T2Items*".

I think most readers would be more comfortable with the terms "depth-first-traversal" instead of "depth-first-trace" and "sequences" instead of "chains".

> Consider modification of *cmpType* method.

What does this paragraph mean?

-- Sean Silva

On Sun, Sep 14, 2014 at 11:02 PM, <llvm at dyatkovskiy.com> wrote:
ping

11.09.2014, 12:50, "Stepan Dyatkovskiy" <stpworld at narod.ru>:
> Reattached as patch.
>
> Stepan Dyatkovskiy wrote:
>>  Hello everyone,
>>  Please review the MergeFunctions pass documentation in attachment. Hope
>>  doc is clear enough :-)
>>
>>  - Stepan
_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2014-09-29-mergefunc-doc.patch
Type: application/octet-stream
Size: 33301 bytes
Desc: 2014-09-29-mergefunc-doc.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140929/26ab7bdb/attachment.obj>