<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 27, 2019 at 4:48 PM Rui Ueyama &lt;<a href="mailto:ruiu@google.com">ruiu@google.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Wed, Feb 27, 2019 at 4:46 PM Peter Collingbourne &lt;<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 27, 2019 at 4:41 PM Rui Ueyama &lt;<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Wed, Feb 27, 2019 at 1:37 PM Peter Collingbourne &lt;<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Feb 26, 2019 at 2:24 PM Rui Ueyama via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<div><br></div><div>I've heard people say that they want to analyze dependencies between object files at the linker level so that they can run whole-program analyses, which cannot be done in the compiler because it works on one compilation unit at a time. 
I'd like to start a discussion about what we can do with such analyses and how to make them possible. I'm also sharing my idea about how to implement this.<div><br></div><div><b>Dependency analyses</b></div><div>First, let me start with a few examples of analyses I've heard of or am thinking about. Dependencies between object files can be represented as a graph where vertices are input sections and edges are symbols and relocations. Analyses would work on the dependency graph. Examples of analyses include, but are not limited to, the following:<div><br></div><div> - Figuring out why some library or object file gets linked.</div><div><br></div><div> - Finding candidates for eliminating a dependency by finding a "weak" link to a library. For example, we can say the dependency on a library is <i>weak</i> if the library becomes unreachable in the graph when we remove <i>N</i> edges from the graph (which is likely to correspond to removing <i>N</i> function calls from the code), where <i>N</i> is a small number.</div><div><br></div><div> - Understanding which new dependencies increase the executable size the most, compared to a previous build.</div></div><div><br></div><div> - Finding bad or circular dependencies between sub-components.</div><div><br></div><div>There are likely many more analyses you would want to run at the linker input level. Currently, lld doesn't actively support such analyses. There are a few options to make the linker emit dependency information (e.g. --cref or -Map), but the output of these options is not comprehensive; you cannot reconstruct a dependency graph from it.</div><div><br></div><div><b>Dumping dependency graph</b></div><div>So, I'm wondering whether it would be desirable to add a new feature to the linker to dump an entire dependency graph in such a way that the graph can be reconstructed by reading it back. Once we have such a feature, we can link a program with the feature enabled and run any kind of dependency analysis on the output. 
You can save dumps to compare against previous builds. You can run any number of analyses on a dump, instead of invoking the linker for each analysis.</div><div><br></div><div>I don't have a concrete idea about the output file format, but I believe it is essentially enough to emit triplets of (&lt;from input section&gt;, &lt;symbol&gt;, &lt;to input section&gt;), each of which represents an edge, to reconstruct a graph.</div></div></div></blockquote><div><br></div><div>A couple of points:</div><div><br></div><div>- Symbols are not the only kind of dependency edge from one section to another -- there's also SHF_LINK_ORDER. Maybe we can just call the edge "SHF_LINK_ORDER" in that case.<br></div><div>- Do we want to mark up the GC roots in some way? I imagine that we could do that with a synthetic node that represents the GC root, and then give every root an incoming edge from it. With my proposal for partitioning, perhaps we could have one GC root node per partition.</div></div></div></blockquote><div><br></div><div>I think we should mark up the GC roots in some way. One thing to note is that not only input sections but also symbols can be GC roots. In terms of the graph, both edges and vertices should have a "GC root" bit.</div></div></div></blockquote><div><br></div><div>You can represent both with a synthetic GC root vertex, though? e.g.</div><div><br></div><div>GC root --[ExportedFunction]--&gt; .text.ExportedFunction</div><div>GC root --[.init_array]--&gt; .init_array.InitializedGlobal</div></div></div></blockquote><div><br></div><div>I think that should work, but I'm not sure if this is easier to handle than adding a bit to each vertex/edge.</div></div></div></blockquote><div><br></div><div>It seems like it would make graph reachability queries a little easier to express. For example, if you wanted to find out why section "foo" is in your program, you could ask "what's the path from 'GC root' to 'foo'". 
But with a bit on vertices/edges you need to ask "what's the path from some vertex/edge with the 'GC root' bit set to 'foo'". I don't have much actual experience with graph query languages, though.</div><div><br></div><div>Peter</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Peter</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Peter</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div><br></div><div>Thoughts?</div></div></div>


</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail-m_-3958887823817614733gmail-m_4001498160825100312gmail-m_-3346619741596098730gmail-m_-8236266145466599499m_-2285479628105995127gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div></div>

</blockquote></div></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail-m_-3958887823817614733gmail-m_4001498160825100312gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div></div>

</blockquote></div></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div></div>
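<div><br></div><div>For what it's worth, the reachability query with a synthetic GC root vertex could be sketched roughly as follows. This is only an illustration, not a proposed dump format: it assumes the dump has already been parsed into (from, label, to) triplets, and the names GC_ROOT, build_graph and why_linked, as well as the "helper" and "foo" edges, are made up for the example.</div>
<pre>
```python
from collections import deque

# Hypothetical name for the synthetic vertex that every GC root hangs off.
GC_ROOT = "GC root"


def build_graph(triplets):
    """Turn (from, label, to) triplets into an adjacency list."""
    graph = {}
    for src, label, dst in triplets:
        graph.setdefault(src, []).append((label, dst))
    return graph


def why_linked(graph, target, start=GC_ROOT):
    """Breadth-first search from the synthetic root.

    Returns the shortest chain of (from, label, to) edges that reaches
    `target`, or None if `target` is unreachable.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while parent[node] is not None:
                prev, label = parent[node]
                path.append((prev, label, node))
                node = prev
            return path[::-1]
        for label, nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, label)
                queue.append(nxt)
    return None


# Made-up edges: two roots as in the example above, plus an invented chain to "foo".
triplets = [
    (GC_ROOT, "ExportedFunction", ".text.ExportedFunction"),
    (GC_ROOT, ".init_array", ".init_array.InitializedGlobal"),
    (".text.ExportedFunction", "helper", ".text.helper"),
    (".text.helper", "foo", "foo"),
]
graph = build_graph(triplets)
path = why_linked(graph, "foo")
for edge in path:
    print(edge)
```
</pre>
<div>With a "GC root" bit on vertices/edges instead, the same search would first have to collect every marked vertex or edge and start from all of them, which is the extra step the synthetic vertex avoids.</div>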