<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class=""><div><br></div></span><div>One of the standard reasons to prefer refactoring, even though it appears to take longer or be more difficult, is that it allows you to always keep all tests green. It is very easy for things to slip through the cracks and not promptly return to being green on a "from-scratch" version. This ultimately turns into bug reports later and the feature needs to be reimplemented; the apparent simplicity of the "from-scratch" version can disappear very rapidly.</div></div></div></div></blockquote><div><br></div><div>Hmm, why can't the from-scratch version use the existing tests to make sure major features do not regress?</div><div><br></div><div>Refactoring requires a good foundation. If the foundation is broken, a rewrite is preferable. There are many success stories of complete rewrites.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>In the refactoring approach you are forced to incorporate a holistic understanding of the necessary features into your simplification efforts, since the tests keep you from accidentally disregarding necessary features.</div></div></div></div></blockquote><div> </div><div>Features are protected with good tests. 
This has nothing to do with the approach taken.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>It is very easy to accidentally buy simplicity at the cost of losing features; if you eventually need the features back then the apparent simplicity is an illusion.</div></div></div></div></blockquote><div><br></div><div>It is probably not very useful to debate this in the abstract. Rui already has an initial implementation ready, which shows very promising results ...</div><div><br></div><div>just my 2c.</div><div><br></div><div>David</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="HOEnZb"><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>I understand what you are saying because, as you might have noticed, I'm probably the person who has spent the most time refactoring it to do what you describe. I wanted to make it more readable, easier to extend, and faster. I worked really hard on it. Although I partly succeeded, I was disappointed in myself because of the lack of progress. In the end, I had to conclude that it was not going to work -- the two are so different that it's not reasonable to spend more time in that direction. A better approach is to set up a new foundation and move the existing code onto it, instead of reworking the code in place. It may also be worth mentioning that the new approach worked well. 
I built a self-hosting linker in only two weeks, which supports dead-stripping and is more than 4x faster.</div><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div><div><div><br></div><div>Besides that, I'd say from my experience working on the atom model that the new model's capabilities are not that different from the atom model's. They are different, there are pros and cons, and I don't agree that the atom model is more flexible or conceptually better.</div></div></div></div></blockquote><div><br></div></span><div>I don't understand this focus on "the atom model". "the atom model" is not any particular thing. We can generalize the meaning of atom, we can make it more narrow, we can remove responsibilities from Atom, we can add responsibilities to Atom, we can do whatever is needed. As you yourself admit, the "new model" is not that different from "the atom model". Think of "the atom model" like SSA. LLVM IR is SSA; there is a very large amount of freedom to decide on the exact design within that scope. "the atom model" AFAICT just means that a core abstraction inside the linker is the notion of an indivisible chunk. Our current design might need to be changed, but starting from scratch only to arrive at the same basic idea but now having to effectively maintain two codebases doesn't seem worth it.</div></div></div></div></blockquote><div><br></div></span><div>A large part of the difficulty in developing the current LLD comes from over-generalization in order to share code between quite different file formats. 
My observation is that we ended up having to write a large amount of code to share a small core, which doesn't even fit any platform well (an example is the virtual archive file I mentioned above -- it was invented to hide platform-specific atom creation behind something platform-neutral, and archive files were chosen because they are supported by three platforms). Different things are different; we need to get the right balance. I don't think the current balance is right.</div><div><div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>A lot of the issue here is that we are falsely distinguishing "section-based" and "atom-based". A suitable generalization of the notion of "indivisible chunks" and what you can do with them covers both cases, but traditional usage of sections makes the "indivisible chunks" be a lot larger (and loses more information in doing so). 
But as -ffunction-sections/-fdata-sections shows, there is not really any fundamental difference.</div><span><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 28, 2015 at 8:22 PM, Rui Ueyama <span dir="ltr">&lt;<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Thu, May 28, 2015 at 6:25 PM, Nick Kledzik <span dir="ltr">&lt;<a href="mailto:kledzik@apple.com" target="_blank">kledzik@apple.com</a>&gt;</span> wrote:<br></span><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><span><div>On May 28, 2015, at 5:42 PM, Sean Silva &lt;<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>&gt; wrote:</div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I guess, looking back at Nick's comment:</div><div><br></div><div>"<span style="font-size:12.8000001907349px">The atom model is a good fit for the llvm compiler model for all architectures.  There is a one-to-one mapping between llvm::GlobalObject (e.g. 
function or global variable) and lld::DefinedAtom."</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">it seems that the primary issue on the ELF/COFF side is that currently the LLVM backends are taking a finer-grained atomicity that is present inside LLVM, and losing information by converting that to a coarser-grained atomicity that is the typical "section" in ELF/COFF.</span></div><div><span style="font-size:12.8000001907349px">But doesn't -ffunction-sections -fdata-sections already fix this, basically?</span></div><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">On the Mach-O side, the issue seems to be that Mach-O's notion of section carries more hard-coded meaning than e.g. ELF, so at the very least another layer of subdivision below what Mach-O calls "section" would be needed to preserve this information; currently symbols are used as a bit of a hack as this "sub-section" layer.</span></div></div></div></div></blockquote></span><div>I’m not sure what you mean here.</div><span><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">So the problem seems to be that the transport format between the compiler and linker varies by platform, and each one has a different way to represent things, some can't represent everything we want to do, apparently.</span></div></div></div></div></blockquote></span><div>Yes!</div><span><div><br></div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><span style="font-size:12.8000001907349px">BUT it sounds like at least relocatable ELF semantics can, in principle, represent everything that we can imagine an "atom-based file format"/"native format" to want to represent. 
</span><span style="font-size:12.8000001907349px">Just to play devil's advocate here, let's start out with the "native format" being relocatable ELF - on *all platforms*. Relocatable object files are just a transport format between compiler and linker, after all; who cares what we use? If the alternative is a completely new format, then bootstrapping from relocatable ELF is strictly less churn/tooling cost.</span></div><div><br></div><div><span style="font-size:12.8000001907349px">People on the "atom side of the fence", what do you think? Is there anything that we cannot achieve by saying "native"="relocatable ELF"?</span><span style="font-size:12.8000001907349px"><br></span></div></div></div></div></blockquote></span></div><div>1) Turns out .o files are written once but read many times by the linker.  Therefore, the design goal of .o files should be that they are as fast to read/parse in the linker as possible.  Slowing down the compiler to make a .o file that is faster for the linker to read is a good trade off.  This is the motivation for the native format - not that it is a universal format.</div></div></blockquote></span><div><br>I don't think that switching from ELF to something new can make linkers significantly faster. We need to handle ELF files carefully so as not to waste time on the initial load, but if we do, reading the data required for symbol resolution from an ELF file should be satisfactorily fast (I did that for COFF -- the current "atom-based ELF" linker does too many things in the initial load, such as reading all relocation tables, splitting indivisible chunks of data, and connecting them with "indivisible" edges, etc.) 
It looks like we read the symbol table pretty quickly in the new implementation, and its bottleneck is now the time to insert symbols into the symbol hash table -- which you cannot make faster by changing the object file format.</div><div><br></div><div>Speaking of performance, if I wanted to make a significant difference, I'd focus on introducing new symbol resolution semantics. In particular, the Unix linker semantics are pretty bad for performance because we have to visit files one by one, serially and possibly repeatedly. That is bad not only for parallelism but also for the single-threaded case, because it increases the size of the data to be processed. This is, I believe, the true bottleneck of Unix linkers. Tackling that problem seems most important to me, and "ELF as a file format is slow" is still an unproven claim to me.</div><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>2) I think the ELF camp still thinks that linkers are “dumb”.  That they just collate .o files into executable files.  The darwin linker does a lot of processing/optimizing the content (e.g. Objective-C optimizing, dead stripping, function/data re-ordering).  This is why atom level granularity is needed.</div></div></blockquote><div><br></div></span><div>I think that all these things are doable (and are being done) using -ffunction-sections.</div><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>For darwin, ELF-based .o files are not interesting.  It won’t be faster, and it will take a bunch of effort to figure out how to encode all the mach-o info into ELF.  
We’d rather wait for a new native format.</div></div></blockquote><div><br></div></span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><span><font color="#888888"><div><br></div><div>-Nick</div><div><br></div></font></span></div><br><span>_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

<br></span></blockquote></div><br></div></div>

</blockquote></div><br></div>

</div></div></blockquote></span></div><br></div></div>

</blockquote></div></div></div><br></div></div>

</blockquote></div></div></div><br></div></div>

<br></blockquote></div><br></div></div>