<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jul 12, 2017 at 11:31 AM, Sam Clegg <span dir="ltr"><<a href="mailto:sbc@chromium.org" target="_blank">sbc@chromium.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Mon, Jul 10, 2017 at 4:13 PM, Rui Ueyama via llvm-dev<br>

<<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

> Sorry for the belated response. I was on vacation last week. A couple of<br>

> thoughts on this patch and the story of webassembly linking.<br>

<br>

</span>And I'm about to be on (mostly) vacation for the next 3 weeks :)<br>

<span class=""><br>

><br>

> - This patch adds wasm support as yet another major architecture besides<br>

> ELF and COFF. That is fine and actually aligned with the design principle of<br>

> the current lld. Wasm probably differs from both more than ELF differs from COFF, and<br>

> the reason why we separated COFF and ELF was because they are different<br>

> enough that it is easier to handle them separately rather than writing a<br>

> complex compatibility layer for the two. So I think that is the right design<br>

> choice. That being said, some files are unnecessarily copied to all targets.<br>

> Particularly, Error.{cpp,h} and Memory.{h,cpp} need to be merged because<br>

> they are mostly identical.<br>

<br>

</span>I concur.  However, would you accept the wasm port landing first, and<br>

then factoring some kind of library out of the 3 backends after that?<br>

 Personally I would prefer to land the initial version without<br>

touching the ELF/COFF backends and refactor in a second pass.</blockquote><div><br></div><div>Yes, we can do that later.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

> - I can imagine that you would eventually want to support two modes of wasm<br>

> object files. In one form, object files are represented in the compact<br>

> format using LEB128 encoding, and the linker has to decode and re-encode<br>

> LEB128 instruction streams. In the other form, they are still in LEB128 but<br>

> use the full 5 bytes for 4-byte numbers, so that you can just concatenate them<br>

> without decoding/re-encoding. Which mode do you want to make default? The<br>

> latter should be much faster than the former (or the former is probably<br>

> unnecessarily slow), and because the regular compile-link-run cycle is very<br>

> important for developers, I'd guess that making the latter default is a<br>

> reasonable choice, although this patch implements the former. What do you<br>

> think about it?<br>

<br>

</span>Yes, currently relocatable wasm files (as produced by clang) use fixed<br>

width LEB128 (padded to five bytes) for any relocation targets.  This<br>

allows the linker to trivially apply relocations and blindly<br>

concatenate data and code sections.  We specifically avoid any<br>

instruction decoding in the linker.   The plan is to add an optional<br>

pass over the generated code section of an executable file to compress<br>

the relocation targets to their normal LEB128 size.  Whether or not to<br>

make this the default is TBD.</blockquote><div><br></div><div>Does this strategy make sense?</div><div><br></div><div> - make compilers always emit fixed-width LEB128, so that linkers can link object files just by concatenating them and applying relocations,</div><div> - make the linker emit fixed-width LEB128 by default as well, so that it can create executables as fast as possible, and</div><div> - write an optional re-encoder which decodes and re-encodes fixed-width LEB128 to "compress" the final output.</div><div><br></div><div>The third one can be an internal linker pass which is invoked when you pass -O1 or something to the linker, but conceptually it is separate from the "main" linker.</div><div><br></div><div>The rationale behind this strategy is that</div><div><br></div><div>- Developers usually want to create outputs as fast as linkers can. Creating final executables for shipping is (probably by far) less frequent. I also expect that, if wasm is successful, you'll be compiling and linking large programs using wasm as a target (on a successful platform, people start doing something incredible/crazy in general), so the toolchain performance will matter. You want to optimize it for the regular compile-link-debug cycle.</div><div>- Creating an output just by concatenating input file sections is, I believe, easier than decoding and re-encoding LEB128 fields. So I think we want to construct the linker based on that design, instead of directly emitting variable-size LEB128 fields.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
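</span></blockquote><div>To make the concatenation idea concrete, here is a rough Python sketch. It is not lld code: the function names and the relocation tuple layout (section index, offset within the section, new value) are assumptions made only for illustration. It shows a minimal ULEB128 encoder, the padded 5-byte form, a "link" step that concatenates sections and overwrites each padded field in place without decoding anything around it, and the re-encoding that an optional compression pass would do per field.</div>
<pre>
```python
# Sketch only: every name here is hypothetical, not an lld or LLVM API.

PADDED_SIZE = 5  # bytes reserved for each relocatable 32-bit ULEB128 field


def encode_uleb128(value):
    """Minimal-length unsigned LEB128 encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uleb128_padded(value):
    """Fixed-width encoding: always 5 bytes, unused high groups are zero."""
    out = bytearray()
    for i in range(PADDED_SIZE):
        byte = (value >> (7 * i)) & 0x7F
        if i != PADDED_SIZE - 1:
            byte |= 0x80  # continuation bit on all but the last byte
        out.append(byte)
    return bytes(out)


def decode_uleb128(data, offset=0):
    """Return (value, bytes_consumed); accepts minimal and padded forms."""
    value = 0
    count = 0
    while True:
        byte = data[offset + count]
        value += (byte & 0x7F) * 128 ** count
        count += 1
        if not (byte & 0x80):
            return value, count


def link_sections(sections, relocations):
    """Concatenate sections, then patch every relocation in place.

    relocations: list of (section_index, offset_in_section, new_value).
    Each target is assumed to be a padded 5-byte field, so patching
    never changes the size or position of anything else.
    """
    output = bytearray()
    bases = []
    for section in sections:
        bases.append(len(output))
        output += section
    for section_index, offset, new_value in relocations:
        at = bases[section_index] + offset
        output[at:at + PADDED_SIZE] = encode_uleb128_padded(new_value)
    return bytes(output)


def shrink_field(data, offset):
    """What the optional pass does per field: decode, re-encode minimally."""
    value, _ = decode_uleb128(data, offset)
    return encode_uleb128(value)
```
</pre>
<div>For example, with sec = b"\x10" + encode_uleb128_padded(0) + b"\x0b", calling link_sections([sec, sec], [(0, 1, 7), (1, 1, 300)]) patches both fields without looking at any other byte, and the output is exactly as long as the two inputs together. shrink_field, on the other hand, changes the size of a field, so a real pass would have to shift everything after it. That is why I would keep it out of the main link step.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">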

> - Storing the length and a hash value for each symbol in the symbol table<br>

> may speed up linking. We've learned that finding terminating NULs and<br>

> computing hash values for symbols are time-consuming processes in the linker.<br>

<br>

</span>Yes, I imagine we could even share some of the core symbol table code<br>

via the above-mentioned library?<br>
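</blockquote><div><br></div><div>The symbol table point could look roughly like this. Again this is a Python sketch with made-up names; the record layout (4-byte hash, 4-byte length, then the name bytes) and the use of zlib.crc32 as the hash are assumptions for illustration, not a proposed wasm object format.</div>
<pre>
```python
import zlib  # crc32 is only a stand-in hash for this sketch


def write_symbol(name):
    """Serialize one entry as: hash (4 bytes), length (4 bytes), name bytes."""
    raw = name.encode("utf-8")
    header = zlib.crc32(raw).to_bytes(4, "little")
    header += len(raw).to_bytes(4, "little")
    return header + raw


def read_symbols(table):
    """Yield (hash, name_bytes) without scanning for NUL or rehashing.

    Assumes a well-formed table; a real reader would validate bounds.
    """
    pos = 0
    while pos != len(table):
        stored_hash = int.from_bytes(table[pos:pos + 4], "little")
        length = int.from_bytes(table[pos + 4:pos + 8], "little")
        start = pos + 8
        yield stored_hash, table[start:start + length]
        pos = start + length


class SymbolTable:
    """Buckets keyed by the precomputed hash carried in the object file."""

    def __init__(self):
        self.buckets = {}

    def insert(self, stored_hash, name):
        """Return True if the name is new, False if it was already present."""
        names = self.buckets.setdefault(stored_hash, [])
        if name in names:
            return False
        names.append(name)
        return True
```
</pre>
<div>The compiler pays for the hash and the length once per symbol, and the linker only does fixed-size reads plus a bucket lookup for each symbol it sees.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">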

<div class="HOEnZb"><div class="h5"><br>

><br>

><br>

><br>

> On Thu, Jul 6, 2017 at 3:38 PM, Rafael Avila de Espindola via llvm-dev<br>

> <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

>><br>

>> Dan Gohman <<a href="mailto:sunfish@mozilla.com">sunfish@mozilla.com</a>> writes:<br>

>><br>

>> >> Sorry, I meant why that didn't work with ELF (or what else didn't).<br>

>> >><br>

>> ><br>

>> > The standard executable WebAssembly format does not use ELF, for<br>

>> > numerous<br>

>> > reasons, most visibly that ELF is designed for sparse decoding --<br>

>> > headers<br>

>> > contain offsets to arbitrary points in the file, while WebAssembly's<br>

>> > format<br>

>> > is designed for streaming decoding. Also, as Sam mentioned, there are a<br>

>> > lot<br>

>> > of conceptual differences. In ELF, virtual addresses are a pervasive<br>

>> > organizing principle; in WebAssembly, it's possible to think about<br>

>> > various<br>

>> > index spaces as virtual address spaces, but not all<br>

>> > address-space-oriented<br>

>> > assumptions apply.<br>

>><br>

>> I can see why you would want your own format for distribution. My<br>

>> question was really about using ELF for the .o files.<br>

>><br>

>> > It would also be possible for WebAssembly to use ELF ET_REL files just<br>

>> > for<br>

>> > linking, however telling LLVM and other tools to target ELF tends to<br>

>> > lead<br>

>> > them to assume that the final output is ELF and rely on ELF-specific<br>

>> > features.<br>

>><br>

>> Things like "the dynamic linker implements copy relocations"?<br>

>><br>

>> Cheers,<br>

>> Rafael<br>

>> ______________________________<wbr>_________________<br>

>> LLVM Developers mailing list<br>

>> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

><br>

><br>

><br>


><br>

</div></div></blockquote></div><br></div></div>