[llvm-dev] [LLD] Adding WebAssembly support to lld

Wed Jul 12 16:36:03 PDT 2017

On Wed, Jul 12, 2017 at 3:23 PM, Rui Ueyama <ruiu at google.com> wrote:
> On Wed, Jul 12, 2017 at 11:31 AM, Sam Clegg <sbc at chromium.org> wrote:
>>
>> On Mon, Jul 10, 2017 at 4:13 PM, Rui Ueyama via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > Sorry for the belated response. I was on vacation last week. A couple of
>> > thoughts on this patch and the story of webassembly linking.
>>
>> And I'm about to be on (mostly) vacation for the next 3 weeks :)
>>
>> >
>> > - This patch adds wasm support as yet another major architecture
>> > besides
>> > ELF and COFF. That is fine and actually aligned to the design principle
>> > of
>> > the current lld. Wasm probably differs from both more than ELF differs
>> > from COFF, and
>> > the reason we separated COFF and ELF was that they are different
>> > enough that it is easier to handle them separately rather than writing a
>> > complex compatibility layer for the two. So I think that is the right
>> > design
>> > choice. That being said, some files are unnecessarily copied to all
>> > targets.
>> > Particularly, Error.{cpp,h} and Memory.{h,cpp} need to be merged because
>> > they are mostly identical.
>>
>> I concur.  However, would you accept the wasm port landing first, and
>> then factoring some kind of library out of the 3 backends after that?
>>  Personally I would prefer to land the initial version without
>> touching the ELF/COFF backends and refactor in a second pass.
>
>
> Yes, we can do that later.
>
>> > - I can imagine that you would eventually want to support two modes of
>> > wasm
>> > object files. In one form, object files are represented in the compact
>> > format using LEB128 encoding, and the linker has to decode and re-encode
>> > LEB128 instruction streams. In the other form, they are still in LEB128
>> > but
>> > use the full 5 bytes for 4-byte numbers, so that you can just concatenate
>> > them
>> > without decoding/re-encoding. Which mode do you want to make default?
>> > The
>> > latter should be much faster than the former (or the former is probably
>> > unnecessarily slow), and because the regular compile-link-run cycle is
>> > very
>> > important for developers, I'd guess that making the latter default is a
>> > reasonable choice, although this patch implements the former. What do
>> > you
>> > think about it?
>>
>> Yes, currently relocatable wasm files (as produced by clang) use fixed
>> width LEB128 (padded to five bytes) for any relocation targets.  This
>> allows the linker to trivially apply relocations and blindly
>> concatenate data and code sections.  We specifically avoid any
>> instruction decoding in the linker.   The plan is to add an optional
>> pass over the generated code section of an executable file to compress
>> the relocation targets to their normal LEB128 size.  Whether or not to
>> make this the default is TBD.
>
>
> Does this strategy make sense?
>
>  - make compilers always emit fixed-width LEB128, so that linkers can link
> them just by concatenating them and applying relocations,
>  - make the linker emit fixed-width LEB128 by default as well, so that it
> can create executables as fast as it can, and
>  - write an optional re-encoder which decodes and re-encodes fixed-width
> LEB128 to "compress" the final output.
>
> The third one can be an internal linker pass which is invoked when you pass
> -O1 or something to the linker, but conceptually it is separated from the
> "main" linker.

IIUC that is exactly the strategy I am suggesting.   Perhaps my
description of it was less clear.   The current implementation does this,
with the caveat that the final (optional) compression phase is not yet
implemented :)
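
To make the padded encoding concrete, here is a minimal sketch of what
"fixed-width LEB128" means for a 32-bit relocation target and why it lets the
linker patch values in place. The function names are hypothetical and are not
taken from the patch; this only illustrates the encoding itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Write Value as unsigned LEB128 padded to exactly 5 bytes. The first four
// bytes always carry the continuation bit, so the field width never depends
// on the value and a relocation can be overwritten in place.
void writePaddedULEB128(uint8_t *Buf, uint32_t Value) {
  for (int I = 0; I < 4; ++I) {
    Buf[I] = static_cast<uint8_t>((Value & 0x7f) | 0x80);
    Value >>= 7;
  }
  Buf[4] = static_cast<uint8_t>(Value & 0x7f);
}

// Write Value in the minimal ("compressed") form that the optional
// post-link pass would produce.
std::vector<uint8_t> encodeCompactULEB128(uint32_t Value) {
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
  return Out;
}

// Decode either form; the padded and compact encodings yield the same value.
uint32_t decodeULEB128(const uint8_t *Buf, size_t *Len = nullptr) {
  uint32_t Result = 0;
  unsigned Shift = 0;
  size_t I = 0;
  for (;;) {
    assert(Shift < 32 && "more than 5 bytes in a 32-bit LEB128");
    uint8_t Byte = Buf[I++];
    Result |= static_cast<uint32_t>(Byte & 0x7f) << Shift;
    if (!(Byte & 0x80))
      break;
    Shift += 7;
  }
  if (Len)
    *Len = I;
  return Result;
}

// Hypothetical relocation step: overwrite the 5-byte field at Offset with
// the final index. No surrounding bytes move, so no instruction decoding
// is needed.
void applyRelocation(std::vector<uint8_t> &Section, size_t Offset,
                     uint32_t NewIndex) {
  assert(Offset + 5 <= Section.size());
  writePaddedULEB128(Section.data() + Offset, NewIndex);
}
```

With this, the value 3 occupies five bytes (0x83 0x80 0x80 0x80 0x00) in the
padded form and one byte (0x03) in the compact form. That difference is the
size the optional compression pass would recover.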

>
> The rationale behind this strategy is that
>
> - Developers usually want to create outputs as fast as linkers can. Creating
> final executables for shipping is (probably by far) less frequent. I also
> expect that, if wasm is successful, you'll be compiling and linking
> large programs using wasm as a target (on a successful platform, people
> start doing something incredible/crazy in general), so the toolchain
> performance will matter. You want to optimize it for regular
> compile-link-debug cycle.
> - Creating an output just by concatenating input file sections is I believe
> easier than decoding and re-encoding LEB128 fields. So I think we want to
> construct the linker based on that design, instead of directly emitting
> variable-size LEB128 fields.
>
>
>> > - Storing the length and a hash value for each symbol in the symbol
>> > table
>> > may speed up linking. We've learned that finding terminating NULs and
>> > computing hash values for symbols is a time-consuming process in the
>> > linker.
>>
>> Yes, I imagine we could even share some of the core symbol table code
>> via the above mentioned library?
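
As a rough illustration of the idea (not code from the patch or from lld), a
symbol table keyed on names that arrive with their length and hash already
stored could look like the sketch below. The types, the field layout, and the
choice of FNV-1a are all assumptions made for the example; the point is only
that the linker's lookup path neither scans for a NUL nor rehashes the name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_map>

// A symbol name as it might be read from an object file that stores the
// length and a hash next to the string (hypothetical layout).
struct SymbolName {
  std::string_view Name; // length is known up front, no NUL scan
  uint32_t Hash;         // precomputed by the producer of the object file
};

// FNV-1a, used here only as a stand-in for whatever hash the format
// would standardize on.
uint32_t hashName(std::string_view S) {
  uint32_t H = 2166136261u;
  for (unsigned char C : S) {
    H ^= C;
    H *= 16777619u;
  }
  return H;
}

// The map hasher just returns the stored value instead of rehashing.
struct StoredHash {
  size_t operator()(const SymbolName &S) const { return S.Hash; }
};

struct SameName {
  bool operator()(const SymbolName &A, const SymbolName &B) const {
    return A.Hash == B.Hash && A.Name == B.Name;
  }
};

class SymbolTable {
public:
  // Returns the index of the symbol, inserting it if it is new.
  int insert(const SymbolName &S) {
    auto Result = Map.try_emplace(S, static_cast<int>(Map.size()));
    return Result.first->second;
  }
  size_t size() const { return Map.size(); }

private:
  std::unordered_map<SymbolName, int, StoredHash, SameName> Map;
};
```

In this sketch the compiler (or assembler) would pay the hashing cost once per
symbol, and every link of that object file would reuse the stored value.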
>>
>> >
>> >
>> >
>> > On Thu, Jul 6, 2017 at 3:38 PM, Rafael Avila de Espindola via llvm-dev
>> > <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> Dan Gohman <sunfish at mozilla.com> writes:
>> >>
>> >> >> Sorry, I meant why that didn't work with ELF (or what else didn't).
>> >> >>
>> >> >
>> >> > The standard executable WebAssembly format does not use ELF, for
>> >> > numerous
>> >> > reasons, most visibly that ELF is designed for sparse decoding --
>> >> > headers
>> >> > contain offsets to arbitrary points in the file, while WebAssembly's
>> >> > format
>> >> > is designed for streaming decoding. Also, as Sam mentioned, there are
>> >> > a
>> >> > lot
>> >> > of conceptual differences. In ELF, virtual addresses are a pervasive
>> >> > organizing principle; in WebAssembly, it's possible to think about
>> >> > various
>> >> > index spaces as virtual address spaces, but not all
>> >> > address-space-oriented
>> >> > assumptions apply.
>> >>
>> >> I can see why you would want your own format for distribution. My
>> >> question was really about using ELF for the .o files.
>> >>
>> >> > It would also be possible for WebAssembly to use ELF ET_REL files
>> >> > just
>> >> > for
>> >> > linking, however telling LLVM and other tools to target ELF tends to
>> >> > lead
>> >> > them to assume that the final output is ELF and rely on ELF-specific
>> >> > features.
>> >>
>> >> Things like "the dynamic linker implements copy relocations"?
>> >>
>> >> Cheers,
>> >> Rafael
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> llvm-dev at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev