[llvm-dev] [LLD] Adding WebAssembly support to lld

Mon Jul 17 15:24:14 PDT 2017

For the record, I left review comments on https://reviews.llvm.org/D34851.

On Wed, Jul 12, 2017 at 4:36 PM, Sam Clegg <sbc at google.com> wrote:

> On Wed, Jul 12, 2017 at 3:23 PM, Rui Ueyama <ruiu at google.com> wrote:
> > On Wed, Jul 12, 2017 at 11:31 AM, Sam Clegg <sbc at chromium.org> wrote:
> >>
> >> On Mon, Jul 10, 2017 at 4:13 PM, Rui Ueyama via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> > Sorry for the belated response. I was on vacation last week. A couple of
> >> > thoughts on this patch and the story of webassembly linking.
> >>
> >> And I'm about to be on (mostly) vacation for the next 3 weeks :)
> >>
> >> >
> >> > - This patch adds wasm support as yet another major architecture
> >> > besides ELF and COFF. That is fine and actually aligned with the
> >> > design principle of the current lld. Wasm probably differs from ELF
> >> > and COFF more than they differ from each other, and the reason why we
> >> > separated COFF and ELF was that they are different enough that it is
> >> > easier to handle them separately than to write a complex
> >> > compatibility layer for the two. So I think that is the right design
> >> > choice. That being said, some files are unnecessarily copied to all
> >> > targets. In particular, Error.{cpp,h} and Memory.{h,cpp} need to be
> >> > merged because they are mostly identical.
> >>
> >> I concur. However, would you accept the wasm port landing first, and
> >> then factoring some kind of library out of the 3 backends after that?
> >> Personally I would prefer to land the initial version without
> >> touching the ELF/COFF backends and refactor in a second pass.
> >
> >
> > Yes, we can do that later.
> >
> >> > - I can imagine that you would eventually want to support two modes of
> >> > wasm object files. In one form, object files are represented in the
> >> > compact format using LEB128 encoding, and the linker has to decode and
> >> > re-encode LEB128 instruction streams. In the other form, they are still
> >> > in LEB128 but use the full 5 bytes for 4-byte numbers, so that you can
> >> > just concatenate them without decoding/re-encoding. Which mode do you
> >> > want to make the default? The latter should be much faster than the
> >> > former (or the former is probably unnecessarily slow), and because the
> >> > regular compile-link-run cycle is very important for developers, I'd
> >> > guess that making the latter the default is a reasonable choice,
> >> > although this patch implements the former. What do you think about it?
> >>
> >> Yes, currently relocatable wasm files (as produced by clang) use fixed
> >> width LEB128 (padded to five bytes) for any relocation targets. This
> >> allows the linker to trivially apply relocations and blindly
> >> concatenate data and code sections. We specifically avoid any
> >> instruction decoding in the linker. The plan is to add an optional
> >> pass over the generated code section of an executable file to compress
> >> the relocation targets to their normal LEB128 size. Whether or not to
> >> make this the default is TBD.
> >
> >
> > Does this strategy make sense?
> >
> >  - make compilers always emit fixed-width LEB128, so that linkers can link
> > them just by concatenating them and applying relocations,
> >  - make the linker emit fixed-width LEB128 by default as well, so that it
> > can create executables as fast as it can, and
> >  - write an optional re-encoder which decodes and re-encodes fixed-width
> > LEB128 to "compress" the final output.
> >
> > The third one can be an internal linker pass which is invoked when you
> > pass -O1 or something to the linker, but conceptually it is separated
> > from the "main" linker.
>
> IIUC that is exactly the strategy I am suggesting. Perhaps my
> description of it was less clear. The current implementation does this,
> with the caveat that the final (optional) compression phase is not yet
> implemented :)
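
To make the fixed-width idea concrete, here is a minimal sketch of how a padded 5-byte ULEB128 field lets a relocation be applied by overwriting bytes in place, with no instruction decoding. The names writePaddedULEB32 and applyRelocation are hypothetical and are not taken from the patch under review; the sketch assumes a 32-bit unsigned relocation target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Width of a padded 32-bit ULEB128 field: ceil(32 / 7) = 5 bytes.
constexpr std::size_t kPaddedLEB32Size = 5;

// Hypothetical helper: writes `value` as ULEB128 using exactly five bytes.
// The first four bytes always carry the continuation bit (0x80), even when
// the remaining payload bits are all zero.
void writePaddedULEB32(std::uint8_t *buf, std::uint32_t value) {
  for (std::size_t i = 0; i < kPaddedLEB32Size - 1; ++i) {
    buf[i] = static_cast<std::uint8_t>((value & 0x7f) | 0x80);
    value >>= 7;
  }
  buf[kPaddedLEB32Size - 1] = static_cast<std::uint8_t>(value & 0x7f);
}

// Hypothetical helper: applies one relocation by overwriting the padded
// field at `offset`. The section size never changes, so nothing else in
// the section has to move.
void applyRelocation(std::vector<std::uint8_t> &section, std::size_t offset,
                     std::uint32_t newValue) {
  assert(offset + kPaddedLEB32Size <= section.size());
  writePaddedULEB32(section.data() + offset, newValue);
}
```

Because every field keeps the same width whatever value it holds, sections from different inputs can be concatenated first and patched afterwards.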
>
> >
> > The rationale behind this strategy is that
> >
> > - Developers usually want to create outputs as fast as linkers can.
> > Creating final executables for shipping is (probably by far) less
> > frequent. I also expect that, if wasm is successful, you'll be compiling
> > and linking large programs using wasm as a target (on a successful
> > platform, people start doing something incredible/crazy in general), so
> > the toolchain performance will matter. You want to optimize it for the
> > regular compile-link-debug cycle.
> > - Creating an output just by concatenating input file sections is, I
> > believe, easier than decoding and re-encoding LEB128 fields. So I think
> > we want to construct the linker based on that design, instead of
> > directly emitting variable-size LEB128 fields.
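
As a rough illustration of the optional "compress" step discussed above, the sketch below decodes one padded field and re-encodes it in minimal form. All function names are hypothetical, and it handles a single field only. Since shrinking a field shifts every byte after it, a real pass would presumably also have to rewrite the enclosing function-body and section sizes, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes a 32-bit ULEB128 value at `p` (padded or minimal form) and stores
// the number of bytes consumed in `len`. Reads at most five bytes.
std::uint32_t decodeULEB32(const std::uint8_t *p, std::size_t &len) {
  std::uint32_t result = 0;
  len = 0;
  for (unsigned shift = 0; shift < 35; shift += 7) {
    std::uint8_t byte = p[len++];
    result |= static_cast<std::uint32_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0)
      break;
  }
  return result;
}

// Appends `value` in minimal ULEB128 form (no padding bytes).
void appendMinimalULEB32(std::vector<std::uint8_t> &out, std::uint32_t value) {
  do {
    std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

// Hypothetical single-field "compress": returns a copy of `in` with the
// LEB128 field at `offset` rewritten in minimal form.
std::vector<std::uint8_t> compressField(const std::vector<std::uint8_t> &in,
                                        std::size_t offset) {
  assert(offset < in.size());
  std::size_t len = 0;
  std::uint32_t value = decodeULEB32(in.data() + offset, len);
  auto fieldBegin = in.begin() + static_cast<std::ptrdiff_t>(offset);
  auto fieldEnd = fieldBegin + static_cast<std::ptrdiff_t>(len);
  std::vector<std::uint8_t> out(in.begin(), fieldBegin);
  appendMinimalULEB32(out, value);
  out.insert(out.end(), fieldEnd, in.end());
  return out;
}
```

The copy-and-shift here is the extra work that the default fast path avoids, which is why it fits an opt-in pass better than the main link step.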
> >
> >
> >> > - Storing the length and a hash value for each symbol in the symbol
> >> > table may speed up linking. We've learned that finding terminating
> >> > NULs and computing hash values for symbols is a time-consuming process
> >> > in the linker.
> >>
> >> Yes, I imagine we could even share some of the core symbol table code
> >> via the above mentioned library?
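
A small sketch of what storing the length and hash per symbol could buy the linker: the table below never scans for a terminating NUL and never hashes a name itself. The record layout, the names (SymbolName, resolve) and the choice of FNV-1a are all assumptions made for illustration; nothing here reflects an actual wasm object format or lld data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <utility>

// Hypothetical symbol record: the producer stores the name length and a
// hash next to the name bytes, so the linker does not recompute either.
struct SymbolName {
  const char *data;      // points into the object file, not NUL-terminated
  std::uint32_t length;  // stored by the producer
  std::uint32_t hash;    // stored by the producer
};

// FNV-1a, used here only as a stand-in for whatever hash would be agreed on.
// In this scheme the producer runs it, not the linker.
std::uint32_t fnv1a(const char *s, std::uint32_t n) {
  std::uint32_t h = 2166136261u;
  for (std::uint32_t i = 0; i < n; ++i) {
    h ^= static_cast<unsigned char>(s[i]);
    h *= 16777619u;
  }
  return h;
}

// The hasher just returns the stored value.
struct StoredHash {
  std::size_t operator()(const SymbolName &s) const { return s.hash; }
};

// Equality compares lengths first, then the bytes.
struct SameName {
  bool operator()(const SymbolName &a, const SymbolName &b) const {
    return a.length == b.length &&
           std::memcmp(a.data, b.data, a.length) == 0;
  }
};

using SymbolTable = std::unordered_map<SymbolName, int, StoredHash, SameName>;

// Records `fileIndex` as the definer of `sym` unless the name is already
// present, and returns the index of the file that defined it first.
int resolve(SymbolTable &table, const SymbolName &sym, int fileIndex) {
  assert(sym.data != nullptr);
  return table.insert(std::make_pair(sym, fileIndex)).first->second;
}
```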
> >>
> >> >
> >> >
> >> >
> >> > On Thu, Jul 6, 2017 at 3:38 PM, Rafael Avila de Espindola via llvm-dev
> >> > <llvm-dev at lists.llvm.org> wrote:
> >> >>
> >> >> Dan Gohman <sunfish at mozilla.com> writes:
> >> >>
> >> >> >> Sorry, I meant why that didn't work with ELF (or what else didn't).
> >> >> >>
> >> >> >
> >> >> > The standard executable WebAssembly format does not use ELF, for
> >> >> > numerous reasons, most visibly that ELF is designed for sparse
> >> >> > decoding -- headers contain offsets to arbitrary points in the file,
> >> >> > while WebAssembly's format is designed for streaming decoding. Also,
> >> >> > as Sam mentioned, there are a lot of conceptual differences. In ELF,
> >> >> > virtual addresses are a pervasive organizing principle; in
> >> >> > WebAssembly, it's possible to think about various index spaces as
> >> >> > virtual address spaces, but not all address-space-oriented
> >> >> > assumptions apply.
> >> >>
> >> >> I can see why you would want your own format for distribution. My
> >> >> question was really about using ELF for the .o files.
> >> >>
> >> >> > It would also be possible for WebAssembly to use ELF ET_REL files
> >> >> > just for linking, however telling LLVM and other tools to target ELF
> >> >> > tends to lead them to assume that the final output is ELF and rely
> >> >> > on ELF-specific features.
> >> >>
> >> >> Things like "the dynamic linker implements copy relocations"?
> >> >>
> >> >> Cheers,
> >> >> Rafael
> >> >> _______________________________________________
> >> >> LLVM Developers mailing list
> >> >> llvm-dev at lists.llvm.org
> >> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> >
> >> >
> >> >
> >
> >
>