[LLVMdev] Emscripten: LLVM => JavaScript

Mon Dec 19 16:45:26 PST 2011

----- Original Message -----
> From: "Eli Friedman" <eli.friedman at gmail.com>
> To: "Alon Zakai" <azakai at mozilla.com>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, December 16, 2011 7:47:00 PM
> Subject: Re: [LLVMdev] Emscripten: LLVM => JavaScript
> On Fri, Dec 16, 2011 at 7:14 PM, Alon Zakai <azakai at mozilla.com>
> wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Eli Friedman" <eli.friedman at gmail.com>
> >> To: "Alon Zakai" <azakai at mozilla.com>
> >> Cc: llvmdev at cs.uiuc.edu
> >> Sent: Thursday, December 15, 2011 7:02:34 PM
> >> Subject: Re: [LLVMdev] Emscripten: LLVM => JavaScript
> >>
> >> Adding an Emscripten target to clang would be fine. Note that clang
> >> might generate unaligned loads anyway, but specifying an
> >> appropriate
> >> target will ensure it doesn't use such loads unless they are
> >> necessary.
> >
> > In what situation would unaligned loads be necessary? I was
> > hoping that unless the code literally did something crazy like
> > a load of an 8-byte value from a hardcoded 4-byte aligned
> > address (like 0x4), then otherwise "normal" C/C++ code would
> > always end up aligned. Is that correct?
> 
> For normal unoptimized code, yes, everything should end up aligned.
> If you're compiling random C code, you're likely to run into code that does
> "something crazy" (like using "__attribute__((packed))") occasionally,
> though. Also, the optimizer will sometimes turn a memcpy into an
> unaligned load+store, or a pair of small loads into an unaligned load.

Makes sense, thanks. I'll need to break those cases up
into smaller, aligned loads/stores then.
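
One way to picture handling an unaligned access is to assemble it
from single-byte accesses, which are always aligned. The following is
only a minimal sketch in Python, not Emscripten's actual code: the
names load_u32_unaligned and store_u32_unaligned are hypothetical, the
heap is modelled as a plain bytearray, and little-endian byte order is
assumed.

```python
def load_u32_unaligned(heap: bytearray, addr: int) -> int:
    """Build a 32-bit value from four byte loads (little-endian assumed)."""
    return (heap[addr]
            | (heap[addr + 1] << 8)
            | (heap[addr + 2] << 16)
            | (heap[addr + 3] << 24))


def store_u32_unaligned(heap: bytearray, addr: int, value: int) -> None:
    """Write a 32-bit value as four byte stores (little-endian assumed)."""
    for i in range(4):
        heap[addr + i] = (value >> (8 * i)) & 0xFF


# Hypothetical usage: address 1 is not 4-byte aligned.
heap = bytearray(8)
store_u32_unaligned(heap, 1, 0x11223344)
value = load_u32_unaligned(heap, 1)
```

Byte-wise access is the slowest option but needs no knowledge of the
actual alignment; if the access is known to be 2-byte aligned, two
16-bit accesses would do.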

> 
> >>
> >> > 2. Optimization sometimes generates types like i288, which
> >> >   Emscripten currently doesn't handle. From an optimizing
> >> >   perspective, it isn't yet clear if it would be faster to
> >> >   try to directly implement those, or to just break them up
> >> >   into more manageable native (32-bit) sizes. Note that even
> >> >   i64 is somewhat challenging to implement in a fast way
> >> >   on JavaScript, since that environment is really a 32-bit
> >> >   one, so it would be best to never do things like combine
> >> >   two 32-bit writes into one 64-bit write. It would be nice
> >> >   to have an option in LLVM to process the IR/bitcode back
> >> >   into having only target-native types, is that possible?
> >>
> >> All the LLVM targets which use the common code generation
> >> infrastructure have access to the legalizer, which handles that
> >> sort
> >> of thing. It would in theory be possible to write an equivalent
> >> that
> >> does most of that work on IR, but it's a substantial amount of work
> >> without any obvious benefit for existing targets.
> >>
> >
> > Ok, I guess that means I'll need to implement a legalizer. The
> > simplest thing would probably be for me to do it in Emscripten,
> > because the Emscripten IR is a simpler subset of LLVM IR (and
> > I'm already familiar with the codebase). But if it would be
> > useful for LLVM to have an IR pass that does legalization,
> > I'd consider doing it in LLVM. Thoughts?
> 
> I don't think it would be very useful for the in-tree backends unless
> we make major changes to the way instruction selection works;
> legalization is closely integrated with other transformations. That
> said, the question does come up periodically on llvmdev; if you are
> willing to write something, I'm sure some people would appreciate it.
> 

Given that, I think I'll start with an implementation
in JavaScript in Emscripten. After I get that mostly
working and have an idea of the scope of writing this
in a way that integrates into LLVM, I'll report back.
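
To illustrate what legalizing a wide integer operation might involve,
here is a rough sketch in Python of a 64-bit add expressed purely in
32-bit pieces. It is an illustration under stated assumptions, not the
planned implementation: the names split_i64, join_i64 and
add_i64_legalized are hypothetical, values are treated as unsigned,
and masking with MASK32 stands in for 32-bit registers because Python
integers are arbitrary precision.

```python
MASK32 = 0xFFFFFFFF  # emulates a 32-bit register


def split_i64(x: int) -> tuple[int, int]:
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32


def join_i64(lo: int, hi: int) -> int:
    """Recombine 32-bit halves into a 64-bit value."""
    return (hi << 32) | lo


def add_i64_legalized(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int]:
    """Add two 64-bit values using only 32-bit-wide operations."""
    lo = (a_lo + b_lo) & MASK32
    # The low half wrapped around exactly when the result is below an operand.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


# Hypothetical usage: the carry propagates from the low half to the high half.
lo, hi = add_i64_legalized(*split_i64(0x00000001FFFFFFFF), *split_i64(1))
result = join_i64(lo, hi)
```

Wider types such as i288 would follow the same pattern with more
32-bit pieces and a carry chained through each of them.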

Best,
  Alon Zakai