[LLVMdev] Emscripten: LLVM => JavaScript

Mon Dec 19 06:21:06 PST 2011

On Fri, Dec 16, 2011 at 9:47 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Fri, Dec 16, 2011 at 7:14 PM, Alon Zakai <azakai at mozilla.com> wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Eli Friedman" <eli.friedman at gmail.com>
>>> To: "Alon Zakai" <azakai at mozilla.com>
>>> Cc: llvmdev at cs.uiuc.edu
>>> Sent: Thursday, December 15, 2011 7:02:34 PM
>>> Subject: Re: [LLVMdev] Emscripten: LLVM => JavaScript
>>> On Thu, Dec 15, 2011 at 4:10 PM, Alon Zakai <azakai at mozilla.com>
>>> wrote:
>>> > On that topic, I see there is an LLVM users page,
>>> >
>>> > http://llvm.org/Users.html
>>> >
>>> > - what is the procedure for suggesting adding a project to
>>> > there?
>>>
>>> Send a patch to llvm-commits.
>>
>> Thanks, I'll do that.
>>
>>>
>>> > The third issue I want to raise is regarding closer
>>> > integration with LLVM. Right now, Emscripten uses unmodified
>>> > LLVM and Clang, parsing their normal output. There are
>>> > however some reasons for integrating more closely, in
>>> > particular Emscripten has a problem when all LLVM
>>> > optimizations are run. This is not always important for
>>> > performance, as a safe subset exists, and we do our own
>>> > JS-level optimizations later which overlap somewhat. However,
>>> > it would be nice to be able to run all the LLVM optimizations.
>>> > The problems we have there are
>>> >
>>> > 1. i64s and doubles can be on 32-bit alignment, which is
>>> >   a problem for a JavaScript implementation with typed arrays
>>> >   with a shared buffer, since unaligned reads/writes there
>>> >   are impossible to do in a quick way. This can happen
>>> >   without optimizations, but is more common there due to
>>> >   the next point.
>>> >
>>> >   I've been told by Rafael Ávila de Espíndola that for this,
>>> >   I would need an Emscripten target in LLVM. Would that be
>>> >   upstreamable? (With or without Emscripten itself, preferably
>>> >   with?)
>>>
>>> Adding an Emscripten target to clang would be fine. Note that clang
>>> might generate unaligned loads anyway, but specifying an appropriate
>>> target will ensure it doesn't use such loads unless they are
>>> necessary.
>>
>> In what situation would unaligned loads be necessary? I was
>> hoping that unless the code literally did something crazy like
>> a load of an 8-byte value from a hardcoded 4-byte aligned
>> address (like 0x4), then otherwise "normal" C/C++ code would
>> always end up aligned. Is that correct?
>
> For normal unoptimized code, yes, everything should end up aligned.
> If you're compiling random C code, you're likely to run into code that does
> "something crazy" (like using "__attribute__((packed))") occasionally,
> though.  Also, the optimizer will sometimes turn a memcpy into an
> unaligned load+store, or a pair of small loads into an unaligned load.
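To illustrate the packed case mentioned above, here is a minimal sketch, assuming a GCC- or Clang-compatible compiler and a typical ABI where double is 8 bytes. The struct Record and the helpers make_record and read_value are hypothetical names invented for this example. Packing removes the padding after the 32-bit field, so the double lands at offset 4, which is exactly the 32-bit-aligned 8-byte value that is awkward for typed arrays over a shared buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for illustration. Without the attribute, the compiler
   would normally insert 4 bytes of padding so that `value` is 8-byte aligned. */
struct __attribute__((packed)) Record {
    uint32_t tag;
    double value;
};

/* With packing, the double starts right after the 4-byte tag. */
static_assert(offsetof(struct Record, value) == 4, "packed layout expected");

/* Build a record; the compiler knows the member may be unaligned and emits
   a suitable store for the target. */
struct Record make_record(uint32_t tag, double value) {
    struct Record r;
    r.tag = tag;
    r.value = value;
    return r;
}

/* Read the double bytewise through memcpy, which is valid at any alignment. */
double read_value(const struct Record *r) {
    double out;
    memcpy(&out, (const char *)r + offsetof(struct Record, value), sizeof out);
    return out;
}
```

A target that cannot do a fast unaligned 8-byte access would have to lower a read like this into smaller aligned pieces, much as the memcpy does conceptually.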
>
>>>
>>> > 2. Optimization sometimes generates types like i288, which
>>> >   Emscripten currently doesn't handle. From an optimizing
>>> >   perspective, it isn't yet clear if it would be faster to
>>> >   try to directly implement those, or to just break them up
>>> >   into more manageable native (32-bit) sizes. Note that even
>>> >   i64 is somewhat challenging to implement in a fast way
>>> >   on JavaScript, since that environment is really a 32-bit
>>> >   one, so it would be best to never do things like combine
>>> >   two 32-bit writes into one 64-bit write. It would be nice
>>> >   to have an option in LLVM to process the IR/bitcode back
>>> >   into having only target-native types, is that possible?
>>>
>>> All the LLVM targets which use the common code generation
>>> infrastructure have access to the legalizer, which handles that sort
>>> of thing. It would in theory be possible to write an equivalent that
>>> does most of that work on IR, but it's a substantial amount of work
>>> without any obvious benefit for existing targets.
>>>
>>
>> Ok, I guess that means I'll need to implement a legalizer. The
>> simplest thing would probably be for me to do it in Emscripten,
>> because the Emscripten IR is a simpler subset of LLVM IR (and
>> I'm already familiar with the codebase). But if it would be
>> useful for LLVM to have an IR pass that does legalization,
>> I'd consider doing it in LLVM. Thoughts?
>
> I don't think it would be very useful for the in-tree backends unless
> we make major changes to the way instruction selection works;
> legalization is closely integrated with other transformations.  That
> said, the question does come up periodically on llvmdev; if you are
> willing to write something, I'm sure some people would appreciate it.
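As a rough illustration of what breaking a wide integer into 32-bit pieces involves, here is a minimal sketch in C of a 64-bit add expressed only with 32-bit operations. The type u64_parts and the functions split_u64, join_u64 and add_u64_parts are hypothetical names for this example; this is not LLVM's legalizer, only the kind of expansion such a pass would produce for an add.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation of an i64 as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_parts;

static_assert(sizeof(u64_parts) == 8, "two 32-bit halves");

/* Split a 64-bit value into its low and high 32-bit halves. */
u64_parts split_u64(uint64_t x) {
    u64_parts p = { (uint32_t)x, (uint32_t)(x >> 32) };
    return p;
}

/* Recombine the halves; used here only to check the result. */
uint64_t join_u64(u64_parts p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Add two split values using only 32-bit arithmetic. */
u64_parts add_u64_parts(u64_parts a, u64_parts b) {
    u64_parts r;
    r.lo = a.lo + b.lo;            /* wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;  /* 1 if the low add overflowed */
    r.hi = a.hi + b.hi + carry;    /* propagate the carry into the high half */
    return r;
}
```

The same pattern extends to wider types such as i288 by chaining the carry through more 32-bit limbs, which is part of why it is unclear whether direct support or splitting would be faster.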

Is the better solution to have an llvm codegen backend for llvm?

Andrew