[LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference manual

Thu Aug 1 04:37:29 PDT 2013

I'm not familiar with PNaCl but, FWIW, I'd say the three main advancements
the CLR made over the JVM are:

.       Structs (aka value types).

.       Reified generics.
http://research.microsoft.com/pubs/64031/designandimplementationofgenerics.p
df

.       Tail call elimination.
http://research.microsoft.com/pubs/69132/babel01.pdf

Structs give you more freedom around memory representation (e.g. the CLR can
represent an array of pairs of floats and ints in a single contiguous block
of memory whereas the JVM cannot). The main practical benefit is performance
but also interoperability (e.g. SOA vs AOS with GPUs). In particular, as
structs are (usually) unboxed they do not incur allocations, introduce
indirections via pointers and do not stress the garbage collector. In fact,
structs can be used to completely avoid the garbage collector's contribution
to latency (Rapid Addition used this technique to eliminate GC latency from
their commercial FIX implementation
http://www.microsoft.com/en-us/download/details.aspx?id=20918). From an
implementation perspective, supporting structs makes it possible to get away
with a much less efficient garbage collector because programmers can simply
avoid the GC when necessary.

Reified generics combined with structs make it possible to write very
efficient generic collections (e.g. .NET Dictionary can be 17x faster than
Java's HashTable because it is stored as a single continuous block of memory
with no pointers http://fsharpnews.blogspot.co.uk/2010/05/java-vs-f.html).

Tail call elimination allows an unbounded number of recursive function calls
in tail position to consume finite stack space. This is essential for
traditional functional programming but has other practical applications
including representing state machines as mutually tail recursive functions
and optimizing async code to avoid trampolines. F# relies upon the tail
calls built into .NET since 2001. Note that tail call elimination applies
not only to static function calls but also dynamic calls and it is an
optimization in space and not time (on .NET general tail calls are ~2x
slower than non-tail calls).

Cheers,

Jon.

-----Original Message-----
From: Travis Cross [mailto:tc at travislists.com] 
Sent: 01 August 2013 07:55
To: Eli Bendersky
Cc: llvmdev at cs.uiuc.edu; Arnold Schwaighofer; Stephen Lin; Akira Hatanaka;
Evan Cheng; Chris Lattner; Dale Johannesen; Duncan Sands; Jeffrey Yasskin;
Jon Harrop; David Terei
Subject: Re: [LLVMdev] Tail calls (TCO) in PNaCL | PNaCl Bitcode reference
manual

On 2013-07-30 22:11, Eli Bendersky wrote:

> we've published an initial version of the PNaCl bitcode reference 

> manual online - 

>  <http://www.chromium.org/nativeclient/pnacl/bitcode-abi>
http://www.chromium.org/nativeclient/pnacl/bitcode-abi. The PNaCl 

> bitcode is a restricted subset of LLVM IR.

> 

> Any comments would be most welcome.

Hi Eli,

I appreciate you for opening the process for input and comments.  One
question stood out to me while reading the document:

The document [1] indicates that only the 'ccc' calling convention will be
supported.  The LLVM documentation [2] prominently notes that, "tail calls
can only be optimized when [fastcc], the GHC or the HiPE convention is
used."  Further, I notice that the document includes "call" but not "tail
call" in the list of supported instructions.

Do I understand correctly that this means that reliable tail call
optimization will not be possible under PNaCL as currently imagined?

That would be a real shame.  Languages such as Scheme, Haskell, Erlang, and
Lua require tail call optimization.  Working around the lack of TCO with
trampolines degrades performance, requires a major reworking of the compiler
front-end, and is ugly.  Such hacks really shouldn't be needed in 2013,
particularly when LLVM already went to the trouble of supporting TCO for
exactly these good reasons.

The JVM made this mistake in its byte code and many groups such as the
Clojure and Scala communities have been clamoring for years to get TCO into
the JVM.  ECMAScript has this error as well, but it's forgivable as
Javascript wasn't originally intended to be, as it is now, a compiler
target.

Indeed, I believe much of the enthusiasm for (P)NaCL stems from the hope
that we'll finally be able to compile arbitrary languages for the browser
without being unduly hampered or constrained.  Without TCO the excitement
here would be diminished.  Functional programming languages would be
kneecapped by the bytecode.

If my understanding above is correct, how can we get PNaCL to support a
sufficiently general calling convention to make TCO possible?

The "Stability of the PNaCl bitcode ABI" [3] document notes that the
translator will be restoring the fastcc convention (though issue #2346 notes
this may only happen after the first release).  Perhaps we could support the
"tail call" instruction and have the translator ensure an appropriate
calling convention is restored when that is seen?  Or perhaps we could
revisit our reluctance to support multiple calling conventions?  Or perhaps
we could address the issues with fastcc that are causing us to reject it, or
create a new calling convention that is simultaneously acceptable for our
needs and supports TCO?

I've CCed some individuals who might be interested in working out a solution
here.

Thanks,

-- TC

[1]  <http://www.chromium.org/nativeclient/pnacl/bitcode-abi>
http://www.chromium.org/nativeclient/pnacl/bitcode-abi

[2]  <http://llvm.org/releases/3.3/docs/LangRef.html#callingconv>
http://llvm.org/releases/3.3/docs/LangRef.html#callingconv

[3]
<https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/stability-of
-the-pnacl-bitcode-abi>
https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/stability-of-
the-pnacl-bitcode-abi

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130801/018c9023/attachment.html>