[LLVMdev] Tail-calling

OvermindDL1 overminddl1 at gmail.com
Mon Sep 8 17:24:00 PDT 2008


You can skip the first and second paragraphs if you just want to get
to the description; the 'real' question I want answered is in the
bottom-most paragraph and may make enough sense without reading
everything else, but there are a few other questions and opinions
scattered through the rest of it.

The little scripting language I am making is basically going to be
C with Actors (much as C++ essentially started out as C with
Classes).  Just as C++ is not a 'real' OO style (little tid-bit: the
person who originally coined 'object-oriented programming' was
actually describing something closer to Actor-oriented programming,
but I am using the common terms here) yet is 'enough' to be useful, I
am going to put in just enough Actor-style machinery to be useful,
while trying to retain as much speed as possible.  For the sake of
the Actor model I need to strip out a number of standard C constructs
(no more mutable globals, for example), so I am basically just
keeping the C syntax.  (I may still allow mutable globals, trusting
the coder to 'do the right thing', with plenty of warnings to scare
away the newbies.)  The purpose of this little scripting language is
to do a lot of math processing (and other light-weight event handling
and such), but it needs to scale, not just to multiple cores but to
multiple computers.  I already have a back-end written in C++ that I
use for C++ apps (with a ton of scaffolding; the point of this little
language is to get rid of the ugly scaffolding and make it pleasant
to write, while letting other users of my app code for it as well),
as well as a design for how the system will work, so the 'hard' part
is pretty much done; now it is time for the tedious part.

At the level of the language (not the implementation) I plan on
having two types of functions.  The first will behave like a normal C
function: return value, arguments, etc., nothing special.  The second
type will basically have no return value: when the 'main' tasklet
function (the one whose return ends the tasklet) returns, the tasklet
dies, so it cannot have a return value, although tasklet calls deeper
on the stack can return values.  There will be a couple of built-in
functions in the language (most likely only 3; the rest will be built
from those inside the language itself; these 3 will hereafter be
referred to as switching functions) that are capable of 'suspending'
a tasklet (using Stackless Python terminology here) and switching to
something else.  A tasklet will be created by an explicit call,
passing in the function that will run the tasklet.
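As a rough sketch of that surface API (all names here, such as spawn_tasklet and run_all, are my illustrative assumptions, not anything from the language itself), a tasklet's 'main' function takes no return value and is started by an explicit spawn call; a toy scheduler just runs each queued tasklet:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKLETS 16

/* A tasklet's 'main' function: when it returns, the tasklet ends,
   so it has no return value. */
typedef void (*tasklet_fn)(void *arg);

static struct { tasklet_fn fn; void *arg; } queue[MAX_TASKLETS];
static int queued = 0;

/* A tasklet is created by an explicit call, passing in the function
   that will run it. */
static void spawn_tasklet(tasklet_fn fn, void *arg) {
    queue[queued].fn = fn;
    queue[queued].arg = arg;
    ++queued;
}

/* Toy scheduler: run every queued tasklet to completion
   (no suspension in this sketch). */
static void run_all(void) {
    for (int i = 0; i < queued; ++i)
        queue[i].fn(queue[i].arg);
    queued = 0;
}

/* Example tasklet body. */
static void count_up(void *arg) {
    *(int *)arg += 1;
}
```

This deliberately leaves out the switching functions; suspension and switching are sketched further below.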

Internally, normal functions are just that: normal, nothing special.
The tasklet-style functions are unique: a function is a
tasklet-capable function if it has one of those switching functions
anywhere down its call chain.  Normal functions cannot call tasklet
functions, but tasklet functions can call normal functions.

A switch occurs when an explicit switching call is made, or when the
bottom-most tasklet function on the tasklet stack returns.  When a
switch occurs, one of two things will happen, depending on what the
scheduler the tasklet is assigned to wants to do: it will either (at
the LLVM level) return, or it will tail-call another tasklet
function, which at the LLVM level means calling the function of a
tasklet that is defined to be the continuation function.
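That return-or-tail-call behavior can be sketched in plain C as a trampoline; in the real generated code the scheduler would tail-call the continuation directly rather than loop, but the control flow is the same. Every name here (scheduler_run, stage1, etc.) is an illustrative assumption:

```c
#include <assert.h>
#include <stddef.h>

struct tasklet;
typedef enum { SWITCH_SUSPEND, SWITCH_DONE } switch_result;
/* A continuation: the function the scheduler jumps to on resume. */
typedef switch_result (*cont_fn)(struct tasklet *t);

struct tasklet {
    cont_fn next;   /* continuation to run at the next switch */
    int value;      /* stand-in for the tasklet's frame data */
};

/* Trampoline standing in for tail calls: keep invoking the current
   continuation until the bottom-most tasklet function returns. */
static void scheduler_run(struct tasklet *t) {
    while (t->next) {
        cont_fn f = t->next;
        t->next = NULL;              /* re-set only if the body suspends */
        if (f(t) == SWITCH_DONE)
            break;
    }
}

/* A tasklet split at one switch point into two continuations. */
static switch_result stage2(struct tasklet *t) {
    t->value *= 2;
    return SWITCH_DONE;              /* bottom-most return: tasklet dies */
}
static switch_result stage1(struct tasklet *t) {
    t->value += 3;
    t->next = stage2;                /* explicit switching call */
    return SWITCH_SUSPEND;           /* the 'return' branch of a switch */
}
```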

As you can probably now see from the design, this relies on (proper)
tail-calls working... all the time.

When the code is compiled, normal functions are compiled as normal,
and a tasklet function that calls a normal function calls it as
normal: registers, stack, everything.  However, if a tasklet function
calls another tasklet function, then at the LLVM level the caller is
split into a couple of functions (with further splits happening if
there are more tasklet function calls later on), and the tasklet call
becomes a tail call.
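A rough illustration of that split (my assumption about the mechanics, modeled on the explicitly-managed-frames note linked below): a function with a tasklet call in its middle becomes two functions, with the live locals moved into an explicit frame so the second half can find them when the callee's result comes back. All names are hypothetical:

```c
#include <assert.h>

/* Frame holding the caller's live locals across the tasklet call. */
typedef struct { int a; } work_frame;

/* Original shape, conceptually:
 *   int work(int x) {
 *       int a = x + 1;
 *       int r = some_tasklet(a);   <-- split point
 *       return a + r;
 *   }
 */

/* First half: everything before the tasklet call; 'a' is live across
   the call, so it is stored in the explicit frame. */
static int work_part1(work_frame *f, int x) {
    f->a = x + 1;
    return f->a;           /* value handed to the callee tasklet */
}

/* Second half: entered when the callee's result 'r' comes back. */
static int work_part2(work_frame *f, int r) {
    return f->a + r;
}
```

In the generated code, work_part1 would end in a tail call into the callee's chain and work_part2 would be registered as the continuation; here the two halves are just called by hand.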

At this point, refer to the link
http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt
that I am using as a basis, although with some minor changes in the
passed-around data.

First difference: I am not using a garbage collector (not at the
moment anyway, and the design of the language may not require one at
all; most things will live on the fake tasklet stack, which is
managed by the scheduler, and the rest will mostly be messages,
managed internally by a different system I originally wrote for C++).
The rest is pretty similar, except that I have two styles in mind,
one of which is quite different from what he describes.

Since the tasklet functions cannot be called through a function
pointer, only compiled in directly, and since normal functions can
still be called any which way (they will always return before a
switch happens, otherwise they would have been tasklet functions), I
will know exactly how much memory a whole tasklet call stack needs
down to its deepest point.  I am also not relying on recursion for
looping; loops will still be C++-style, so tail-call optimization
inside the language itself is not necessary.
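Because the tasklet call graph is fully static (no indirect tasklet calls, no recursion for looping), the worst-case memory for a whole tasklet stack is a simple maximum over the call tree. A toy sketch of that computation, with made-up frame sizes and an invented fn_info structure:

```c
#include <assert.h>
#include <stddef.h>

/* Per-function info the compiler would know statically. */
typedef struct fn_info {
    size_t frame_size;                       /* this function's frame */
    const struct fn_info *const *callees;    /* NULL-terminated list */
} fn_info;

/* Worst-case stack need: own frame plus the deepest callee path.
   Terminates because the tasklet call graph has no recursion. */
static size_t max_stack(const fn_info *f) {
    size_t deepest = 0;
    for (const fn_info *const *c = f->callees; c && *c; ++c) {
        size_t need = max_stack(*c);
        if (need > deepest)
            deepest = need;
    }
    return f->frame_size + deepest;
}

/* Made-up example tree: root calls two leaves of different sizes. */
static const fn_info leaf_small = { 64, NULL };
static const fn_info leaf_big   = { 512, NULL };
static const fn_info *const root_callees[] =
    { &leaf_small, &leaf_big, NULL };
static const fn_info root = { 32, root_callees };
```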

Because of this, though, I have two 'obvious' ways to manage the
memory for a tasklet.  First, I can follow sabre's method and
allocate (from the scheduler) the memory for each continuation
individually as it occurs.  That seems overly costly and inefficient,
especially since I already have all the necessary information, but it
is nice in that not all memory is allocated at once; only the memory
actually in use is allocated, with no 'dead weight'.  The second
method is to allocate up front all the memory a tasklet will need
down to the deepest tasklet call on its call stack (which in the
generic case will still not be that much, probably a few hundred
bytes max considering the systems I will be using this on).  This
should be faster, since there is no need to get memory for each
tasklet function invocation, but it carries 'dead weight' in the
parts that go unused: a tasklet stack could be very light except for
one deep call that needs a few KB or a few hundred KB of space, down
in a function that may be called only once on that call stack,
leaving a tremendous amount of unused memory for the rest of the life
of that tasklet.
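The second strategy amounts to a bump allocator over one upfront block sized to the tasklet's deepest point: pushing a continuation frame is a pointer bump, with no allocator call on the hot path, at the cost of carrying the worst-case 'dead weight'. A minimal sketch, with all names (tasklet_stack, frame_push, etc.) being my own invention:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char  *base;       /* single upfront allocation */
    size_t used;       /* bump offset */
    size_t capacity;   /* worst-case size from the static call tree */
} tasklet_stack;

/* One allocation per tasklet, sized to its deepest point. */
static int stack_init(tasklet_stack *s, size_t worst_case) {
    s->base = malloc(worst_case);
    s->used = 0;
    s->capacity = worst_case;
    return s->base != NULL;
}

/* Push a continuation frame: just a pointer bump.  The overflow check
   should never fire if the worst case was computed correctly. */
static void *frame_push(tasklet_stack *s, size_t n) {
    if (s->used + n > s->capacity)
        return NULL;
    void *frame = s->base + s->used;
    s->used += n;
    return frame;
}

/* Popping restores the bump pointer when a tasklet function returns. */
static void frame_pop(tasklet_stack *s, size_t n) {
    s->used -= n;
}
```

The first strategy would replace frame_push/frame_pop with per-continuation malloc/free (or a scheduler-owned allocator), trading speed for zero dead weight.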

However, I could combine the two methods, either in the compiler
(heuristically guessing?), or through explicit language syntax such
as a qualifier on the function declaration (I would think the full
allocation would be the norm, but a deep function that needs lots of
memory, and still switches, could be marked to allocate that memory
only upon its creation), or perhaps through a slightly different way
of calling the function.  That last option would make it
caller-defined instead of function-defined, giving more control to
fit the circumstance, but it is generally more work for the coder; it
is language design and not really relevant here, though I am curious
about thoughts regardless.

The broken (non-first; the second and on) tasklet function signature
would probably be something along the lines of:

  schedulerRelevantEnum aTaskletFunctionName(
      topOfStackOrContinuationPtr *ptr,
      someType returnValueFromFunctionBreak);

Basically it would follow sabre's document pretty closely in this
regard.  The top-most call of a tasklet function (before splitting)
would not take the return-value parameter (since something else is
calling it), nor would tasklet function calls that return nothing.
As stated, it is pretty much identical to sabre's method in the above
link; read it, and if the nameserver is down again, use the Google
cache at
http://google.com/search?q=cache:http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt&hl=en&rlz=1C1GGLS_enUS291&sa=G&strip=1

So, the main question: is this style now feasible in LLVM on modern
x86 (32- and 64-bit) systems, both *nix and Windows?  In other words,
do tail calls always work when the call happens in tail position,
even when the function is called through a function pointer (are
tail calls through function pointers working correctly right now, or
is this whole project for naught until later)?


