[LLVMdev] RFC: GSoC Project

Wed Mar 23 05:10:54 PDT 2011

On Wed, Mar 23, 2011 at 03:37:02PM +0530, Sanjoy Das wrote:
> I intend to start with the simplest possible approach - representing the 
> stack as a doubly linked list of _block_s, the size of each _block_ 
> being a power of two. This can later be modified to improve performance 
> and accommodate other factors. Blocks will be chained together into a 
> doubly linked list structure (using the first two words in the block as 
> the next and previous pointers).

Where do you plan to store the pointers? What changes to the runtime
environment does this require?

> In the prologue, a function will check whether the current block has 
> enough stack space. This is easily done for function which don't have 
> variable sized allocas, and for ones which do, we can assume some 
> worst-case upper bound.

If there are allocas involved, it is quite likely they are inside loops
etc., in which case there are no simple stack boundaries. This shouldn't
be a problem if the space for all static variables was allocated at the
beginning.

> The prologue can then call an intrinsic (let's 
> call it llvm.adjust_stack) which allocates a new block (possibly by 
> delegating this to a user-provided callback), copies the arguments, 
> saves the previous stack pointer (in the new block), and adjusts the 
> next and previous pointers.

Why do you need to copy the arguments? In fact, why do you think you can
actually copy the arguments? Consider printf for this.

> It will also have to adjust the stack pointer, and the frame pointer,
> if it is being maintained. Cleanup can be done by hijacking the return
> value, as also mentioned in [1]. It might make sense to leave the
> allocated blocks around, to prevent re-allocating the next time the
> program needs more stack space.

Hijacking the return value is a nice trick. I'm not sure about freeing /
not-freeing the unused block, this again has implications for the
runtime environment.

Have you considered how much work call graph optimisations require?
Especially with LTO it should be possible to kill many of the checks by
computing used stack space across segments of the call graph. One of the
papers on service scalability discussed this for light weight threads.
Forgot which one, it's been a while.

> One thing I'd really like some input on is whether implementing split 
> stacks would be useful enough to warrant the effort (especially keeping 
> in mind that this is pretty useless on 64 bit architectures).

I don't think it is useless on 64bit architectures. You can't always
make arbitrary large reservations of address space (e.g. optimised output of
Chicken). They also don't come without a price. Also consider something
like a kernel environment, where you really want to have a minimal
default stack, but be able to fallback gracefully.

Joerg