[llvm-dev] best way to represent function call with new stack in LLVM IR?

Andrew Kelley via llvm-dev llvm-dev at lists.llvm.org
Thu May 10 19:28:36 PDT 2018


In the Zig frontend, we know the entire call graph at compile time. This
means we know the stack size of every function, and therefore an upper
bound on total stack usage.

Recursive calls, however, can have runtime-determined stack usage, so I am
adding a frontend feature that heap-allocates the memory used for some
function calls.

The idea is that recursion adds cycles to the call graph, and we know at
compile time, for a given cycle, which function call enters it. Therefore,
we want to heap-allocate enough memory for the stack of one cycle, only in
the function call that enters the cycle. That call must set a new stack
pointer and then restore the old one after returning.

What is the best way to represent such a function call in LLVM IR?

Here's a way that will work, but keep reading for why this is not ideal:


; example address of top of new stack to use - assume the caller has
; figured out how much to allocate and has this address ready
@new_sp = internal unnamed_addr global i64 3735928559, align 8

; Function Attrs: nobuiltin nounwind
define void @entry() #2 {
Entry:
  %0 = call i64 asm "", "={sp}"()
  %1 = load i64, i64* @new_sp, align 8
  call void @cycleEntry(i64 %0, i64 %1)
  ret void
}

; Function Attrs: nobuiltin nounwind
define internal void @cycleEntry(i64, i64) unnamed_addr #2 {
Entry:
  call void asm sideeffect "# set the new base pointer for this function\0A
   movq %rsi, %rbp\0A   # store the old stack pointer at the top of the new
stack for later\0A   movq %rsp, (%rsi)\0A   # save the old stack pointer as
the source for the copy below\0A   movq %rsp, %r11\0A   # compute the new
stack pointer for this function\0A   subq %rdi, %rsi\0A   addq %rsp,
%rsi\0A   movq %rsi, %rsp\0A   # copy args that were passed via the old
stack to the new stack\0A   # %r11 marches toward %rdi, the end of the
source range\0A1:\0A   cmpq %rdi, %r11\0A   je 2f\0A   movq (%r11),
%r12\0A   movq %r12, (%rsi)\0A   addq $$0x8, %rsi\0A   addq $$0x8,
%r11\0A   jmp 1b\0A2:", "~{rsi},~{rbp},~{r11},~{r12},~{memory}"()


  ; function body goes here


  call void asm sideeffect " movq (%rbp), %rsp", ""()
  ret void
}


To make the assembly more readable, I'll include the source that generated
this:


var new_sp: usize = 0xdeadbeef;

export fn entry() void {
    const sp = asm ("" : [ret] "={sp}" (-> usize));
    cycleEntry(sp, new_sp);
}

extern fn cycleEntry(
    old_sp: usize,     // %rdi
    new_sp_arg: usize, // %rsi
) void {
    asm volatile (
        \\   # set the new base pointer for this function
        \\   movq %%rsi, %%rbp
        \\
        \\   # store the old stack pointer at the top of the new stack for later
        \\   movq %%rsp, (%%rsi)
        \\
        \\   # save the old stack pointer as the source for the copy below
        \\   movq %%rsp, %%r11
        \\
        \\   # compute the new stack pointer for this function
        \\   subq %%rdi, %%rsi
        \\   addq %%rsp, %%rsi
        \\   movq %%rsi, %%rsp
        \\
        \\   # copy args that were passed via the old stack to the new stack
        \\   # %%r11 marches toward %%rdi, the end of the source range
        \\1:
        \\   cmpq %%rdi, %%r11
        \\   je 2f
        \\   movq (%%r11), %%r12
        \\   movq %%r12, (%%rsi)
        \\   addq $0x8, %%rsi
        \\   addq $0x8, %%r11
        \\   jmp 1b
        \\2:
    ::: "rsi", "rbp", "r11", "r12", "memory");

    // function body

    asm volatile (
        \\ movq (%%rbp), %%rsp
    );
}


I haven't tested this yet, but already there are a lot of problems:

* It is not portable. This is an x86_64-only implementation, and it depends
on %rbp because we cannot know how many bytes LLVM's prologue will subtract
from %rsp, and we must account for the prologue/epilogue to know how much
to offset %rsp on the new stack.

* The side-effecting inline assembly is hostile to the optimizer. This is
unfortunate because it is not fundamentally necessary: all in all, this is
just a function call whose stack pointer is provided by the caller. If LLVM
understood this as a calling convention, the code could be made friendly to
the optimizer.

* LLVM could generate a more efficient function call. This implementation
needlessly copies the stack frame of cycleEntry, because we don't know
which bytes are allocated for parameters and which for the call frame. LLVM
would know this if it were a calling convention.


So my questions for the mailing list are:

 1. Is there a way to accomplish this with existing LLVM API?
 2. If not, what would be the best way to support this use case, and would
such modifications to LLVM be welcome?

Regards,
Andrew