[LLVMbugs] [Bug 962] NEW: Code generator compiles back-to-back fixed-size dynamic allocas into really bad code

Sun Oct 22 11:20:59 PDT 2006

http://llvm.org/bugs/show_bug.cgi?id=962

           Summary: Code generator compiles back-to-back fixed-size dynamic
                    allocas into really bad code
           Product: libraries
           Version: 1.8
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Common Code Generator Code
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: sabre at nondot.org

The tail recursion elimination pass can produce functions with allocas not in the entry block, if it wants 
them to live across iterations of the tail-recursion-eliminated code.  A simple example of the produced 
code is:

void %foo(bool %c) {
        br label %Next
Next:
        %A = alloca int
        %B = alloca int
        %C = alloca int
        %D = alloca int
        store int 1, int* %A
        store int 1, int* %B
        store int 1, int* %C
        store int 1, int* %D
        call void %test(int* %A, int* %B, int* %C, int* %D)
        br bool %c, label %Next, label %Out
Out:
        ret void
}
declare void %test(int*,int*,int*,int*)

It does this on the assumption that a block of allocas will be compiled into reasonably efficient code, 
roughly as cheap as a normal prolog.  However, this isn't the case.  On PowerPC, the code above compiles 
into:

LBB1_1: ;Next
        mr r2, r1
        addi r3, r2, -16
        mr r1, r3
        mr r7, r1
        addi r4, r7, -16
        mr r1, r4
        mr r8, r1
        addi r5, r8, -16
        mr r1, r5
        mr r9, r1
        addi r6, r9, -16
        mr r1, r6
        li r10, 1
        stw r10, -16(r2)
        stw r10, -16(r7)
        stw r10, -16(r8)
        stw r10, -16(r9)
        rlwinm r29, r30, 0, 31, 31
        addi r1, r1, -64
        bl L_test$stub
        cmplwi cr0, r29, 0
        addi r1, r1, 64
        bne cr0, LBB1_1 ;Next

The top of the loop is a long run of back-to-back copies into and out of the stack pointer (r1), one 
round trip per alloca.  On X86, doing this sort of thing quickly causes spill code to be generated.

It would be far better for the code generator to recognize these fixed-size dynamic allocas and merge 
them together, doing a single stack adjustment and then treating the suballocas as offsets from the base 
pointer.
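
As a rough illustration of the merging idea, here is a minimal Python sketch of the layout computation only. 
It is not LLVM code: the names merge_fixed_allocas, MergedAllocas and STACK_ALIGN are hypothetical, the 
16-byte stack alignment is an assumption taken from the 16-byte steps in the output above, and 
target-specific details such as a linkage or outgoing-argument area are ignored.

```python
from dataclasses import dataclass

# Assumption: 16-byte stack alignment, matching the -16 steps in the output above.
STACK_ALIGN = 16


def align_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align (align must be a power of two)."""
    return (n + align - 1) & ~(align - 1)


@dataclass
class MergedAllocas:
    total_size: int  # size of the single stack-pointer adjustment, in bytes
    offsets: dict    # alloca name -> byte offset from the base of the merged block


def merge_fixed_allocas(allocas, stack_align=STACK_ALIGN):
    """Lay out a run of back-to-back fixed-size allocas as one block.

    allocas is a list of (name, size_in_bytes, alignment) tuples, in the
    order the allocas appear in the basic block.
    """
    offset = 0
    offsets = {}
    for name, size, align in allocas:
        offset = align_up(offset, align)  # respect each alloca's own alignment
        offsets[name] = offset
        offset += size
    # One adjustment covers every alloca, rounded up to the stack alignment.
    return MergedAllocas(align_up(offset, stack_align), offsets)


# The four 4-byte 'int' allocas from %foo above.
layout = merge_fixed_allocas([("A", 4, 4), ("B", 4, 4), ("C", 4, 4), ("D", 4, 4)])
print(layout.total_size, layout.offsets)
```

Under these assumptions the four allocas in %foo need one 16-byte adjustment per loop iteration, with 
%A through %D at offsets 0, 4, 8 and 12, instead of the four separate 16-byte adjustments in the output 
above.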

-Chris
