[LLVMdev] Optimization feasibility

Tue Dec 25 09:07:55 PST 2007

On 25 Dec 2007, at 03:29, Gordon Henriksen wrote:

> Hi Jo,
>
> On 2007-12-24, at 14:43, Joachim Durchholz wrote:
>
>> I'm in a very preliminary phase of a language project which requires
>> some specific optimizations to be reasonably efficient.
>>
>> LLVM already looks very good; I'd just like to know whether I can
>> push these optimizations through LLVM to the JIT phase (which, as
>> far as I understand the docs, is a pretty powerful part of LLVM).
>
> Cool.
>
>> The optimizations that I need to get to work are:
>>
>> * Tail call elimination.

> It also supports emitting tail calls on
> x86, but its support is somewhat weak. This is partially mandated by
> calling conventions, but those implementing functional languages might
> be disappointed. Check the llvmdev archives for details.
>
Hi Joachim,
I am the person to blame for tail call support and its deficiencies
on x86.

The current constraints for tail calls on x86 are:

At most 2 registers are used for argument passing (inreg). Tail call
optimization is performed provided that:
  * option tailcallopt is enabled
  * caller/callee are fastcc
  * elf/pic is disabled (this should be the case on Mac OS X?) OR
  * elf/pic is enabled + callee is in the same module as caller +
    callee has visibility protected or hidden

A (pointless) example would be:

<<---tailcall.ll --->>
@.str = internal constant [12 x i8] c"result: %d\0A\00"		; <[12 x i8]*> [#uses=1]

define fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
entry:
         ret i32 %a3
}

define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
         %tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2,
                 i32 %in1, i32 %in2 )             ; <i32> [#uses=1]
         ret i32 %tmp11
}

define i32 @main(i32 %argc, i8** %argv) {
entry:
	%argc_addr = alloca i32		; <i32*> [#uses=1]
	%argv_addr = alloca i8**		; <i8***> [#uses=1]
	%retval = alloca i32, align 4		; <i32*> [#uses=2]
	%tmp = alloca i32, align 4		; <i32*> [#uses=2]
	%res = alloca i32, align 4		; <i32*> [#uses=2]
	"alloca point" = bitcast i32 0 to i32		; <i32> [#uses=0]
	store i32 %argc, i32* %argc_addr
	store i8** %argv, i8*** %argv_addr
	%tmp1 = call fastcc i32 @tailcaller( i32 1, i32 2)		; <i32> [#uses=1]
	store i32 %tmp1, i32* %res
	%tmp2 = getelementptr [12 x i8]* @.str, i32 0, i32 0		; <i8*> [#uses=1]
	%tmp3 = load i32* %res		; <i32> [#uses=1]
	%tmp4 = call i32 (i8*, ...)* @printf( i8* %tmp2, i32 %tmp3 )		; <i32> [#uses=0]
	store i32 0, i32* %tmp
	%tmp5 = load i32* %tmp		; <i32> [#uses=1]
	store i32 %tmp5, i32* %retval
	br label %return

return:		; preds = %entry
	%retval6 = load i32* %retval		; <i32> [#uses=1]
	ret i32 %retval6
}

declare i32 @printf(i8*, ...)
<<---tailcall.ll --->>

x86Shell:> llvm-as < tailcall.ll | llc -tailcallopt | gcc -x assembler -
x86Shell:> ./a.out
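
For the elf/pic case in the constraint list above, the only change the
example should need is the callee's visibility. The following is an
untested sketch in the same assembly syntax, not output from a real build.
It assumes the file is compiled with llc's -relocation-model=pic in
addition to -tailcallopt, and that main and the printf declaration are
carried over unchanged from tailcall.ll. The "hidden" marker on
@tailcallee is the assumed difference.

```llvm
; Sketch only: PIC-friendly variant of the two functions from tailcall.ll.
; The callee lives in the same module as the caller and has hidden
; visibility, which is what the elf/pic constraint asks for.
define hidden fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
entry:
         ret i32 %a3
}

; Caller and callee are both fastcc, and the call is marked "tail".
define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
         %tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2,
                 i32 %in1, i32 %in2 )
         ret i32 %tmp11
}
```

Whether the call really became a jump can be checked by reading the
assembly that llc emits for tailcaller.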

If you have any questions regarding tail calls, I would be happy to
help.

regards arnold