[llvm-commits] [PATCH] Remove tail marker when changing an argument to an alloca.

Fri Jan 4 12:27:54 PST 2013

>> Can be turned into an infinite loop by an IL pass (but no pass in
>> -std-compile-opts does it right now).
>
>
> well, it's not going to be turned into an infinite loop because it's not
> recursive!  However consider something like:
>
> %X = type { i32 }
> define void @g(%X* byval %a, i32 %level) {
> entry:
>   %as_int = bitcast %X* %a to i32*
>   store i32 %level, i32* %as_int
>   %new_level = add i32 %level, 1
>   tail call void @g(%X* byval %a, i32 %new_level)
>   ret void
> }
>
> Each store is to a new piece of memory, however if you turn this into a loop
> then each store is to the same piece of memory (unless you introduce an alloca
> representing the byval memory inside the loop).  That said, it's not clear to
> me that that's actually a problem.  The reason is: in order for this to matter,
> someone needs to read from the memory after the call returns, and notice that
> the callee changed it (due to the 'byval' callers aren't supposed to see writes
> to the memory that the callee made).  However our tail recursion optimizer
> currently won't rewrite into a loop if there is anything non-trivial after the
> tail call.  In particular there can't be a load or anything similar that could
> notice that the %a memory was changed by the call.
>
> Maybe this is a general argument: if there is a tail call to a function with
> a byval parameter, and the call is in tail position (i.e. at the end with
> nothing substantial after it) then you don't need to bother making a copy of
> the byval argument.
>
> (I'm a bit worried about examples where the tail callee returns the argument
> somehow, enabling somewhere else to peek inside it and observe the lack of a
> byval copy, but I didn't come up with an example yet in which the peeking
> didn't involve undefined behaviour).
>
> OK, I think I've convinced myself that allowing tail call on something with
> a byval argument can be useful, and that the tail recursion optimization
> will work correctly with it.

Hi Duncan,

I guess I started going a bit off topic. The test I posted is not
recursive as written, though it can be optimized into a recursive one,
but that is not the big issue. The issue is the cases where we don't
have the full body to analyze and are left with only the guarantees
the IL gives us.

I think that an important case is:

declare void @f(i32*)
define void @g(i32* byval %a) {
  tail call void @f(i32* %a)
  ret void
}

Given the current rules, the above code is valid and, on x86-64 with
the default calling convention, llc can produce:

	leaq	8(%rsp), %rdi
	jmp	f

Note that this is valid even if f can write to its argument, because
g knows that it has its own copy of %a.

In a different ABI, this might not be possible. If, for example, g is
the one responsible for popping the storage used for %a, then it has to
do a regular call to f. On the normal x86 ABI, g has to align the
stack, and llc produces:

	subl	$12, %esp
	leal	16(%esp), %eax
	movl	%eax, (%esp)
	calll	f
	addl	$12, %esp
	ret

Also note that without knowing that f will not write to its argument,
we cannot optimize g to

define void @g(i32* %a) {
  tail call void @f(i32* %a)
  ret void
}
declare void @f(i32*)

even if the ABI for g would match in both cases. In the new code g
doesn't have its own copy of %a, so a write by f would land in memory
owned by g's caller.

Since the byval %a is allocated (and freed) in g's caller in most
ABIs, I think this is a good argument for why we should not count
byval as an alloca when deciding whether the tail marker is allowed.

In a way, what is happening in this optimization is that the function
being tail called goes (in most ABIs) from accessing an alloca in the
caller's caller to accessing an alloca in the caller, and therefore we
have to remove the tail marker.

While I acknowledge that this maintenance is an annoyance, it is not
specific to this pass or even to byval. Consider a slightly modified
example:

declare void @f(i32*)
define void @g(i32* %a) {
  tail call void @f(i32* %a)
  ret void
}
define void @h(i32 %x) {
  %b = alloca i32
  store i32 %x, i32* %b
  call void @g(i32* %b)
  ret void
}

There is no byval in it and the tail marker is correct since f gets an
alloca in h, not in g. When inlining g into h, the inliner correctly
drops the tail marker and h becomes:

define void @h(i32 %x) {
  %b = alloca i32
  store i32 %x, i32* %b
  call void @f(i32* %b)
  ret void
}

In general, any pass that changes the arguments to a function has to
be careful. If it now passes a local alloca as an argument, it has to
drop the tail marker.
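
To make that rule concrete, here is a rough sketch in C++. It is a toy
model, not LLVM's actual API: ToyCall, ToyFunction, passes_local_alloca
and replace_arg are made-up names, and values are compared by name only.
A real pass would also have to look through casts and GEPs to find the
underlying alloca.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a call instruction (not llvm::CallInst).
struct ToyCall {
    std::string callee;
    std::vector<std::string> args;  // names of the pointer arguments
    bool is_tail;                   // models the 'tail' marker
};

// Hypothetical stand-in for a function body.
struct ToyFunction {
    std::unordered_set<std::string> local_allocas;  // allocas created in this function
    std::vector<ToyCall> calls;
};

// True if any argument of the call names an alloca of the enclosing function.
bool passes_local_alloca(const ToyFunction& f, const ToyCall& c) {
    for (const auto& a : c.args)
        if (f.local_allocas.count(a) != 0)
            return true;
    return false;
}

// Rewrite call arguments from `from` to `to`. If a call marked 'tail'
// now receives a local alloca, the marker is stale and must be dropped.
void replace_arg(ToyFunction& f, const std::string& from, const std::string& to) {
    for (auto& c : f.calls) {
        for (auto& a : c.args)
            if (a == from)
                a = to;
        if (c.is_tail && passes_local_alloca(f, c))
            c.is_tail = false;
    }
}
```

In this model, turning the byval %a into a local alloca and then calling
replace_arg with the new name clears is_tail on the call to f, while a
rewrite to a value that is not a local alloca leaves the marker alone.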

> Ciao, Duncan.
>

Cheers,
Rafael