[PATCH] Teach DeadArgElimination not to eliminate return values of functions with 'returned' arguments

Tue Jun 25 16:10:36 PDT 2013

> Here's an attempt at an example to make this more concrete, though I'm not
> completely sure we're talking about the same case:
>
> define i8* @foo(i8* returned %a) {
>   call void @do_stuff_that_does_not_use_a();
>   ret i8* %a
> }

Right, the latest patch does not preserve the return value in this
case. It only preserves the return value if %a is marked 'returned'
and is used by something other than the return.
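
To spell the rule out, here is a rough sketch of the two cases as I
understand them. The names @foo_live and @use are made up for
illustration, and I'm assuming internal linkage and callers that ignore
the result, so that DeadArgElimination is allowed to touch the
functions at all:

; %a is only used by the ret, so the return value is not preserved.
define internal i8* @foo(i8* returned %a) {
  call void @do_stuff_that_does_not_use_a()
  ret i8* %a
}

; %a has a use other than the ret, so the return value is preserved.
define internal i8* @foo_live(i8* returned %a) {
  call void @use(i8* %a)
  ret i8* %a
}

declare void @do_stuff_that_does_not_use_a()
declare void @use(i8*)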

>
> ...
> %x = expensive_thing
> %t = call i8* @foo(i8* %x)
> ... (further uses of %x that may eventually be discovered to be dead)
>
> If %x is ultimately dead, how do we get to the point of removing it? It's
> not trivially dead because it has a use. If you teach dead-arg elimination
> to keep it live, then it won't die even if its other uses are ultimately
> removed.
>
>>> It'd be more natural to have the function return void, and treat it
>>> specially in codegen. I propose the following alternative.
>>>
>>> Keep the 'returned' argument attribute, and add a 'returnsarg' function
>>> attribute. Then, the semantics of 'returned' are that it is ignored unless
>>> it is the first 'returned' argument in a function that is marked
>>> 'returnsarg' and has a void return type. Then at the codegen level, the
>>> function is translated as if it had the return type of its first 'returned'
>>> argument. As an optimization, codegen can rewrite uses after the call to use
>>> the return value.
>>>
>>> That way, DeadArgElimination and all the other mid-level optimizations
>>> can run at full strength, and you can still get your optimization. What do
>>> you think?
>>>
>>> Dan
>>>
>>
>>
>> But the callee actually has to return that argument somehow for the
>> attribute to be valid. Are you proposing that the functions marked with
>> 'returnsarg' silently save an argument and return it in codegen without
>> actually representing that return in IR? If so, there's no benefit to taking
>> away the return value because code gen is going to have to keep track of it
>> somehow anyway to return it, and it will have to have special machinery to
>> do so in a way that doesn't waste an extra register unnecessarily (it's not
>> like a normal callee saved register because the value is also going to be
>> used in the body of the function)
>
>
> Yes. I'm suggesting this because the optimization you're proposing really is
> a low-level calling convention optimization that doesn't conceptually belong
> in the mid-level optimizer as more than attributes.

Hmm, well, from what I can gather, what you want is the following:

define i8* @foo(i8* %a) {
; function body...
   ret i8* %a
}

to be canonically represented with the following idiom:

define void @foo(i8* returned %a) returnsarg {
; function body...
   ret void
}

but to do the same thing.

Since the two are defined to be equivalent, it's obviously possible to
do this transformation, but I don't see what it buys you. The first
representation is actually the more useful one for codegen, because
codegen has to save %a somewhere and return it, as well as coalesce
any uses of %a within the body of the function with the saved copy of
%a to avoid wasting registers. So regardless of how it's represented
in the IR, codegen is going to have to treat %a as a local variable
and return it at the end, which is what the first representation shows
more explicitly. Furthermore, given what I'm doing in my latest patch,
the first doesn't inhibit optimization any more than the second does.

Also, the implementation on ARM treats 'returned' on the first
argument somewhat like a pseudo calling convention (which is a hack,
because the current code does not allow a calling convention to save a
register used to pass an argument), but that only works because the
argument and the return value happen to be in the same register. In
the general case it's not that simple. Even on ARM, the implementation
is not as simple as treating R0 as a "saved" register (that happens to
also be an argument), because the value being saved may be smaller
than the register, in which case the semantics of 'returned' specify
only that the bits used by that type (rather than the entire argument
register) are saved.

So to me, 'returned' is more of a semantic property than a calling
convention property. It just happens to have a convenient
interpretation when the 'returned' attribute is on the first argument,
the calling convention specifies that the same register is used for
the return value, and that argument is word-sized; in all other cases,
the correspondence is not nearly as clean.
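
As a sketch of the less clean case (the names @narrow, @use8 and
@caller are hypothetical), consider an argument that is narrower than
the register carrying it:

define i8 @narrow(i8 returned %c) {
  call void @use8(i8 %c)
  ret i8 %c
}

define i8 @caller(i8 %v) {
  ; 'returned' only promises that the i8 value of %r equals %v. It
  ; says nothing about the remaining bits of the register that
  ; carries the argument and the result.
  %r = call i8 @narrow(i8 %v)
  ret i8 %r
}

declare void @use8(i8)

Here a backend could reuse the low byte of the outgoing argument in
place of %r, but it could not treat the whole register as preserved
across the call.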

>
>>
>>
>> Also, just because the return value is unused at the IR level in this case
>> doesn't mean it won't ever be used. You could canonicalize the IR to remove
>> uses of the return value if you wanted to, but how does a front end express
>> code before it is canonicalized?
>
>
> The front-end would just emit uses of the outgoing argument from the caller.
> WRT memcpy, LLVM's memcpy intrinsic has a void return type already, for
> example, so this is something you'd want to figure out anyway.

That could work for some intrinsics, but the idea would be to infer
this attribute on non-intrinsics as well. If you define that a
function with 'returnsarg' must return void from the IR perspective,
you're forcing the inference of the attribute and the canonicalization
of all callers (replacing uses of the return value with the outgoing
argument) to happen in one step. That's possible to do, but I don't
see what it does for you other than limit your representational
flexibility and prevent the optimization from being split into
multiple steps.
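
Roughly, the multi-step version I have in mind looks like this (a
sketch only; @g, @use, @caller and @caller.canon are hypothetical
names, and @caller.canon just stands for @caller after the second
step):

; Step 1: inference adds 'returned' to @g. Callers are left alone.
define internal i8* @g(i8* returned %p) {
  call void @use(i8* %p)
  ret i8* %p
}

define void @caller(i8* %x) {
  %r = call i8* @g(i8* %x)
  call void @use(i8* %r)      ; still uses the return value
  ret void
}

; Step 2: a later, separate pass rewrites the use of %r to %x.
define void @caller.canon(i8* %x) {
  %r = call i8* @g(i8* %x)    ; %r is now unused
  call void @use(i8* %x)
  ret void
}

declare void @use(i8*)

The state after step 1 is perfectly valid IR with the non-void return
type, but it has no representation if the function is required to
return void.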

I also understand that canonicalizing the IR to use the outgoing
argument is the right thing to do from the mid-level optimizer
perspective, to expose the most mid-level optimization opportunities,
but I don't think it's actually the most convenient representation for
optimal code generation. Instead, my feeling is that a late IR pass
(right before CodeGen) should independently convert every dominated
use to something like "select i1 undef, %struct.A* %a, %struct.A* %b",
tracking the aliases through control flow, and this should be lowered
to some SelectionDAG node having the equivalent semantics (i.e. an
unspecified choice between two alternatives that the register
allocator can choose from to minimize spills). Basically, you want
code generation to have flexibility to figure out, independently for
each use, whether it's more appropriate to use the outgoing argument
or the incoming return value.
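
For illustration, the output of such a late pass might look something
like the following. This is only a sketch of the idea, not an existing
pass, and @init, @use and the layout of %struct.A are assumptions:

%struct.A = type { i32 }

declare %struct.A* @init(%struct.A* returned)
declare void @use(%struct.A*)

define void @caller(%struct.A* %a) {
  %b = call %struct.A* @init(%struct.A* %a)
  ; %a and %b are known to be equal here, so either one may be used.
  %ab = select i1 undef, %struct.A* %a, %struct.A* %b
  call void @use(%struct.A* %ab)
  ret void
}

Each use dominated by the call would get its own select, so the choice
between %a and %b can be made per use.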

If you remove the possibility of expressing the return value in the
IR, then it's possible for the code generator to figure this all out
directly from your canonicalized form, of course, but (as far as I can
tell) this would take dominance and control flow analysis that
normally isn't done by CodeGen. Also, I don't see what removing the
possibility of this intermediate IR representation actually buys
you.

Stephen