[PATCH] Teach DeadArgElimination not to eliminate return values of functions with 'returned' arguments

Wed Jun 26 17:35:11 PDT 2013

> With virtual registers in SSA form, this should be easy to do. Just copy the
> value into a virtual register in the entry block, and copy it out in the
> exit.
>
>>
>> as well as collapse
>> any uses of %a within the body of the function with the saved copy of
>> %a to prevent wasting registers.
>
>
> Many argument registers already have virtual registers, so codegen can
> probably just reuse them instead of creating new ones.

Right, this is all possible, but it's extra complexity to accomplish
exactly what you get for free when you explicitly represent the return
in the IR, so why even bother? :)

>
>>
>> So regardless of how it's represented
>> in the IR, codegen is going to have to treat %a as a local variable
>> and return it at the end, which is what the first representation shows
>> more explicitly. Furthermore, given what I'm doing in my latest patch,
>> the first doesn't inhibit optimization any more than the latter does.
>>
>>
>> Also, the implementation on ARM treats 'returned' on the first
>> argument as somewhat like a pseudo calling convention (which is a hack
>> because the current code does not allow a calling convention to save a
>> register used to pass an argument), but that's only because the two
>> happen to be in the same register, but in the general case it's not
>> that simple. Even on ARM, the implementation is not as simple as
>> treating R0 as a "saved" register (that happens to also be an
>> argument), because the variable being saved may be smaller than the
>> register, in which case the semantics of 'returned' specifies only
>> that the bits used by that type (rather than the entire argument) are
>> saved.
>
>
> The LangRef text for returned says "The parameter and the function return
> type must be valid
>     operands for the :ref:`bitcast instruction <i_bitcast>`", which implies
> that they can't be smaller. Is it intended that it be generalized to
> mismatched sizes?

No, sorry if that was unclear: it's not meant to be generalized to
mismatched sizes.

The case I mean is the case that the returned argument is smaller than
the register used to pass it through...i.e. a returned 'i16' argument
that is passed via an 'i32' register in the calling convention. In
this case, only the lower 16 bits are preserved by the call, since
that's all the IR concerns itself with. This case is specifically
tested for in a follow-up patch, r180825. (Unfortunately, we don't
currently optimize as well as we could in the case that a returned
'i16' argument is passed through an 'i32' register, but at least we
don't generate invalid code based on the assumption that the full
register is preserved.)

The point is that the definition is a semantic definition that doesn't
refer to physical registers at all. As far as the caller is concerned,
anything and everything could happen to the physical registers as long
as the 'returned' contract is fulfilled from the IR perspective.
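
To make the 'i16' in an 'i32' register case concrete, here is a rough
C++ model. It is not LLVM code: the function names, the 32-bit register
width, and the 0xDEAD pattern are all made up for illustration. It
assumes a callee whose IR signature is "i16 @callee(i16 returned %x)"
and shows that only the low 16 bits are guaranteed to survive the call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the 32-bit "register" that carries an i16 argument in
// and the i16 return value out. Only the low 16 bits are visible to the IR.
uint32_t callee_reg(uint32_t arg_reg) {
    uint16_t x = static_cast<uint16_t>(arg_reg);  // the IR-level i16 value
    // The callee honours 'returned' for the i16 value but is free to leave
    // anything in the upper half of the register (0xDEAD is arbitrary).
    return 0xDEAD0000u | x;
}

// The only thing a caller may rely on: the truncated i16 value.
uint16_t returned_value(uint32_t ret_reg) {
    return static_cast<uint16_t>(ret_reg);
}

// Checks the IR-level contract (low half preserved) and reports whether the
// full register happened to survive, which the contract does not promise.
bool full_register_survives(uint32_t arg_reg) {
    uint32_t ret_reg = callee_reg(arg_reg);
    assert(returned_value(ret_reg) == static_cast<uint16_t>(arg_reg));
    return ret_reg == arg_reg;
}
```

In this model a caller that reused the whole register after the call
would read garbage in the upper half, which is the invalid code the
semantic definition rules out.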

>
>>
>>
>> So to me, 'returned' is more of a semantic thing rather than a calling
>> convention thing, it just happens to have a convenient interpretation
>> in the case that 'returned' attribute is on the first argument, the
>> calling convention specifies that the same register is used as the
>> return value, and that argument is word-sized; in all other cases, the
>> correspondence is not nearly as clean.
>
>
> If it's a semantic thing, and you're proposing the optimizer be aware of the
> semantics of the implicit conversion that happens in machine registers (more
> than isTruncFree()?), then we should make that explicit.

Sorry, I think I was unclear; it's actually the opposite: the
attribute is purposely defined only in terms of semantics of IR types
specifically so that the IR optimizers don't need to know anything
about machine registers or types.

The fact that it happens to have a convenient interpretation in terms
of registers and machine types in the specific case of a word-sized
integer 'returned' first argument on ARM using the standard C calling
convention is an incidental implementation detail.

>
>>
>>
>> >
>> >>
>> >>
>> >> Also, just because the return value is unused at the IR level in this
>> >> case
>> >> doesn't mean it won't ever be used. You could canonicalize the IR to
>> >> remove
>> >> uses of the return value if you wanted to, but how does a front end
>> >> express
>> >> code before it is canonicalized?
>> >
>> >
>> > The front-end would just emit uses of the outgoing argument from the
>> > caller.
>> > WRT memcpy, LLVM's memcpy intrinsic has a void return type already, for
>> > example, so this is something you'd want to figure out anyway.
>>
>> That could work for some intrinsics, but the idea would be to infer
>> this attribute on non-intrinsics as well. If you define that a
>> function with 'returnedarg' must return void from the IR perspective,
>> it means you're forcing the inference of 'returnedarg' and the
>> canonicalization of all callers to replace uses of the return value to
>> use the outgoing argument in one step, which is possible to do but I
>> don't know what it really does for you except limiting your
>> representational flexibility and preventing the optimization from
>> being split into multiple steps.
>
>>
>>
>> I also understand that canonicalizing the IR to use the outgoing
>> argument is the right thing to do from the mid-level optimizer
>> perspective, to expose the most mid-level optimization opportunities,
>> but I don't think it's actually the most convenient representation for
>> optimal code generation. Instead, my feeling is that a late IR pass
>> (right before CodeGen) should independently convert every dominated
>> use to something like "select i1 undef, %struct.A* %a, %struct.A* %b",
>> tracking the aliases through control flow, and this should be lowered
>> to some SelectionDAG node having the equivalent semantics (i.e. an
>> unspecified choice between two alternatives that the register
>> allocator can choose from to minimize spills.) Basically, you want
>> code generation to have flexibility to figure out, independently for
>> each use, whether it's more appropriate to use the outgoing argument
>> or the incoming return value.
>>
>>
>> If you remove the possibility of expressing the return value in the
>> IR, then it's possible for the code generator to figure this all out
>> directly from your canonicalized form, of course, but (as far as I can
>> tell) this would take domination and control flow analysis that
>> normally isn't done by CodeGen; also, I don't see what removing the
>> possibility of this intermediate IR representation actually buys
>> you...
>
>
> If having uses in the caller use the outgoing return value is best, we
> should make that the canonical form :-). It has the advantage of making
> hasOneUse() on argument values more aggressive. I don't have specific
> reasons why this is beneficial, but it is consistent with the general
> principles.

The problem is that it's not clear until codegen whether the outgoing
or incoming value is the better one to use: it depends on control
flow, what other calls are present, what their calling conventions
are, level of register pressure, etc.

I think it makes sense to canonicalize on one or the other in the IR
for mid-level optimizations, but at some point code gen is going to
have to figure out that there are multiple aliases to choose from in
the SSA virtual register form and choose the right one based on
context. I'm undecided whether any of this analysis should be done at
the IR level or not, but I think it could be (as a pre-lowering IR
pass, after all IR optimizations are done on the canonical form.)
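
The per-use choice described above can be sketched with a toy C++
model. Nothing here exists in CodeGen: the Alias struct, the
in_register flag, and pick_alias are hypothetical, and the single cost
signal (is this name still live in a register at the use?) stands in
for the real factors such as control flow and register pressure.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of one name for the value at a given use.
struct Alias {
    std::string name;   // e.g. the outgoing argument or the call's result
    bool in_register;   // assumed cost signal: still in a register here?
};

// Both aliases hold the same value, so either choice is correct; pick the
// one that avoids a reload. With no difference, fall back to the outgoing
// argument (an arbitrary tie-break for this sketch).
const Alias &pick_alias(const Alias &outgoing_arg, const Alias &call_result) {
    if (outgoing_arg.in_register) return outgoing_arg;
    if (call_result.in_register) return call_result;
    return outgoing_arg;
}

// Example use: the argument was spilled across the call, but the call's
// result is sitting in the return register, so the result is preferred.
void demo() {
    Alias arg{"%a", false};
    Alias ret{"%ret", true};
    assert(pick_alias(arg, ret).name == "%ret");
}
```

Because the answer can differ from one use to the next, a single
canonical choice made in the IR cannot capture it; the decision has to
be made where this kind of context is available.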

>
> I don't necessarily want to remove the possibility of representing other forms
> altogether. LLVM already has CodeGenPrepare and a few other passes which run
> "late" that are generally permitted to deviate from canonical form in a
> variety of ways. I can see something like this making sense.

Right, so what's the purpose of taking away the return value in a
canonical form just to put it back during CodeGenPrepare or something
like that?

Anyway I'm still unclear as to what you are actually proposing and if
you have any objections to the current patch...it seems like you want
everything to work exactly as it does right now, except that the
return value is removed completely from the IR in some canonical form
to make things more parsimonious, but I don't know what this actually
accomplishes other than making for a cosmetic (and misleading, to me)
change to IR :)

Also, regardless of the merits of such a change to IR, it doesn't seem
like it should hold up this patch? I don't really have a problem
changing the IR representation if we really come up with a better one,
but it seems like it would be a separate patch anyway. Please let me
know if you disagree.

Thanks :)
Stephen