[PATCH] Teach DeadArgElimination not to eliminate return values of functions with 'returned' arguments

Wed Jun 26 17:00:29 PDT 2013

On Tue, Jun 25, 2013 at 4:10 PM, Stephen Lin <swlin at post.harvard.edu> wrote:

>
> > Yes. I'm suggesting this because the optimization you're proposing really
> > is a low-level calling convention optimization that doesn't conceptually
> > belong in the mid-level optimizer as more than attributes.
>
> Hmm, well, from what I can gather, what you want is the following:
>
> define i8* @foo(i8* %a) {
> ; function body...
>    ret i8* %a
> }
>
> to be canonically represented with the following idiom:
>
> define void @foo(i8* returnedarg %a) {
> ; function body...
>    ret void
> }
>
> but to do the same thing.
>
> Since the two are defined to be equivalent, it's obviously possible to
> do this transformation, but I don't see what it buys you? The first
> representation is actually a more useful one for codegen, because
> codegen has to save %a somewhere and return it,

With virtual registers in SSA form, this should be easy to do. Just copy
the value into a virtual register in the entry block, and copy it out in
the exit.

> as well as collapse
> any uses of %a within the body of the function with the saved copy of
> %a to prevent wasting registers.

Many argument registers already have virtual registers, so codegen can
probably just reuse them instead of creating new ones.

> So regardless of how it's represented
> in the IR, codegen is going to have to treat %a as a local variable
> and return it at the end, which is what the first representation shows
> more explicitly. Furthermore, given what I'm doing in my latest patch,
> the first doesn't inhibit optimization any more than the latter does.

> Also, the implementation on ARM treats 'returned' on the first
> argument as somewhat like a pseudo calling convention (which is a hack,
> because the current code does not allow a calling convention to save a
> register used to pass an argument), but that's only because the two
> happen to be in the same register; in the general case it's not
> that simple. Even on ARM, the implementation is not as simple as
> treating R0 as a "saved" register (that happens to also be an
> argument), because the variable being saved may be smaller than the
> register, in which case the semantics of 'returned' specifies only
> that the bits used by that type (rather than the entire argument) are
> saved.
>

The LangRef text for 'returned' says "The parameter and the function return
type must be valid operands for the :ref:`bitcast instruction <i_bitcast>`",
which implies that they can't be smaller. Is it intended that it be
generalized to mismatched sizes?

>
> So to me, 'returned' is more of a semantic thing rather than a calling
> convention thing, it just happens to have a convenient interpretation
> in the case that 'returned' attribute is on the first argument, the
> calling convention specifies that the same register is used as the
> return value, and that argument is word-sized; in all other cases, the
> correspondence is not nearly as clean.
>

If it's a semantic thing, and you're proposing the optimizer be aware of
the semantics of the implicit conversion that happens in machine registers
(more than isTruncFree()?), then we should make that explicit.

>
> >
> >>
> >>
> >> Also, just because the return value is unused at the IR level in this case
> >> doesn't mean it won't ever be used. You could canonicalize the IR to remove
> >> uses of the return value if you wanted to, but how does a front end express
> >> code before it is canonicalized?
> >
> >
> > The front-end would just emit uses of the outgoing argument from the caller.
> > WRT memcpy, LLVM's memcpy intrinsic has a void return type already, for
> > example, so this is something you'd want to figure out anyway.
>
> That could work for some intrinsics, but the idea would be to infer
> this attribute on non-intrinsics as well. If you define that a
> function with 'returnedarg' must return void from the IR perspective,
> it means you're forcing the inference of 'returnedarg' and the
> canonicalization of all callers to replace uses of the return value to
> use the outgoing argument in one step, which is possible to do but I
> don't know what it really does for you except limiting your
> representational flexibility and preventing the optimization from
> being split into multiple steps.

>

> I also understand that canonicalizing the IR to use the outgoing
> argument is the right thing to do from the mid-level optimizer
> perspective, to expose the most mid-level optimization opportunities,
> but I don't think it's actually the most convenient representation for
> optimal code generation. Instead, my feeling is that a late IR pass
> (right before CodeGen) should independently convert every dominated
> use to something like "select i1 undef, %struct.A* %a, %struct.A* %b",
> tracking the aliases through control flow, and this should be lowered
> to some SelectionDAG node having the equivalent semantics (i.e. an
> unspecified choice between two alternatives that the register
> allocator can choose from to minimize spills.) Basically, you want
> code generation to have flexibility to figure out, independently for
> each use, whether it's more appropriate to use the outgoing argument
> or the incoming return value.

> If you remove the possibility of expressing the return value in the
> IR, then it's possible for the code generator to figure this all out
> directly from your canonicalized form, of course, but (as far as I can
> tell) this would take domination and control flow analysis that
> normally isn't done by CodeGen; also, I don't see what removing the
> possibility of this intermediate IR representation actually buys
> you...
>

If having uses in the caller use the outgoing return value is best, we
should make that the canonical form :-). It has the advantage of making
hasOneUse() on argument values more aggressive. I don't have specific
reasons why this is beneficial, but it is consistent with the general
principles.
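
To make the two caller forms concrete, here is a rough C-level sketch. It
assumes the canonical form in question is the one where the caller keeps using
the value it passed in rather than the call's result. The names fill,
first_via_return, first_via_argument and forms_agree are hypothetical and exist
only for this illustration; fill stands in for any function whose first
argument would carry 'returned'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a function with a 'returned' first argument:
   it always returns exactly the pointer it was given. */
char *fill(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);
    return dst;
}

/* Form 1: the caller consumes the return value of the call. */
char first_via_return(char *buf, const char *src, size_t n) {
    char *r = fill(buf, src, n);
    return r[0];
}

/* Form 2: the caller ignores the return value and keeps using the
   argument it passed in. */
char first_via_argument(char *buf, const char *src, size_t n) {
    (void)fill(buf, src, n);
    return buf[0];
}

/* Returns 1 when both caller forms observe the same result. */
int forms_agree(void) {
    char a[4], b[4];
    char via_return = first_via_return(a, "xyz", 4);
    char via_argument = first_via_argument(b, "xyz", 4);
    assert(fill(a, "xyz", 4) == a); /* the 'returned' property itself */
    return via_return == via_argument;
}
```

Because fill returns its first argument unchanged, the two forms are
interchangeable, so picking one as canonical loses no information; a late pass
could still rewrite individual uses back to the return value if that suits
codegen better.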

I don't necessarily want to remove the possibility of representing other
forms altogether. LLVM already has CodeGenPrepare and a few other passes
which run "late" that are generally permitted to deviate from canonical
form in a variety of ways. I can see something like this making sense.

Or, since Chris doesn't like the situation of those passes, I can see doing
this work in CodeGen itself, but that's a separate discussion.

Dan