[PATCH] Teach DeadArgElimination not to eliminate return values of functions with 'returned' arguments

Wed Jun 26 14:57:58 PDT 2013

On 26 June 2013 12:58, Evan Cheng <evan.cheng at apple.com> wrote:

>
> On Jun 26, 2013, at 12:41 AM, Nick Lewycky <nicholas at mxc.ca> wrote:
>
> > Stephen Lin wrote:
> >> On Mon, Jun 24, 2013 at 2:38 PM, Stephen Lin<swlin at post.harvard.edu>
>  wrote:
> >>> On Mon, Jun 24, 2013 at 11:23 AM, Stephen Lin<swlin at post.harvard.edu>
>  wrote:
> >>>>> No, I don't have any alternatives, other than interprocedural
> coordination
> >>>>> of code generation. Do you have any alternatives?
> >>>>
> >>>> The only thing I can think of is a few more heuristics to avoid
> >>>> keeping the return value in cases where it is "obviously" not useful
> >>>> to the caller.
> >>>
> >>> After some thought, my feeling is that some other pass (or passes),
> >>> run before DeadArgumentElimination, should:
> >>>
> >>> 1. Infer 'returned' where it is implied by the callee's IR and looks
> >>> like it could be useful in the caller(s) (based on some
> >>> target-independent heuristics, which might not be 100% accurate but
> >>> can make a reasonable guess)
> >>> 2. Remove 'returned' if it's present but obviously not useful to any
> >>> caller (note that this itself won't eliminate the return value, only
> >>> the argument's 'returned' attribute, so wouldn't require the function
> >>> to be internalized.)
> >>> 3. Canonicalize uses of return values and uses of 'returned' arguments
> >>> dominated by the call (in some manner TBD)
> >>>
> >>> In other words, 'returned' should be treated as a hint that should
> >>> only be added by a pass if it has positive reason to believe that it
> >>> will be useful, and can be removed if it looks like it will not be,
> >>> rather than just being automatically applied automatically wherever
> >>> the semantics hold. I think this is the best compromise approach that
> >>> will allow the attribute to be taken advantage of without forcing
> >>> inter-procedural coordination and complicated control flow analysis at
> >>> code gen.
> >>>
> >>> Given that the 'returned' attribute will, for the near future, only
> >>> generated by clang in a specific place, this doesn't seem necessary
> >>> yet, but should be done eventually. In the meantime, I think the
> >>> current patch is the best tactical approach to at least get the
> >>> attribute working for ARM C++ ABI 'this'-returns (in particular,
> >>> allowing elision of 'this' save/restores across constructor or
> >>> destructor calls, which the current front-end only implementation does
> >>> not do) without breaking.
> >>
> >> Anyone still have a significant objection to the patch given the
> >> above? I would like to get this in, if not...
> >
> > Well, yes. I'm not here to veto the patch, but I am going to take you up
> on your offer to raise an objection. :) I've been holding off on commenting
> further because I don't yet have the constructive suggestions that ought to
> follow my objection, and I think I might if I thought about it longer.
> >
> > Let's explore the things we could do with the 'returned' attribute. If
> we have internal-linkage functions that return void, we could change them
> to return one of their arguments and mark it 'returned'. When is this
> profitable? You could even have some heuristics to decide which one of the
> arguments is most likely to be useful to the caller, and put 'returned' on
> that one.
> >
> > There's no reason that we need to limit ourselves to a single register.
> Keep the void return type, and let the x86 calling convention have a rule
> where we use AX, BX, CX, DX in that order for up to four 'returned'
> arguments. When is it profitable? You might not want to use all four if the
> caller has higher register pressure. It depends on the register pressure in
> the caller, and register allocation particulars in the callee.
> >
> > The 'returned' attribute is interprocedural register allocation done at
> the IR level, encoded with parameter attributes.
> >
> > This is a major architectural change to llvm. I don't think anybody
> realized the implications of the 'returned' patch when it was under review.
>
> FYI, this was discussed inside the Apple team and Chris has approved this
> approach.

I appreciate that you did due diligence when designing it before sending it
out. However, doing that design behind closed doors makes it unclear which
objections have been raised or which concerns have been considered. There's
a difference between "we saw the problems but decided this was the best
tradeoff" and "we didn't see this problem". Indeed, your reply still hasn't
told me which it was.

Also, I'm not sure why you consider this a "major architectural change".
>

Suppose I create a new parameter attribute 'AX' which means that this
parameter will be present in register 'AX' upon function exit. What's the
difference between this and attribute 'returned'? Clearly 'AX' is a
reference to the x86 register set, while 'returned' means AX on some x86
calling conventions and other registers on other CCs or targets. But of
course we could implement my attribute 'AX' differently on different
targets. Is there any difference?

Labelling llvm values with registers is register allocation. Doing register
allocation on llvm IR, with llvm IR as output, is something I would
consider a major architectural change.

Making the necessary information available at IR time to let IR passes do
register allocation *well* would be a major effort, but not a change to the
architecture. At that point, it would just be incremental improvement.

I agree there are opportunities to further expand this optimization. But
> I'm not sure that's the relevant discussion here. The question is whether
> the approach for the most common case (which is important for C++ and x86,
> arm calling conventions).
>
> >
> > We don't have the necessary framework to decide when or where 'returned'
> belongs. We can't know whether it will require expensive extra copies in
> the callee or if the caller will be under high register pressure, not at
> the IR level. Heuristics would be an approximation of codegen, with a
> tradeoff in complexity vs. accuracy  equal to how much of codegen we want
> to reimplement.
> >
> > I do understand how it solves your problem in the cross-TU case, where
> you have this great ABI guarantee that the argument gets returned, which
> you can use in the register allocator. It isn't even a calling-convention
> on its own, it's more of a modifier that can be applied to any calling
> convention. I can see how it ended up as a parameter attribute, much like
> zeroext is a ABI-level parameter attribute. But once we start to apply its
> semantics in the same-TU case, it becomes an llvm IR codified hack to work
> around the lack of interprocedural register allocation. That's not okay.
>
> We shouldn't require knowledge of the target and the calling conventions
> to perform optimization at the IR level. I believe the important thing here
> is for DAE to be consistent. It should either keep the argument and the
> "returned" attribute or remove the argument as well as the attribute. The
> rest of the work should be left to codegen. In today's implementations, the
> frontend is responsible for some of the ABI lowering. It has detailed
> knowledge about target calling conventions. It's a reasonable and obvious
> place to decide on where to place the "returned" attribute.
>

Yep, I agree with the meat of this. In another part of this thread Dan is
examining the implications of how the returned attribute and DAE interact,
I hear there's a subtlety involving tail calls? But I absolutely agree the
frontend is responsible (and a sensible place) for doing ABI lowering, that
it's a good place to decide where 'returned' goes, that we don't want the
IR to know things about targets or calling conventions, etc.

>
> > Anyhow, as I said I'm not going to block this patch because I don't have
> ideas on what would be a better way to do it, but please get an OK from Dan
> first.
>
> This is a long thread so I didn't read all of it. If I understand
> correctly, Dan's proposal is to change the function prototype to return
> void and recognize the optimization opportunity at codegen. At the codegen
> level, the handling of function arguments and return values are scattered
> all over the place. It's going to add more complexity and I really do not
> see how it would make the optimization more powerful.
>

I'm not concerned about how powerful the optimization is. Nobody has
offered any performance numbers for this optimization yet. What I'm
concerned about is the semantics of the attribute, and the impact on the IR
transforms.

One thing I'm considering is whether we've made 'returned' too powerful by
making it too general. Again, I'm still stuck on constructive suggestions.
I wish I could say "here's how to do it properly", but I haven't got that.

Having Stephen's patch go in is actually fine with me, as long as we
understand that we can't allow 'returned' to become a sanctioned place for
doing register allocation on the IR (or conversely, that we form a
consensus that doing IPA-RA on the IR is the right thing to do -- though I
think it isn't). Ideally, I'd like 'returned' to be defined in such a way
that it isn't possible to misuse like that. That's my goal with this thread.

Nick
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130626/c399fa82/attachment.html>