[PATCH] Teach DeadArgElimination not to eliminate return values of functions with 'returned' arguments

Wed Jun 26 14:20:05 PDT 2013

> Well, yes. I'm not here to veto the patch, but I am going to take you up on
> your offer to raise an objection. :) I've been holding off on commenting
> further because I don't yet have the constructive suggestions that ought to
> follow my objection, and I think I might if I thought about it longer.
>
> Let's explore the things we could do with the 'returned' attribute. If we
> have internal-linkage functions that return void, we could change them to
> return one of their arguments and mark it 'returned'. When is this
> profitable? You could even have some heuristics to decide which one of the
> arguments is most likely to be useful to the caller, and put 'returned' on
> that one.
>
> There's no reason that we need to limit ourselves to a single register. Keep
> the void return type, and let the x86 calling convention have a rule where
> we use AX, BX, CX, DX in that order for up to four 'returned' arguments.
> When is it profitable? You might not want to use all four if the caller has
> higher register pressure. It depends on the register pressure in the caller,
> and register allocation particulars in the callee.
>
> The 'returned' attribute is interprocedural register allocation done at the
> IR level, encoded with parameter attributes.

Yes, I agree that part of this optimization does amount to a limited
form of interprocedural register allocation, and that a more general
form of it, one that does not require a single return value or even
that the 'preserved' argument be represented in IR at all, can be
done at some point with internalized functions. However, I think the
common case of a single value being returned happens often enough
(from the ABI, the C API definition, or from user code observed to
have this property, in any language) that it is useful to treat the
case specially. Furthermore, in the latter two cases, the optimization
has an IR-level semantic component as well, since the return value may
not actually be unused by the calling function. Admittedly, fully
handling the semantics in mid-level optimizers might be as simple as a
replace-all-uses-with (RAUW), but that doesn't mean that the semantics
do not exist.
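
To make the semantic point concrete, here is a minimal C sketch of the
"C API definition" case. The names copy_bytes, first_via_result and
first_via_argument are hypothetical and only for illustration;
copy_bytes stands in for a memcpy-like function whose contract is to
return its first argument, which is the property that 'returned' on
that parameter would express in IR. The second caller is what the
first one looks like after the call result has been replaced by the
argument, i.e. roughly what a RAUW would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memcpy-like helper: by contract it always returns dst.
 * In IR, this is the property a 'returned' attribute on the first
 * parameter would describe. */
char *copy_bytes(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
    return dst;
}

/* Caller written against the return value. The result is used here,
 * so it is not dead from this caller's point of view. */
char first_via_result(char *buf, const char *src, size_t n) {
    char *r = copy_bytes(buf, src, n);
    assert(r == buf); /* the contract assumed in this sketch */
    return r[0];
}

/* The same caller after substituting the argument for the call result.
 * Behaviour is unchanged because copy_bytes returns its first argument. */
char first_via_argument(char *buf, const char *src, size_t n) {
    copy_bytes(buf, src, n);
    return buf[0];
}
```

Both callers compute the same thing; which form is cheaper depends on
whether it is better to keep buf live across the call or to pick the
value up from the return register, and that is a codegen question.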

>
> This is a major architectural change to llvm. I don't think anybody realized
> the implications of the 'returned' patch when it was under review.
>
> We don't have the necessary framework to decide when or where 'returned'
> belongs. We can't know whether it will require expensive extra copies in the
> callee or if the caller will be under high register pressure, not at the IR
> level. Heuristics would be an approximation of codegen, with a tradeoff in
> complexity vs. accuracy equal to how much of codegen we want to
> reimplement.
>
> I do understand how it solves your problem in the cross-TU case, where you
> have this great ABI guarantee that the argument gets returned, which you can
> use in the register allocator. It isn't even a calling-convention on its
> own, it's more of a modifier that can be applied to any calling convention.
> I can see how it ended up as a parameter attribute, much like zeroext is an
> ABI-level parameter attribute. But once we start to apply its semantics in
> the same-TU case, it becomes an llvm IR codified hack to work around the
> lack of interprocedural register allocation. That's not okay.

Yes, in the future we really ought to take advantage of opportunities
like this more broadly in the fully-internalized case at the codegen
level by doing interprocedural register allocation with fastcc, and I
don't think anyone is suggesting that we expand IR attributes further
to handle all possible variations of interprocedural register
allocation at the IR level, since that kind of information obviously
does not belong there.

However, I don't see why ruling out an IR-level representation of the
fully general, fully internalized form of this optimization should
also rule out representing this more limited form there, especially
when the limited form is applicable across TUs, to functions that are
not internalized, and also has a semantic component.

>
> Anyhow, as I said I'm not going to block this patch because I don't have
> ideas on what would be a better way to do it, but please get an OK from Dan
> first.
>
> Nick