<div dir="ltr">On Tue, Jun 25, 2013 at 4:10 PM, Stephen Lin <span dir="ltr"><<a href="mailto:swlin@post.harvard.edu" target="_blank">swlin@post.harvard.edu</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><br></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div><div>

> Yes. I'm suggesting this because the optimization you're proposing really is<br>

> a low-level calling convention optimization that doesn't conceptually belong<br>

> in the mid-level optimizer as more than attributes.<br>

<br>

</div></div>Hmm, well, from what I can gather, what you want is the following:<br>

<br>

define i8* @foo(i8* %a) {<br>

; function body...<br>

   ret i8* %a<br>

}<br>

<br>

to be canonically represented with the following idiom:<br>

<br>

define void @foo(i8* returnedarg %a) {<br>

; function body...<br>

   ret void<br>

}<br>

<br>

but to do the same thing.<br>

<br>

Since the two are defined to be equivalent, it's obviously possible to<br>

do this transformation, but I don't see what it buys you? The first<br>

representation is actually a more useful one for codegen, because<br>

codegen has to save %a somewhere and return it, </blockquote><div><br></div><div>With virtual registers in SSA form, this should be easy to do. Just copy the value into a virtual register in the entry block, and copy it out in the exit.</div>


<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">as well as collapse<br>

any uses of %a within the body of the function with the saved copy of<br>

%a to prevent wasting registers. </blockquote><div><br></div><div>Many argument registers already have virtual registers, so codegen can probably just reuse them instead of creating new ones.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


So regardless of how it's represented<br>

in the IR, codegen is going to have to treat %a as a local variable<br>

and return it at the end, which is what the first representation shows<br>

more explicitly. Furthermore, given what I'm doing in my latest patch,<br>

the first doesn't inhibit optimization any more than the latter does.</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<br>

Also, the implementation on ARM treats 'returned' on the first<br>

argument as somewhat like a pseudo calling convention (which is a hack<br>

because the current code does not allow a calling convention to save a<br>

register used to pass an argument.) but that's only because the two<br>

happen to be in the same register, but in the general case it's not<br>

that simple. Even on ARM, the implementation is not as simple as<br>

treating R0 as a "saved" register (that happens to also be an<br>

argument), because the variable being saved may be smaller than the<br>

register, in which case the semantics of 'returned' specifies only<br>

that the bits used by that type (rather than the entire argument) are<br>

saved.<br></blockquote><div><br></div><div>The LangRef text for returned says "The parameter and the function return type must be valid</div><div>    operands for the :ref:`bitcast instruction &lt;i_bitcast&gt;`", which implies that they can't be smaller. Is it intended that it be generalized to mismatched sizes?</div>
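<div><br></div><div>As a minimal sketch of what that LangRef wording permits (the names %struct.A, @ctor and @as_bytes are hypothetical and not taken from the patch): the parameter and return types may differ as long as a bitcast between them is valid, which covers pointer-to-pointer but not, say, i8 to i32.</div>
<pre>
; Hypothetical illustration only; names are not from the patch under review.
%struct.A = type { i32 }

; Same type on both sides: trivially bitcast-compatible.
define %struct.A* @ctor(%struct.A* returned %this) {
entry:
  %x = getelementptr inbounds %struct.A* %this, i32 0, i32 0
  store i32 0, i32* %x
  ret %struct.A* %this
}

; Different pointer types, but a bitcast between them is valid.
define i8* @as_bytes(%struct.A* returned %p) {
entry:
  %q = bitcast %struct.A* %p to i8*
  ret i8* %q
}

; By contrast, an i8 parameter with an i32 return type would not satisfy
; the quoted rule, because bitcast requires operands of equal bit width.
</pre>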

<div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

So to me, 'returned' is more of a semantic thing rather than a calling<br>

convention thing, it just happens to have a convenient interpretation<br>

in the case that 'returned' attribute is on the first argument, the<br>

calling convention specifies that the same register is used as the<br>

return value, and that argument is word-sized; in all other cases, the<br>

correspondence is not nearly as clean.<br></blockquote><div><br></div><div>If it's a semantic thing, and you're proposing the optimizer be aware of the semantics of the implicit conversion that happens in machine registers (more than isTruncFree()?), then we should make that explicit.</div>


<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div><br>

><br>

>><br>

>><br>

>> Also, just because the return value is unused at the IR level in this case<br>

>> doesn't mean it won't ever be used. You could canonicalize the IR to remove<br>

>> uses of the return value if you wanted to, but how does a front end express<br>

>> code before it is canonicalized?<br>

><br>

><br>

> The front-end would just emit uses of the outgoing argument from the caller.<br>

> WRT memcpy, LLVM's memcpy intrinsic has a void return type already, for<br>

> example, so this is something you'd want to figure out anyway.<br>

<br>

</div>That could work for some intrinsics, but the idea would be to infer<br>

this attribute on non-intrinsics as well. If you define that a<br>

function with 'returnedarg' must return void from the IR perspective,<br>

it means you're forcing the inference of 'returnedarg' and the<br>

canonicalization of all callers to replace uses of the return value to<br>

use the outgoing argument in one step, which is possible to do but I<br>

don't know what it really does for you except limiting your<br>

representational flexibility and preventing the optimization from<br>

being split into multiple steps.</blockquote><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


I also understand that canonicalizing the IR to use the outgoing<br>

argument is the right thing to do from the mid-level optimizer<br>

perspective, to expose the most mid-level optimization opportunities,<br>

but I don't think it's actually the most convenient representation for<br>

optimal code generation. Instead, my feeling is that a late IR pass<br>

(right before CodeGen) should independently convert every dominated<br>

use to something like "select i1 undef, %struct.A* %a, %struct.A* %b",<br>

tracking the aliases through control flow, and this should be lowered<br>

to some SelectionDAG node having the equivalent semantics (i.e. an<br>

unspecified choice between two alternatives that the register<br>

allocator can choose from to minimize spills.) Basically, you want<br>

code generation to have flexibility to figure out, independently for<br>

each use, whether it's more appropriate to use the outgoing argument<br>

or the incoming return value.</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

If you remove the possibility of expressing the return value in the<br>

IR, then it's possible for the code generator to figure this all out<br>

directly from your canonicalized form, of course, but (as far as I can<br>

tell) this would take domination and control flow analysis that<br>

normally isn't done by CodeGen; also, I don't see what removing the<br>

possibility of this intermediate IR representation actually buys<br>

you...<br></blockquote><div><br></div><div>If having uses in the caller use the outgoing return value is best, we should make that the canonical form :-). It has the advantage of making hasOneUse() on argument values more aggressive. I don't have specific reasons why this is beneficial, but it is consistent with the general principles.</div>
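<div><br></div><div>To make the two caller-side forms concrete, here is a small sketch (the declarations @foo and @use and both function names are hypothetical). Whichever form ends up canonical, the visible difference is in the use counts of %a and %r.</div>
<pre>
; Hypothetical illustration only; @foo and @use are assumed declarations.
declare i8* @foo(i8* returned)
declare void @use(i8*)

; Form 1: later code uses the call's result.
; %a has exactly one use (the call); %r has one use.
define void @uses_result(i8* %a) {
entry:
  %r = call i8* @foo(i8* %a)
  call void @use(i8* %r)
  ret void
}

; Form 2: later code uses the argument that was passed in.
; %a has two uses; %r has none.
define void @uses_argument(i8* %a) {
entry:
  %r = call i8* @foo(i8* %a)
  call void @use(i8* %a)
  ret void
}
</pre>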

<div><br></div><div>I don't necessarily want to remove the possibility of representing other forms altogether. LLVM already has CodeGenPrepare and a few other passes which run "late" that are generally permitted to deviate from canonical form in a variety of ways. I can see something like this making sense.</div>

<div><br></div><div>Or, since Chris doesn't like the situation of those passes, I can see doing this work in CodeGen itself, but that's a separate discussion.</div><div><br></div><div>Dan</div><div><br></div></div>

</div></div>