<div dir="ltr">On 26 June 2013 12:58, Evan Cheng <span dir="ltr"><<a href="mailto:evan.cheng@apple.com" target="_blank">evan.cheng@apple.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div class=""><div class="h5"><br>

On Jun 26, 2013, at 12:41 AM, Nick Lewycky <<a href="mailto:nicholas@mxc.ca">nicholas@mxc.ca</a>> wrote:<br>

<br>

> Stephen Lin wrote:<br>

>> On Mon, Jun 24, 2013 at 2:38 PM, Stephen Lin<<a href="mailto:swlin@post.harvard.edu">swlin@post.harvard.edu</a>>  wrote:<br>

>>> On Mon, Jun 24, 2013 at 11:23 AM, Stephen Lin<<a href="mailto:swlin@post.harvard.edu">swlin@post.harvard.edu</a>>  wrote:<br>

>>>>> No, I don't have any alternatives, other than interprocedural coordination<br>

>>>>> of code generation. Do you have any alternatives?<br>

>>>><br>

>>>> The only thing I can think of is a few more heuristics to avoid<br>

>>>> keeping the return value in cases where it is "obviously" not useful<br>

>>>> to the caller.<br>

>>><br>

>>> After some thought, my feeling is that some other pass (or passes),<br>

>>> run before DeadArgumentElimination, should:<br>

>>><br>

>>> 1. Infer 'returned' where it is implied by the callee's IR and looks<br>

>>> like it could be useful in the caller(s) (based on some<br>

>>> target-independent heuristics, which might not be 100% accurate but<br>

>>> can make a reasonable guess)<br>

>>> 2. Remove 'returned' if it's present but obviously not useful to any<br>

>>> caller (note that this itself won't eliminate the return value, only<br>

>>> the argument's 'returned' attribute, so wouldn't require the function<br>

>>> to be internalized.)<br>

>>> 3. Canonicalize uses of return values and uses of 'returned' arguments<br>

>>> dominated by the call (in some manner TBD)<br>

>>><br>

>>> In other words, 'returned' should be treated as a hint that should<br>

>>> only be added by a pass if it has positive reason to believe that it<br>

>>> will be useful, and can be removed if it looks like it will not be,<br>

>>> rather than just being automatically applied automatically wherever<br>

>>> the semantics hold. I think this is the best compromise approach that<br>

>>> will allow the attribute to be taken advantage of without forcing<br>

>>> inter-procedural coordination and complicated control flow analysis at<br>

>>> code gen.<br>

>>><br>

>>> Given that the 'returned' attribute will, for the near future, only<br>

>>> generated by clang in a specific place, this doesn't seem necessary<br>

>>> yet, but should be done eventually. In the meantime, I think the<br>

>>> current patch is the best tactical approach to at least get the<br>

>>> attribute working for ARM C++ ABI 'this'-returns (in particular,<br>

>>> allowing elision of 'this' save/restores across constructor or<br>

>>> destructor calls, which the current front-end only implementation does<br>

>>> not do) without breaking.<br>

>><br>

>> Anyone still have a significant objection to the patch given the<br>

>> above? I would like to get this in, if not...<br>

><br>

> Well, yes. I'm not here to veto the patch, but I am going to take you up on your offer to raise an objection. :) I've been holding off on commenting further because I don't yet have the constructive suggestions that ought to follow my objection, and I think I might if I thought about it longer.<br>


><br>

> Let's explore the things we could do with the 'returned' attribute. If we have internal-linkage functions that return void, we could change them to return one of their arguments and mark it 'returned'. When is this profitable? You could even have some heuristics to decide which one of the arguments is most likely to be useful to the caller, and put 'returned' on that one.<br>


><br>

> There's no reason that we need to limit ourselves to a single register. Keep the void return type, and let the x86 calling convention have a rule where we use AX, BX, CX, DX in that order for up to four 'returned' arguments. When is it profitable? You might not want to use all four if the caller has higher register pressure. It depends on the register pressure in the caller, and register allocation particulars in the callee.<br>


><br>

> The 'returned' attribute is interprocedural register allocation done at the IR level, encoded with parameter attributes.<br>

><br>

> This is a major architectural change to llvm. I don't think anybody realized the implications of the 'returned' patch when it was under review.<br>

<br>

</div></div>FYI, this was discussed inside the Apple team and Chris has approved this approach.</blockquote><div><br></div><div><div>I appreciate that you did due diligence when designing it before sending it out. However, doing that design behind closed doors makes it unclear which objections have been raised or which concerns have been considered. There's a difference between "we saw the problems but decided this was the best tradeoff" and "we didn't see this problem". Indeed, your reply still hasn't told me which it was.</div>


</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> Also, I'm not sure why you consider this a "major architectural change".<br>


</blockquote><div><br></div><div>Suppose I create a new parameter attribute 'AX' which means that this parameter will be present in register 'AX' upon function exit. What's the difference between this and attribute 'returned'? Clearly 'AX' is a reference to the x86 register set, while 'returned' means AX on some x86 calling conventions and other registers on other CCs or targets. But of course we could implement my attribute 'AX' differently on different targets. Is there any difference?<br>


</div><div><br></div><div>Labelling llvm values with registers is register allocation. Doing register allocation on llvm IR, with llvm IR as output, is something I would consider a major architectural change.</div><div><br>


</div><div>Making the necessary information available at IR time to let IR passes do register allocation *well* would be a major effort, but not a change to the architecture. At that point, it would just be incremental improvement.</div>


<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I agree there are opportunities to further expand this optimization. But I'm not sure that's the relevant discussion here. The question is whether the approach for the most common case (which is important for C++ and x86, arm calling conventions).<br>


<div class="im"><br>

><br>

> We don't have the necessary framework to decide when or where 'returned' belongs. We can't know whether it will require expensive extra copies in the callee or if the caller will be under high register pressure, not at the IR level. Heuristics would be an approximation of codegen, with a tradeoff in complexity vs. accuracy  equal to how much of codegen we want to reimplement.<br>


><br>

> I do understand how it solves your problem in the cross-TU case, where you have this great ABI guarantee that the argument gets returned, which you can use in the register allocator. It isn't even a calling-convention on its own, it's more of a modifier that can be applied to any calling convention. I can see how it ended up as a parameter attribute, much like zeroext is a ABI-level parameter attribute. But once we start to apply its semantics in the same-TU case, it becomes an llvm IR codified hack to work around the lack of interprocedural register allocation. That's not okay.<br>


<br>

</div>We shouldn't require knowledge of the target and the calling conventions to perform optimization at the IR level. I believe the important thing here is for DAE to be consistent. It should either keep the argument and the "returned" attribute or remove the argument as well as the attribute. The rest of the work should be left to codegen. In today's implementations, the frontend is responsible for some of the ABI lowering. It has detailed knowledge about target calling conventions. It's a reasonable and obvious place to decide on where to place the "returned" attribute.<br>


</blockquote><div><br></div><div>Yep, I agree with the meat of this. In another part of this thread Dan is examining the implications of how the returned attribute and DAE interact, I hear there's a subtlety involving tail calls? But I absolutely agree the frontend is responsible (and a sensible place) for doing ABI lowering, that it's a good place to decide where 'returned' goes, that we don't want the IR to know things about targets or calling conventions, etc.</div>


<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div class="im">><br>

> Anyhow, as I said I'm not going to block this patch because I don't have ideas on what would be a better way to do it, but please get an OK from Dan first.<br>

<br>

</div>This is a long thread so I didn't read all of it. If I understand correctly, Dan's proposal is to change the function prototype to return void and recognize the optimization opportunity at codegen. At the codegen level, the handling of function arguments and return values are scattered all over the place. It's going to add more complexity and I really do not see how it would make the optimization more powerful.<br>


</blockquote><div><br></div><div>I'm not concerned about how powerful the optimization is. Nobody has offered any performance numbers for this optimization yet. What I'm concerned about is the semantics of the attribute, and the impact on the IR transforms.</div>


<div><br></div><div>One thing I'm considering is whether we've made 'returned' too powerful by making it too general. Again, I'm still stuck on constructive suggestions. I wish I could say "here's how to do it properly", but I haven't got that.</div>


<div><br></div><div><div>Having Stephen's patch go in is actually fine with me, as long as we understand that we can't allow 'returned' to become a sanctioned place for doing register allocation on the IR (or conversely, that we form a consensus that doing IPA-RA on the IR is the right thing to do -- though I think it isn't). Ideally, I'd like 'returned' to be defined in such a way that it isn't possible to misuse like that. That's my goal with this thread.</div>


</div><div><br></div><div>Nick</div><div><br></div></div></div></div>