[PATCH] [X86] Convert esp-relative movs of function arguments to pushes, step 2

Tue Jan 27 17:17:24 PST 2015

I mostly have concerns about the robustness of this approach. For example,
it's really hard to implement getSPAdjust correctly. Right now it assumes
that call sequences contain a restricted set of MI, and it's not clear to
me that that assumption always holds.

If we start from pushes, it should be really easy to use local or global
heuristics to transform back to a reserved frame. I'm willing to give up on
the multi-bb call sequence case caused by cmov lowering. If you all call
sequences are in a single MBB, it should be really easy to insert the right
adjustments.

I also like the idea of having the push nodes available in the DAG so we
can get better instruction selection. Maybe the combines that you wrote out
on MI are enough for most code, but to me it just seems like we're
reimplementing some of isel.

Anyway, I don't really want to block this over this disagreement. I'm not
the one doing the implementation. I certainly don't want to spend a week
figuring out how to write tablegen patters for PUSH. The store to push
conversion pass is not that complicated, and we can go back and reevaluate
in the future.

On Sun, Jan 25, 2015 at 1:21 AM, Kuperstein, Michael M <
michael.m.kuperstein at intel.com> wrote:

>  Hi Chandler,
>
>
>
> This is something that Reid and I talked about on IRC, but I don’t think
> we came to a conclusion both of us were happy with (hence Reid’s “lgtm with
> reservations”, I guess :-) )
>
>
>
> First, I don’t think the decision on whether to use movs or pushes belongs
> in the DAG.
>
> The decision on whether a call-site should use movs or pushes needs to be
> aware of its context, because having even one call-site use pushes means we
> will not have a reserved call frame, which affects the way all other call
> sites are treated as well. This patch makes the decision based on global
> attributes only (opt for size vs. speed, stack alignment), but the next
> step will be to make it based on an analysis of the call-sites – e.g. even
> with stack alignment of 16, it can still often be a win, depending on just
> how many of the function calls we can actually transform, and how many
> memory arguments each call has.
>
>
>
> So the way I envision the next step is that the pass will:
>
> a)      Collect the necessary information from all call sites in the
> function.
>
> b)      Make a judgment on whether the transformation is worth it – in
> terms of size for Os/Oz, in terms of performance for other opt levels.
>
> c)       Perform the transformation.
>
> I don’t see how we can do this on the DAG.
>
>
>
> If I understand Reid’s last suggestion, he proposed to flip the default –
> that is, emit pushes in the DAG, and have an MI pass that does the opposite
> (push -> mov) transformation if necessary.
>
>
>
> I don’t believe that removes a lot of complexity or would improve
> performance.
>
> The code in PEI, InstrInfo and FrameLowering is just a side effect on not
> being able to rely on a 0 SPAdj in PEI anymore (that is,
> canSimplifyCallFramePseudos() can now return false), and is needed
> regardless of how the transformation is performed. And we will still need
> the heuristic decision.
>
> Some of the logic in looking for sequences where the conversion is
> possible will disappear, but I think a lot of it will remain as conditions
> on the incoming operand DAG nodes. And since we don’t want to transform
> each push into a “adjust esp, mov” but rather want to group all the esp
> adjustments back into the ADJCALLSTACKs, we will still need to have code in
> the pass that make sure this is safe w.r.t to the final sequence.
>
> The main benefit I see is that we will no longer need to have the folding
> code – rather, we will have to unfold PUSH32rmm, which is simpler. However,
> I hope I can eventually get rid of the folding here by teaching
> PeepholeOptimizer to be smarter about this.
>
>
>
> On the other hand, X86TargetLowering::LowerCall() is already, IMHO, a
> fairly complex piece of code, and I’d rather avoid making it even more
> complex.
>
> Conceptually, I’d prefer that LowerCall() did standard mov-based lowering
> in all cases like it does now (we aren’t always going to lower to pushes
> anyway – it doesn’t really make sense for x864-64) and treat pushes as an
> optimization where available.
>
>
>
> What do you think?
>
>
>
> Michael
>
>
>
> *From:* Chandler Carruth [mailto:chandlerc at google.com]
> *Sent:* Thursday, January 22, 2015 23:07
> *To:* reviews+D6789+public+a4ec4af5a5133e84 at reviews.llvm.org
> *Cc:* Kuperstein, Michael M; Nadav Rotem; Demikhovsky, Elena; Commit
> Messages and Patches for LLVM
> *Subject:* Re: [PATCH] [X86] Convert esp-relative movs of function
> arguments to pushes, step 2
>
>
>
>
>
> On Thu, Jan 22, 2015 at 12:57 PM, Reid Kleckner <rnk at google.com> wrote:
>
> lgtm
>
> I still think forming pushes prior to isel is the way to go long term.
> It's a lot easier to convert pushes to 'load, SP adjust, store' than it is
> to go the other way.
>
>
> If that is the right way to go, then please don't add this complexity and
> compile time hit. It doesn't make sense to add code, maintain that code,
> fix bugs in it, and deal with the compile time when it wouldn't be needed
> if we took the other approach.
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150127/1529b950/attachment.html>