[PATCH] [X86] Convert esp-relative movs of function arguments to pushes, step 2

Tue Jan 13 09:04:33 PST 2015

I haven't finished reviewing yet, but I've got to run and handle something personal.

At a high level, is there any reason we shouldn't commit to push/pop earlier to allow for better ISel, rather than trying to transform call sequences later? Specifically, I'm thinking about adding an X86ISD::PUSH DAG node and changing X86TargetLowering::LowerCall() to use it.

================
Comment at: lib/CodeGen/PrologEpilogInserter.cpp:855-856
@@ +854,4 @@
+    // Note that this must come after eliminateFrameIndex, because 
+    // if I itself referred to a frame index, we shouldn't count its own
+    // adjustment.
+    if (MI && InsideCallSequence)
----------------
This seems like an x86-specific quirk, right? Given "push [esp + 8]", x86 chips will load [esp + 8] before adjusting esp, and I think this code motion accomplishes that.

I'm OK with that motion so long as there are no other upstream LLVM backends with CISC-y instructions like "push [SP-mem]". :)
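
To make the ordering concrete, here is a toy model of the operand-address rule described above. It is only a sketch: ToyMachine and pushMemEspRelative are made-up names for illustration and have nothing to do with the patch or with any LLVM API.

```cpp
#include <cstdint>
#include <map>

// Illustrative only: a tiny model of a 32-bit stack, not LLVM code.
struct ToyMachine {
  uint32_t esp;                      // stack pointer
  std::map<uint32_t, uint32_t> mem;  // sparse memory, one dword per address

  // Models `push dword ptr [esp + disp]`. The source address is computed
  // with the value esp had before the push decrements it.
  void pushMemEspRelative(uint32_t disp) {
    uint32_t value = mem[esp + disp];  // load through the old esp
    esp -= 4;                          // then adjust esp
    mem[esp] = value;                  // then store to the new top of stack
  }
};
```

In this model, the frame-index offset for the pushed operand has to be resolved against the SP adjustment in effect before the instruction, which is what processing the instruction's own adjustment after eliminateFrameIndex appears to achieve.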

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:11
@@ +10,3 @@
+// This file defines a pass that optimizes call sequences on x86.
+// Currently, it converts movs of function parameters onto the stck into 
+// pushes. This is beneficial for two main reasons:
----------------
s/stck/stack/

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:82-83
@@ +81,4 @@
+
+  // We currently only support call sequences where *all* parameters.
+  // are passed on the stack.
+  // No point in running this in 64-bit mode, since some arguments are
----------------
I think it's important to at least support __thiscall eventually, since it's a very common convention that passes exactly one argument (the this pointer) in a register.

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:84-86
@@ +83,5 @@
+  // are passed on the stack.
+  // No point in running this in 64-bit mode, since some arguments are
+  // passed in-register in all common calling conventions, so the pattern
+  // we're looking for will never match.
+  const X86Subtarget &STI = MF.getTarget().getSubtarget<X86Subtarget>();
----------------
I guess I would justify this more in terms of reducing the extra CFI that we would have to emit to describe the SP adjustments. Converting a few movs to pushes isn't worth the complexity.

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:143
@@ +142,3 @@
+
+  // Stack re-alignment can make this unprofitable even in terms of size.
+  // As mentioned above, a better heuristic is needed. For now, don't do this
----------------
Can you explain why this is unprofitable? I guess if we get here we are in dynamic alloca plus stack realignment land, i.e. the worst thing that could possibly happen. Is this about extra code for preserving the outgoing stack alignment then? Like on Linux, where we provide 16-byte stack alignment?
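
If the cost really is about outgoing alignment, the arithmetic would look roughly like the sketch below. paddingForAlignment is a hypothetical helper, not something in the patch, and it assumes esp is already aligned to Align before the call sequence starts.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the patch: number of padding bytes
// needed so that esp is Align-aligned again once ArgBytes of arguments
// have been pushed, assuming esp was Align-aligned beforehand.
constexpr uint32_t paddingForAlignment(uint32_t ArgBytes, uint32_t Align) {
  return (Align - ArgBytes % Align) % Align;
}
```

For example, three 4-byte arguments with 16-byte alignment would need 4 bytes of padding, i.e. an extra esp adjustment before the pushes that could eat into the size savings.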

http://reviews.llvm.org/D6789