[PATCH] [X86] Convert esp-relative movs of function arguments to pushes, step 2

Wed Jan 14 00:44:04 PST 2015

Thanks, Reid!

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:128-130
@@ +127,5 @@
+  // For now, enable it only for the relatively clear win cases.
+  bool ExpectReservedFrame = MF.getFrameInfo()->hasVarSizedObjects();
+  if (ExpectReservedFrame)
+    return true;
+
----------------
rnk wrote:
> I think I misinterpreted this on the first pass. We always expect this to be profitable if we know we *can't* reserve space for the call frame. Maybe rename the bool to CannotReserveFrame to match the sense?
Err, yes, you're right, sorry about that... got distracted while naming the variable, I guess, I meant the opposite.
Thanks!

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:143
@@ +142,3 @@
+
+  // Stack re-alignment can make this unprofitable even in terms of size.
+  // As mentioned above, a better heuristic is needed. For now, don't do this
----------------
rnk wrote:
> mkuper wrote:
> > rnk wrote:
> > > Can you explain why this is unprofitable? I guess if we get here we are in dyanamic alloca plus stack realignment land, i.e. the worst thing that could possibly happen. Is this about extra code for preserving the outgoing stack alignment then? Like on Linux, where we provide 16 byte stack alignment?
> > If we get here, we're in opt-for-size + stack-realignment land.
> > 
> > And, yes, that's exactly what it is is about.
> > If you are passing only one parameter, the original code would be:
> > 
> > ```
> > mov %eax, 128(%esp)
> > call $foo
> > ```
> > 
> > Without re-alignment, you have
> > ```
> > push %eax
> > call $foo
> > add $4, %esp
> > ```
> > which is still a win in terms of code-size
> > 
> > With re-alignment, you get:
> > ```
> > sub $16, %esp
> > push %eax
> > call $foo
> > add $12, %esp
> > ```
> > Which is... questionable. The code size for the sequence is the same (in this case, 7 bytes for both, not including the call), but if you have other call sites which you didn't convert, you may actually lose. And, of course, you lose performance (3 instructions instead of 1) without anything to show for it.
> > 
> > Once there is a heuristic that tries to estimate the overhead, we can address this on a case-by-case basis (e.g. if we have 16-byte stack re-alignment, but most call-sites have a lot of parameters, then it's still worth it.)
> Based on my misinterpretation, I think I understand why you get this code. SP is assumed to be aligned coming into the sequence. We realign SP after dynamic allocas. The sequence is probably more like:
> ```
> sub $12, %esp
> push %eax
> call $foo
> add $16, %esp
> ```
> I can see why this is less profitable.
Yes, that sequence. :-)

It doesn't depend on dynamic allocas, though. 
If you don't have a reserved frame (for whatever reason - for x86 after this patch, it's either dynamic allocas, or because we forced it not to reserve by using pushes), then you need this re-alignment.

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:208
@@ +207,3 @@
+  // no gaps between them.
+  std::map<int64_t, MachineBasicBlock::iterator> MovMap;
+
----------------
rnk wrote:
> std::map is really malloc heavy. This can probably be a SmallVector<MachineInstr*, 8> or something, mapping slot index to the MI that fills it. The frame setup opcode should tell you how much stack space to allocate up front, and you can index into the vector by StackOffset / 4.
That can work. Thanks, I'll try.

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:220-222
@@ +219,5 @@
+    // but rather a frame index.
+    // TODO: Support the fi case. This should probably work now that we
+    // have the infrastructure to track the stack pointer within a call
+    // sequence.
+    if (!I->getOperand(X86::AddrBaseReg).isReg() ||
----------------
rnk wrote:
> This seems worth tackling, given that you had to handle the `call <fi>` case. :)
Yes, definitely. :-)
It may even work out of the box now. But I think I still want to split it into a separate commit.

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:364-368
@@ +363,7 @@
+
+  // Make sure the def is a MOV from memory.
+  // If the def is an another block, give up.
+  if (DefMI->getOpcode() != X86::MOV32rm ||
+      DefMI->getParent() != FrameSetup->getParent())
+    return nullptr;
+
----------------
rnk wrote:
> It's not clear to me that same BB is sufficient, consider this potential BB:
> ```
> movl (%edi), %eax
> movl $42, (%edi)
> <call setup>
> movl %eax, (%esp)
> calll foo
> <call end>
> ```
> We can't move the load if there is a potentially aliasing store in the way. There might be a utility to help with the aliasing query, or you can assume that any stores other than arg stores might alias it and bail on that.
Right now I'm way more conservative than even that - I'm checking below that everything between this mov and the call setup is a MOV32rm. The "same basic block" check here is just a way to short-circuit the obviously wrong cases.

This catches some common cases like the one in the comment above, but of course misses other opportunities.
I could check for a mayStore() instead, but I'm not sure that's safe enough. I'd like to relax the condition - but again, I think that ought to be a separate commit.

================
Comment at: lib/Target/X86/X86InstrInfo.cpp:1717-1718
@@ +1716,4 @@
+  // pseudo. This leaves us with a choice:
+  // 1) Look for the next ADJCALLSTACKUP, like the code below. This
+  // works, but certainly doesn't belong here.
+  // 2) Factor out the code that makes the decision on whether a function
----------------
rnk wrote:
> This is the best thing I can think of at the moment. =/
Too bad. :-\ 

So you think I should commit with this code as is?
This shouldn't be a huge problem in terms of compile-time (since I'm looking only until the next call, it can't go quadratic), but it's insanely ugly.

http://reviews.llvm.org/D6789

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/