[PATCH] [X86] Convert esp-relative movs of function arguments to pushes, step 2

Tue Jan 13 15:58:52 PST 2015

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:128-130
@@ +127,5 @@
+  // For now, enable it only for the relatively clear win cases.
+  bool ExpectReservedFrame = MF.getFrameInfo()->hasVarSizedObjects();
+  if (ExpectReservedFrame)
+    return true;
+
----------------
I think I misinterpreted this on the first pass. We always expect this to be profitable if we know we *can't* reserve space for the call frame. Maybe rename the bool to CannotReserveFrame to match the sense?

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:143
@@ +142,3 @@
+
+  // Stack re-alignment can make this unprofitable even in terms of size.
+  // As mentioned above, a better heuristic is needed. For now, don't do this
----------------
mkuper wrote:
> rnk wrote:
> > Can you explain why this is unprofitable? I guess if we get here we are in dyanamic alloca plus stack realignment land, i.e. the worst thing that could possibly happen. Is this about extra code for preserving the outgoing stack alignment then? Like on Linux, where we provide 16 byte stack alignment?
> If we get here, we're in opt-for-size + stack-realignment land.
> 
> And, yes, that's exactly what it is is about.
> If you are passing only one parameter, the original code would be:
> 
> ```
> mov %eax, 128(%esp)
> call $foo
> ```
> 
> Without re-alignment, you have
> ```
> push %eax
> call $foo
> add $4, %esp
> ```
> which is still a win in terms of code-size
> 
> With re-alignment, you get:
> ```
> sub $16, %esp
> push %eax
> call $foo
> add $12, %esp
> ```
> Which is... questionable. The code size for the sequence is the same (in this case, 7 bytes for both, not including the call), but if you have other call sites which you didn't convert, you may actually lose. And, of course, you lose performance (3 instructions instead of 1) without anything to show for it.
> 
> Once there is a heuristic that tries to estimate the overhead, we can address this on a case-by-case basis (e.g. if we have 16-byte stack re-alignment, but most call-sites have a lot of parameters, then it's still worth it.)
Based on my misinterpretation, I think I understand why you get this code. SP is assumed to be aligned coming into the sequence. We realign SP after dynamic allocas. The sequence is probably more like:
```
sub $12, %esp
push %eax
call $foo
add $16, %esp
```
I can see why this is less profitable.

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:208
@@ +207,3 @@
+  // no gaps between them.
+  std::map<int64_t, MachineBasicBlock::iterator> MovMap;
+
----------------
std::map is really malloc heavy. This can probably be a SmallVector<MachineInstr*, 8> or something, mapping slot index to the MI that fills it. The frame setup opcode should tell you how much stack space to allocate up front, and you can index into the vector by StackOffset / 4.

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:220-222
@@ +219,5 @@
+    // but rather a frame index.
+    // TODO: Support the fi case. This should probably work now that we
+    // have the infrastructure to track the stack pointer within a call
+    // sequence.
+    if (!I->getOperand(X86::AddrBaseReg).isReg() ||
----------------
This seems worth tackling, given that you had to handle the `call <fi>` case. :)

================
Comment at: lib/Target/X86/X86ConvertMovsToPushes.cpp:364-368
@@ +363,7 @@
+
+  // Make sure the def is a MOV from memory.
+  // If the def is an another block, give up.
+  if (DefMI->getOpcode() != X86::MOV32rm ||
+      DefMI->getParent() != FrameSetup->getParent())
+    return nullptr;
+
----------------
It's not clear to me that same BB is sufficient, consider this potential BB:
```
movl (%edi), %eax
movl $42, (%edi)
<call setup>
movl %eax, (%esp)
calll foo
<call end>
```
We can't move the load if there is a potentially aliasing store in the way. There might be a utility to help with the aliasing query, or you can assume that any stores other than arg stores might alias it and bail on that.

================
Comment at: lib/Target/X86/X86InstrInfo.cpp:1717-1718
@@ +1716,4 @@
+  // pseudo. This leaves us with a choice:
+  // 1) Look for the next ADJCALLSTACKUP, like the code below. This
+  // works, but certainly doesn't belong here.
+  // 2) Factor out the code that makes the decision on whether a function
----------------
This is the best thing I can think of at the moment. =/

================
Comment at: test/CodeGen/X86/movtopush.ll:206
@@ +205,2 @@
+  ret void
+}
----------------
Test case suggestions:
```
; Where the callee is indirect via the stack, `call <fi>`
define void @test10() optsize {
  %stack_fptr = alloca void (i32, i32, i32, i32)*
  store void (i32, i32, i32, i32)* @good, void (i32, i32, i32, i32)** %stack_fptr
  %good_ptr = load void (i32, i32, i32, i32)** %stack_fptr
  call void (i32, i32, i32, i32)* %good_ptr(i32 1, i32 2, i32 3, i32 4)
  ret void
}

; We can't fold the load into the push here, skipping the store.
@the_global = global i32
define void @test11() optsize {
  %myload = load i32* @the_global
  store i32 42, i32* @the_global
  call void @good(i32 %myload, i32 2, i32 3, i32 4)
  ret void
}
```

http://reviews.llvm.org/D6789

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/