[PATCH] D20003: X86CallFrameOpt: a first step towards optimizing inalloca calls (PR27076)

Mon May 9 08:45:46 PDT 2016

DavidKreitzer added a comment.

> The connection I saw between removing the _chkstk calls and this transformation is that the pushes will touch the stack in a safe order (starting at %esp, which should be safe, and progressing downwards), whereas the original stores might be in an order that starts by touching an address beyond the allocated stack.

That's a fair point. You do have to worry about the case where the pushes get prefaced by a "sub esp" which is typically done to pad the outgoing argument block for alignment. I do not know how common that is on IA-32 Windows, but it is a potential concern.

> What worries me about the approach in r262370 is that it doesn't take other stack objects into account (and I think that's why it failed). What if our function has a 3 KB fixed array which hasn't been touched yet, can we then remove a 2 KB _chkstk? And IIUC (but I could be wrong here), we can't rely on checking the size of the stack frame at that stage, because it could increase due to register spills.

The situation you describe is no different than for a call to a non-inalloca function. That is, if the local frame has a 3k object, and the outgoing parameter block size is 2k, we need to call _chkstk. And it is the frame finalization pass that figures this out. IMO, inalloca functions should be handled in the same way. (I admit that isn't quite the same as the approach taken in r262370.) In addition to eliminating the unnecessary _chkstk calls, a fix in frame finalization will let us avoid forcing a frame pointer in routines with inalloca calls. At any rate, I think we need a comprehensive fix. This patch seems like a small band-aid for a gaping wound.

http://reviews.llvm.org/D20003