[PATCH] D20263: X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions
Hans Wennborg via llvm-commits
llvm-commits at lists.llvm.org
Mon May 16 16:39:22 PDT 2016
hans marked 18 inline comments as done.
hans added a comment.
Thanks for the great comments everyone!
David, I agree this only addresses a small part of the problem and leaves a lot of performance on the table. The way I think about it is as an incremental improvement, easier than tackling the full inalloca problem all at once, and also helpful for dynamic allocas in general.
Having the WIN_ALLOCA pseudos expanded late might also help getting better code for these calls eventually, as it should remove the need for some of the pattern matching I tried in http://reviews.llvm.org/D20003 (unless we manage to fix them with some higher-level approach).
================
Comment at: include/llvm/CodeGen/MachineFrameInfo.h:282
@@ +281,3 @@
+ /// Whether the function has WIN_ALLOCA instructions.
+ bool HasWinAlloca = false;
+
----------------
rnk wrote:
> I think WIN_ALLOCA is x86-specific, so a better place for this would be lib/Target/X86/X86MachineFunctionInfo.h
Done.
================
Comment at: lib/Target/X86/X86.h:62
@@ -61,1 +61,3 @@
+/// XXX
+FunctionPass *createX86WinAllocaExpander();
----------------
rnk wrote:
> mkuper wrote:
> > I think this needs a description. :-)
> write the comment :)
Done.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:16606
@@ -16609,1 +16605,3 @@
+ unsigned SizeReg = MRI.createVirtualRegister(getRegClassFor(SPTy));
+ Chain = DAG.getCopyToReg(Chain, dl, SizeReg, Size);
SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
----------------
rnk wrote:
> mkuper wrote:
> > Do we still need a CopyToReg? The previous code had it because it needed specifically a copy to EAX glued to the future-chkstk. Can you use Size directly?
> I wonder if we could do something clever to avoid creating a copy of an immediate. Might be too much work.
We can avoid creating the copy here, as Michael pointed out, but since the MI instruction expects a register (because the amount might not be a constant), there will spill be a copy at the MI level.
We could create a variant of the pseudo-instruction that takes an immediate, but I'm not sure that would end up being a simplification.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:50
@@ +49,3 @@
+ /// Compute which lowering to use for each WinAlloca instruction.
+ LoweringMap computeLowerings(MachineFunction &MF);
+
----------------
mkuper wrote:
> DavidKreitzer wrote:
> > Do you really want to return a LoweringMap? This will cause the entire container to be copied. Maybe pass in a LoweringMap& and fill it in instead?
> Won't this be ok with RVO?
I figured the call would be inlined anyway, and the compiler could avoid the copy.
Anyway, I've changed the call to return via a reference parameter instead, as that seems more idiomatic.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:99
@@ +98,3 @@
+X86WinAllocaExpander::getLowering(int64_t CurrentOffset,
+ int64_t AllocaAmount) {
+ // For a non-constant amount or a large amount, we have to probe.
----------------
DavidKreitzer wrote:
> The indentation looks off here.
Done.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:101
@@ +100,3 @@
+ // For a non-constant amount or a large amount, we have to probe.
+ if (AllocaAmount == -1 || AllocaAmount > StackProbeSize)
+ return Probe;
----------------
DavidKreitzer wrote:
> To be extra safe, you could assert that AllocaAmount is >= -1.
Hmm, I'll just change it to check AllocaAmount < 0 in the condition instead.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:109
@@ +108,3 @@
+ // Otherwise, touch the current tip of the stack, then subtract.
+ assert(AllocaAmount <= StackProbeSize);
+ return TouchAndSub;
----------------
DavidKreitzer wrote:
> This assertion seems unnecessary given line 101.
Removed.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:116
@@ +115,3 @@
+ // Do a one-pass reverse post-order walk of the CFG to conservatively estimate
+ // the offset between the stack pointer and the lowerst touched part of the
+ // stack, and use that to decide how to lower each WinAlloca instruction.
----------------
rnk wrote:
> s/lowerst/lowest/
Done.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:119
@@ +118,3 @@
+
+ // Compute the reverse post-order.
+ SmallVector<MachineBasicBlock*, 16> RPO;
----------------
rnk wrote:
> This can be:
> ReversePostOrderTraversal<MachineFunction *> RPOT(&MF);
Done.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:126
@@ +125,3 @@
+ // Initialize Out[B], the stack offset at exit from B, to something big.
+ DenseMap<MachineBasicBlock *, int64_t> Out;
+ for (MachineBasicBlock *MBB : RPO)
----------------
DavidKreitzer wrote:
> "Out" isn't very descriptive. Can you name this differently, e.g. OutOffset?
Done.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:152
@@ +151,3 @@
+ Lowerings[&MI] = L;
+ switch (L) {
+ case Sub:
----------------
DavidKreitzer wrote:
> You should probably add a default: notreached().
I left it out intentionally, relying on -Wswitch to detect any missing cases: http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:166
@@ +165,3 @@
+ // A stackrestore makes the offset unknown.
+ if (MI.isCopy() && MI.getOperand(0).isReg() &&
+ MI.getOperand(0).getReg() == StackPtr)
----------------
mkuper wrote:
> Are you sure this is conservative enough?
> Perhaps blacklist anything that defs the stack pointer, except for a specific whitelist (calls, pushes, pops...) that touches the tip? Or would we get a huge whitelist?
I couldn't come up with any example code that would break this in practice. But you're right, a white-list would be better, and I don't think it will be big.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:168
@@ +167,3 @@
+ MI.getOperand(0).getReg() == StackPtr)
+ Offset = -1;
+ }
----------------
rnk wrote:
> Should this be INT32_MAX instead? It seems like we could do alloca, save+restore, alloca and run over the guard page this way. Maybe a good test case?
Oops, -1 was a leftover from an earlier version of the patch. I've added a test.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:197
@@ +196,3 @@
+ if (Amount != -1) {
+ assert((Amount % (Is64Bit ? 8 : 4) == 0) && "Stack would not be aligned!");
+ }
----------------
mkuper wrote:
> rnk wrote:
> > Maybe cache `Is64Bit ? 8 : 4` as a `SlotSize` member variable. Also RAX as... some variable. RegA?
> We can probably just query MRI for getSlotSize instead.
>
Done.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:197
@@ +196,3 @@
+ if (Amount != -1) {
+ assert((Amount % (Is64Bit ? 8 : 4) == 0) && "Stack would not be aligned!");
+ }
----------------
hans wrote:
> mkuper wrote:
> > rnk wrote:
> > > Maybe cache `Is64Bit ? 8 : 4` as a `SlotSize` member variable. Also RAX as... some variable. RegA?
> > We can probably just query MRI for getSlotSize instead.
> >
> Done.
Done.
================
Comment at: lib/Target/X86/X86WinAllocaExpander.cpp:263
@@ +262,3 @@
+ if (MF.getFunction()->hasFnAttribute("stack-probe-size")) {
+ MF.getFunction()
+ ->getFnAttribute("stack-probe-size")
----------------
mkuper wrote:
> StackProbeSize = MF.... ?
> (Perhaps add a test for this?)
It's passed as an argument to .getAsInteger below. Or are you saying there's some shorter way to get at it?
Added a test.
http://reviews.llvm.org/D20263
More information about the llvm-commits
mailing list