[PATCH] Add a "probe-stack" attribute

Wed Aug 6 16:17:14 PDT 2014

Overall, I like the direction of the change, but there's a number of details which need adjusted before it's ready for submission.  The biggest is that this patch seems to combine a bunch of separate pieces and it would be better to factor this apart into a series of patches which each do one thing.  This would make it much easier to review.

================
Comment at: include/llvm/CodeGen/MachineFunction.h:268
@@ -267,1 +267,3 @@
 
+  /// Should we be probing the stack for the function
+  bool shouldProbeStack();
----------------
Could you define the term "probing the stack" here?  This is likely be where an LLVM developer might encounter the concept.  One short sentence is fine.  

================
Comment at: lib/Target/ARM/ARMFrameLowering.cpp:303
@@ -303,1 +302,3 @@
+  if ((STI.isTargetWindows() || MF.shouldProbeStack())
+      && WindowsRequiresStackProbe(MF, NumBytes)) {
     uint32_t NumWords = NumBytes >> 2;
----------------
Seems like you should rename WindowsRequiredStackProbe since this is no longer Windows specific.

================
Comment at: lib/Target/X86/X86FrameLowering.cpp:500
@@ -495,2 +499,3 @@
       !IsWin64 &&                                       // Win64 has no Red Zone
+      !(UseStackProbe && StackSize > 128) &&            // No stack probes
       !usesTheStack(MF) &&                              // Don't push and pop.
----------------
What is this magic constant?  At minimum, comment it.

This looks like it might be an independent bugfix?  If so, please submit separately.  

================
Comment at: lib/Target/X86/X86FrameLowering.cpp:673
@@ -668,1 +672,3 @@
+  if (NumBytes >= 4096 && UseStackProbe) {
+    assert(!UseRedZone && "The Red Zone is not accounted for in stack probes");
 
----------------
This section is a large diff that I'm having trouble understanding.  If you can split this up into a set of smaller patches, that would help a lot.  

================
Comment at: lib/Target/X86/X86FrameLowering.cpp:675
@@ -668,4 +674,3 @@
 
-    if (Is64Bit) {
-      if (STI.isTargetCygMing()) {
-        StackProbeSymbol = "___chkstk_ms";
+    if (NumBytes <= 0x5000) {
+      BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::SUB64ri32 : X86::SUB32ri),
----------------
Reid Kleckner wrote:
> This change seems like a separable optimization, correct?  I'd like to have a separate patch for it.
Magic numbers?  What do these mean?

================
Comment at: lib/Target/X86/X86FrameLowering.cpp:708
@@ -674,6 +707,3 @@
+        StackProbeSymbol = "__probestack";
       }
 
----------------
In a separate change, exacting out this code out as a helper function (getStackProbeSymbol(..)) would make the functional diff easier to follow.

================
Comment at: lib/Target/X86/X86FrameLowering.cpp:712
@@ -682,1 +711,3 @@
+      bool isRegAccAlive = isEAXLiveIn(MF);
+      auto RegAcc = Is64Bit ? X86::RAX : X86::EAX;
 
----------------
I don't understand this section at all.  What's the motivation here?

Again, could this be separate patch?

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:18201
@@ -18200,3 +18200,3 @@
 
-  if (Subtarget->isTargetWin64()) {
+  if (Subtarget->isTargetWin64() || !Subtarget->isOSWindows()) {
     if (Subtarget->isTargetCygMing()) {
----------------
This condition "feels" wrong.  It looks like you're only avoiding this case for 32 bit Windows?  Why?  The old code didn't.  



================
Comment at: lib/Target/X86/X86ISelLowering.cpp:18227
@@ +18226,3 @@
+          .addReg(X86::RAX);
+      } else {
+        BuildMI(*BB, MI, DL, TII->get(X86::CALLpcrel32))
----------------
I'm not clear on what this change is doing.  In addition to the changed symbol, it looks like you're adding 64 vs 32 bit case?  Is this an unrelated bugfix?

================
Comment at: lib/Transforms/IPO/Inliner.cpp:140
@@ -139,1 +139,3 @@
 
+  if (Callee->hasFnAttribute("probe-stack")) {
+    Caller->addFnAttr("probe-stack", "");
----------------
Add a comment to justify.  

Would it be better to just prevent inlining when only one has probe-stack?  (i.e. is this expected to ever happen?)

================
Comment at: test/CodeGen/X86/mingw-alloca.ll:32
@@ -33,1 +31,3 @@
+	%A2 = alloca [20000 x i32], align 16		; <[20000 x i32]*> [#uses=1]
+	%A2.sub = getelementptr [20000 x i32]* %A2, i32 0, i32 0		; <i32*> [#uses=1]
 	call void @bar2( i32* %A2.sub, i32 %N )
----------------
Why are you modifying the existing test rather than adding a new one?  Is this test incorrect?  It doesn't seem to be on first read.  

================
Comment at: test/CodeGen/X86/pr17631.ll:21
@@ -20,3 +20,3 @@
 ; CHECK-NOT: vzeroupper
-; CHECK: _chkstk
+; CHECK: orl $0, 64(%esp)
 ; CHECK: ret
----------------
This looks like it goes with an optimization?

Can you add a test case for the unoptimized case?

Again, optimizations should be separate patches.

================
Comment at: test/CodeGen/X86/stack-probes.ll:1
@@ +1,2 @@
+; RUN: llc -mtriple=i386-pc-linux-gnu < %s -o - | FileCheck --check-prefix=X86-Linux %s
+; RUN: llc -mtriple=x86_64-pc-linux-gnu < %s -o - | FileCheck --check-prefix=X64-Linux %s
----------------
Why bother with a custom prefix if they are the same?  You can just use the default CHECK prefix.  

================
Comment at: test/CodeGen/X86/stack-probes.ll:23
@@ +22,3 @@
+
+declare void @use_fast([4096 x i8]*)
+
----------------
A comment describing what each test is checking for would be helpful.  

================
Comment at: test/CodeGen/X86/win64_alloca_dynalloca.ll:13
@@ -12,3 +12,3 @@
 
-  %buf0 = alloca i8, i64 4096, align 1
+  %buf0 = alloca i8, i64 40096, align 1
 
----------------
Why are you modify existing tests rather than adding new ones?  Is this because of a new optimization?

http://reviews.llvm.org/D4717