[PATCH] Add a "probe-stack" attribute
Philip Reames
listmail at philipreames.com
Wed Aug 6 16:17:14 PDT 2014
Overall, I like the direction of the change, but there's a number of details which need adjusted before it's ready for submission. The biggest is that this patch seems to combine a bunch of separate pieces and it would be better to factor this apart into a series of patches which each do one thing. This would make it much easier to review.
================
Comment at: include/llvm/CodeGen/MachineFunction.h:268
@@ -267,1 +267,3 @@
+ /// Should we be probing the stack for the function
+ bool shouldProbeStack();
----------------
Could you define the term "probing the stack" here? This is likely be where an LLVM developer might encounter the concept. One short sentence is fine.
================
Comment at: lib/Target/ARM/ARMFrameLowering.cpp:303
@@ -303,1 +302,3 @@
+ if ((STI.isTargetWindows() || MF.shouldProbeStack())
+ && WindowsRequiresStackProbe(MF, NumBytes)) {
uint32_t NumWords = NumBytes >> 2;
----------------
Seems like you should rename WindowsRequiredStackProbe since this is no longer Windows specific.
================
Comment at: lib/Target/X86/X86FrameLowering.cpp:500
@@ -495,2 +499,3 @@
!IsWin64 && // Win64 has no Red Zone
+ !(UseStackProbe && StackSize > 128) && // No stack probes
!usesTheStack(MF) && // Don't push and pop.
----------------
What is this magic constant? At minimum, comment it.
This looks like it might be an independent bugfix? If so, please submit separately.
================
Comment at: lib/Target/X86/X86FrameLowering.cpp:673
@@ -668,1 +672,3 @@
+ if (NumBytes >= 4096 && UseStackProbe) {
+ assert(!UseRedZone && "The Red Zone is not accounted for in stack probes");
----------------
This section is a large diff that I'm having trouble understanding. If you can split this up into a set of smaller patches, that would help a lot.
================
Comment at: lib/Target/X86/X86FrameLowering.cpp:675
@@ -668,4 +674,3 @@
- if (Is64Bit) {
- if (STI.isTargetCygMing()) {
- StackProbeSymbol = "___chkstk_ms";
+ if (NumBytes <= 0x5000) {
+ BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::SUB64ri32 : X86::SUB32ri),
----------------
Reid Kleckner wrote:
> This change seems like a separable optimization, correct? I'd like to have a separate patch for it.
Magic numbers? What do these mean?
================
Comment at: lib/Target/X86/X86FrameLowering.cpp:708
@@ -674,6 +707,3 @@
+ StackProbeSymbol = "__probestack";
}
----------------
In a separate change, exacting out this code out as a helper function (getStackProbeSymbol(..)) would make the functional diff easier to follow.
================
Comment at: lib/Target/X86/X86FrameLowering.cpp:712
@@ -682,1 +711,3 @@
+ bool isRegAccAlive = isEAXLiveIn(MF);
+ auto RegAcc = Is64Bit ? X86::RAX : X86::EAX;
----------------
I don't understand this section at all. What's the motivation here?
Again, could this be separate patch?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:18201
@@ -18200,3 +18200,3 @@
- if (Subtarget->isTargetWin64()) {
+ if (Subtarget->isTargetWin64() || !Subtarget->isOSWindows()) {
if (Subtarget->isTargetCygMing()) {
----------------
This condition "feels" wrong. It looks like you're only avoiding this case for 32 bit Windows? Why? The old code didn't.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:18227
@@ +18226,3 @@
+ .addReg(X86::RAX);
+ } else {
+ BuildMI(*BB, MI, DL, TII->get(X86::CALLpcrel32))
----------------
I'm not clear on what this change is doing. In addition to the changed symbol, it looks like you're adding 64 vs 32 bit case? Is this an unrelated bugfix?
================
Comment at: lib/Transforms/IPO/Inliner.cpp:140
@@ -139,1 +139,3 @@
+ if (Callee->hasFnAttribute("probe-stack")) {
+ Caller->addFnAttr("probe-stack", "");
----------------
Add a comment to justify.
Would it be better to just prevent inlining when only one has probe-stack? (i.e. is this expected to ever happen?)
================
Comment at: test/CodeGen/X86/mingw-alloca.ll:32
@@ -33,1 +31,3 @@
+ %A2 = alloca [20000 x i32], align 16 ; <[20000 x i32]*> [#uses=1]
+ %A2.sub = getelementptr [20000 x i32]* %A2, i32 0, i32 0 ; <i32*> [#uses=1]
call void @bar2( i32* %A2.sub, i32 %N )
----------------
Why are you modifying the existing test rather than adding a new one? Is this test incorrect? It doesn't seem to be on first read.
================
Comment at: test/CodeGen/X86/pr17631.ll:21
@@ -20,3 +20,3 @@
; CHECK-NOT: vzeroupper
-; CHECK: _chkstk
+; CHECK: orl $0, 64(%esp)
; CHECK: ret
----------------
This looks like it goes with an optimization?
Can you add a test case for the unoptimized case?
Again, optimizations should be separate patches.
================
Comment at: test/CodeGen/X86/stack-probes.ll:1
@@ +1,2 @@
+; RUN: llc -mtriple=i386-pc-linux-gnu < %s -o - | FileCheck --check-prefix=X86-Linux %s
+; RUN: llc -mtriple=x86_64-pc-linux-gnu < %s -o - | FileCheck --check-prefix=X64-Linux %s
----------------
Why bother with a custom prefix if they are the same? You can just use the default CHECK prefix.
================
Comment at: test/CodeGen/X86/stack-probes.ll:23
@@ +22,3 @@
+
+declare void @use_fast([4096 x i8]*)
+
----------------
A comment describing what each test is checking for would be helpful.
================
Comment at: test/CodeGen/X86/win64_alloca_dynalloca.ll:13
@@ -12,3 +12,3 @@
- %buf0 = alloca i8, i64 4096, align 1
+ %buf0 = alloca i8, i64 40096, align 1
----------------
Why are you modify existing tests rather than adding new ones? Is this because of a new optimization?
http://reviews.llvm.org/D4717
More information about the llvm-commits
mailing list