[PATCH] ARM and Thumb Segmented Stacks

Alex Crichton alex at crichton.co
Tue Feb 25 13:13:25 PST 2014


Thank you for taking a look!

I've attached a new version of the patch, plus some additional comments below.

> I'm particularly concerned about two things:
> + The separate ARM & Thumb implementations.

I have merged the two. I found that most of the instructions were
subtly different, but some of them differed only in their opcodes, and
those are now shared. If you find a few more that can be merged,
though, please let me know!
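One helper both paths share is alignToARMConstant, which rounds the
requested frame size up to the next value encodable as an ARM modified
immediate (an 8-bit value rotated right by an even number of bits).
For reviewers, here is a brute-force Python sketch of the property the
bit-twiddling version is meant to satisfy; it is a reference
definition, not the patch's implementation:

```python
def is_arm_imm(v):
    """True if v is encodable as an ARM modified immediate:
    an 8-bit value rotated right by an even number of bits."""
    v &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Undo a rotate-right by `rot` with a rotate-left by `rot`.
        r = ((v << rot) | (v >> (32 - rot))) & 0xFFFFFFFF
        if r <= 0xFF:
            return True
    return False

def align_to_arm_constant(v):
    """Smallest encodable constant >= v, by brute force."""
    while not is_arm_imm(v):
        v += 1
    return v

print(align_to_arm_constant(40000))  # 40192
```

The 40192 here corresponds to the `mov r4, #40192` lines expected in
the attached tests for the 40000-byte frames.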

> +  // Get TLS base address.
> +  // First try to get it from the coprocessor
> [...]
> +  // Next, try to get it from the special address 0xFFFF0FF0
>
> This *really* shouldn't be a dynamic check. Your platform should
> define where to get the TLS base from, and not in terms of "are you
> feeling lucky, punk". Currently, I believe LLVM uses the mrc form in
> all cases.

As with the rest of this code, I sadly don't consider myself an expert
in this area. I did some digging, though, and turned up some
information. A comment [1] on one of our issues is particularly
enlightening; it points to bionic's implementation [2]. Apparently
some implementations keep the TLS table in the coprocessor, some
expose it as the return value of a function call, and others place it
at this magic address.

It looks like bionic favors the coprocessor, which is in line with
what you're saying LLVM does. I have removed the check of the special
address and left only the coprocessor read.
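For concreteness, once the TLS base is in a scratch register, the
stack limit is loaded from a fixed slot offset; the byte offsets in
the patch are just slot-index arithmetic (slot 63 on android and the
first private TCB word on linux, both taken from the patch itself):

```python
# 32-bit ARM: each TLS slot is one 4-byte word.
SLOT_SIZE = 4

def stack_limit_offset(slot):
    """Byte offset of a TLS slot relative to the TLS base pointer."""
    return SLOT_SIZE * slot

print(stack_limit_offset(63))  # 252 -> ldr r4, [r4, #252] (android)
print(stack_limit_offset(1))   # 4   -> ldr r4, [r4, #4]   (linux)
```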

> +  // push {lr} - Save return address of this function.
> +  AddDefaultPred(BuildMI(allocMBB, DL, TII.get(ARM::STMDB_UPD))
> +                 .addReg(ARM::SP, RegState::Define)
> +                 .addReg(ARM::SP))
> +    .addReg(ARM::LR)
>
> This misaligns the stack, which needs to be a multiple of 8 across
> public interfaces according to AAPCS. Is __morestack defined to take
> precautions for that?

On x86, at least, the __morestack function is invoked with an
unaligned stack. I wouldn't consider myself an expert on this
function, but every implementation I've seen is written in assembly,
so I don't believe it's meant to be written in something like C (that
is, I don't think it expects an aligned stack).
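To make the alignment question concrete, here is a minimal model of
what the prologue does to sp, tracking it modulo the AAPCS 8-byte
requirement (a sketch only; each pushed ARM register is 4 bytes):

```python
def push(sp, *regs):
    """Model a push: sp drops by 4 bytes per 32-bit register."""
    return sp - 4 * len(regs)

sp = 1024                  # assume sp is 8-byte aligned on entry
sp = push(sp, "r4", "r5")  # push {r4, r5}: still 8-byte aligned
assert sp % 8 == 0
sp = push(sp, "lr")        # push {lr}: only 4-byte aligned now
assert sp % 8 == 4         # the state __morestack is entered with
```

So __morestack is indeed entered with sp = 4 (mod 8), and as noted
above it is written in assembly and tolerates that.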

[1] - https://github.com/mozilla/rust/issues/4489#issuecomment-12347721
[2] - https://code.google.com/p/android-source-browsing/source/browse/libc/private/bionic_tls.h?repo=platform--bionic&name=ics-mr1
-------------- next part --------------
diff --git a/lib/Target/ARM/ARMFrameLowering.cpp b/lib/Target/ARM/ARMFrameLowering.cpp
index 9164178..6d1c17c 100644
--- a/lib/Target/ARM/ARMFrameLowering.cpp
+++ b/lib/Target/ARM/ARMFrameLowering.cpp
@@ -14,6 +14,7 @@
 #include "ARMFrameLowering.h"
 #include "ARMBaseInstrInfo.h"
 #include "ARMBaseRegisterInfo.h"
+#include "ARMConstantPoolValue.h"
 #include "ARMMachineFunctionInfo.h"
 #include "MCTargetDesc/ARMAddressingModes.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
@@ -1603,3 +1604,299 @@ eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
   MBB.erase(I);
 }
 
+/// Return the smallest constant for ARM that is greater than or equal to
+/// the argument. In ARM, constants can have any value that can be
+/// produced by rotating an 8-bit value to the right by an even number of
+/// bits within a 32-bit word.
+static uint32_t alignToARMConstant(uint32_t Value) {
+  unsigned Shifted = 0;
+
+  if (Value == 0)
+      return 0;
+
+  while (!(Value & 0xC0000000)) {
+      Value = Value << 2;
+      Shifted += 2;
+  }
+
+  bool Carry = (Value & 0x00FFFFFF);
+  Value = ((Value & 0xFF000000) >> 24) + Carry;
+
+  if (Value & 0x00000100)
+      Value = Value & 0x000001FC;
+
+  if (Shifted > 24)
+      Value = Value >> (Shifted - 24);
+  else
+      Value = Value << (24 - Shifted);
+
+  return Value;
+}
+
+// The stack limit in the TCB is set to this many bytes above the actual
+// stack limit.
+static const uint64_t kSplitStackAvailable = 256;
+
+// Adjust the function prologue to enable split stacks.
+// This currently only supports android and linux.
+void
+ARMFrameLowering::adjustForSegmentedStacks(MachineFunction &MF) const {
+  unsigned Opcode;
+  const ARMSubtarget *ST = &MF.getTarget().getSubtarget<ARMSubtarget>();
+  bool Thumb = ST->isThumb();
+
+  // Vararg functions are not supported.
+  if (MF.getFunction()->isVarArg())
+    report_fatal_error("Segmented stacks do not support vararg functions.");
+  if (!Thumb && !ST->isTargetAndroid() && !ST->isTargetLinux())
+    report_fatal_error("Segmented stacks not supported on this platform.");
+
+  MachineBasicBlock &prologueMBB = MF.front();
+  MachineFrameInfo* MFI = MF.getFrameInfo();
+  const ARMBaseInstrInfo &TII =
+    *static_cast<const ARMBaseInstrInfo*>(MF.getTarget().getInstrInfo());
+  ARMFunctionInfo* ARMFI = MF.getInfo<ARMFunctionInfo>();
+  DebugLoc DL;
+
+  // Use R4 and R5 as scratch registers.
+  // We save R4 and R5 before use and restore them before leaving the function.
+  unsigned ScratchReg0 = ARM::R4;
+  unsigned ScratchReg1 = ARM::R5;
+  uint64_t AlignedStackSize;
+
+  MachineBasicBlock* PrevStackMBB = MF.CreateMachineBasicBlock();
+  MachineBasicBlock* PostStackMBB = MF.CreateMachineBasicBlock();
+  MachineBasicBlock* AllocMBB = MF.CreateMachineBasicBlock();
+  MachineBasicBlock* GetMBB = MF.CreateMachineBasicBlock();
+  MachineBasicBlock* McrMBB = MF.CreateMachineBasicBlock();
+
+  for (MachineBasicBlock::livein_iterator i = prologueMBB.livein_begin(),
+       e = prologueMBB.livein_end(); i != e; ++i) {
+    AllocMBB->addLiveIn(*i);
+    GetMBB->addLiveIn(*i);
+    McrMBB->addLiveIn(*i);
+    PrevStackMBB->addLiveIn(*i);
+    PostStackMBB->addLiveIn(*i);
+  }
+
+  MF.push_front(PostStackMBB);
+  MF.push_front(AllocMBB);
+  MF.push_front(GetMBB);
+  MF.push_front(McrMBB);
+  MF.push_front(PrevStackMBB);
+
+  // The required stack size; rounded up to an ARM-encodable constant below.
+  uint64_t StackSize = MFI->getStackSize();
+
+  AlignedStackSize = alignToARMConstant(StackSize);
+
+  // When the frame size is less than 256 we just compare the stack
+  // boundary directly to the value of the stack pointer, per gcc.
+  bool CompareStackPointer = AlignedStackSize < kSplitStackAvailable;
+
+  // We will use two of the callee save registers as scratch registers so we
+  // need to save those registers onto the stack.
+  // We will use SR0 to hold stack limit and SR1 to hold the stack size
+  // requested and arguments for __morestack().
+  // SR0: Scratch Register #0
+  // SR1: Scratch Register #1
+  // push {SR0, SR1}
+  if (Thumb) {
+    AddDefaultPred(BuildMI(PrevStackMBB, DL, TII.get(ARM::tPUSH)))
+      .addReg(ScratchReg0)
+      .addReg(ScratchReg1);
+  } else {
+    AddDefaultPred(BuildMI(PrevStackMBB, DL, TII.get(ARM::STMDB_UPD))
+                   .addReg(ARM::SP, RegState::Define)
+                   .addReg(ARM::SP))
+      .addReg(ScratchReg0)
+      .addReg(ScratchReg1);
+  }
+
+  // mov SR1, sp
+  if (Thumb) {
+    AddDefaultPred(BuildMI(McrMBB, DL, TII.get(ARM::tMOVr), ScratchReg1)
+                   .addReg(ARM::SP));
+  } else if (CompareStackPointer) {
+    AddDefaultPred(BuildMI(McrMBB, DL, TII.get(ARM::MOVr), ScratchReg1)
+                   .addReg(ARM::SP)).addReg(0);
+  }
+
+  // sub SR1, sp, #StackSize
+  if (!CompareStackPointer && Thumb) {
+    AddDefaultPred(AddDefaultCC(BuildMI(McrMBB, DL, TII.get(ARM::tSUBi8), ScratchReg1))
+                   .addReg(ScratchReg1).addImm(AlignedStackSize));
+  } else if (!CompareStackPointer) {
+    AddDefaultPred(BuildMI(McrMBB, DL, TII.get(ARM::SUBri), ScratchReg1)
+                   .addReg(ARM::SP).addImm(AlignedStackSize)).addReg(0);
+  }
+
+  if (Thumb) {
+    unsigned PCLabelId = ARMFI->createPICLabelUId();
+    ARMConstantPoolValue *NewCPV = ARMConstantPoolSymbol::
+      Create(MF.getFunction()->getContext(), "STACK_LIMIT", PCLabelId, 0);
+    MachineConstantPool *MCP = MF.getConstantPool();
+    unsigned CPI = MCP->getConstantPoolIndex(NewCPV, MF.getAlignment());
+
+    // ldr SR0, [pc, offset(STACK_LIMIT)]
+    AddDefaultPred(BuildMI(GetMBB, DL, TII.get(ARM::tLDRpci), ScratchReg0)
+                   .addConstantPoolIndex(CPI));
+
+    // ldr SR0, [SR0]
+    AddDefaultPred(BuildMI(GetMBB, DL, TII.get(ARM::tLDRi), ScratchReg0)
+                  .addReg(ScratchReg0)
+                  .addImm(0));
+  } else {
+    // Get TLS base address.
+    // First try to get it from the coprocessor
+    // mrc p15, #0, SR0, c13, c0, #3
+    AddDefaultPred(BuildMI(McrMBB, DL, TII.get(ARM::MRC), ScratchReg0)
+                   .addImm(15)
+                   .addImm(0)
+                   .addImm(13)
+                   .addImm(0)
+                   .addImm(3));
+
+    // Use the last TLS slot on android and a private field of the TCB on linux.
+    assert(ST->isTargetAndroid() || ST->isTargetLinux());
+    unsigned TlsOffset = ST->isTargetAndroid() ? 63 : 1;
+    // No separate "add SR0, SR0, #4*TlsOffset" is needed here; the
+    // offset is folded into the immediate operand of the load
+    // below.
+
+    // Get the stack limit from the right offset
+    // ldr SR0, [sr0, #4 * TlsOffset]
+    AddDefaultPred(BuildMI(GetMBB, DL, TII.get(ARM::LDRi12), ScratchReg0)
+                   .addReg(ScratchReg0).addImm(4 * TlsOffset));
+  }
+
+  // Compare stack limit with stack size requested.
+  // cmp SR0, SR1
+  Opcode = Thumb ? ARM::tCMPr : ARM::CMPrr;
+  AddDefaultPred(BuildMI(GetMBB, DL, TII.get(Opcode))
+                 .addReg(ScratchReg0)
+                 .addReg(ScratchReg1));
+
+  // This jump is taken if StackLimit < SP - stack required.
+  Opcode = Thumb ? ARM::tBcc : ARM::Bcc;
+  BuildMI(GetMBB, DL, TII.get(Opcode)).addMBB(PostStackMBB)
+    .addImm(ARMCC::LO)
+    .addReg(ARM::CPSR);
+
+
+  // Call __morestack(StackSize, SizeOfStackArguments).
+  // __morestack expects the requested stack size in SR0 (r4)
+  // and the size of the stack arguments in SR1 (r5).
+
+  // Pass the first argument to __morestack in Scratch Register #0:
+  //   the amount of stack required.
+  if (Thumb) {
+    AddDefaultPred(AddDefaultCC(BuildMI(AllocMBB, DL, TII.get(ARM::tMOVi8), ScratchReg0))
+                   .addImm(AlignedStackSize));
+  } else {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::MOVi), ScratchReg0)
+                   .addImm(AlignedStackSize)).addReg(0);
+  }
+  // Pass the second argument to __morestack in Scratch Register #1:
+  //   the amount of stack consumed to save the function arguments.
+  if (Thumb) {
+    AddDefaultPred(AddDefaultCC(BuildMI(AllocMBB, DL, TII.get(ARM::tMOVi8), ScratchReg1))
+                   .addImm(alignToARMConstant(ARMFI->getArgumentStackSize())));
+  } else {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::MOVi), ScratchReg1)
+                   .addImm(alignToARMConstant(ARMFI->getArgumentStackSize())))
+                   .addReg(0);
+  }
+
+  // push {lr} - Save return address of this function.
+  if (Thumb) {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::tPUSH)))
+      .addReg(ARM::LR);
+  } else {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::STMDB_UPD))
+                   .addReg(ARM::SP, RegState::Define)
+                   .addReg(ARM::SP))
+      .addReg(ARM::LR);
+  }
+
+  // Call __morestack().
+  if (Thumb) {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::tBL)))
+      .addExternalSymbol("__morestack");
+  } else {
+    BuildMI(AllocMBB, DL, TII.get(ARM::BL))
+      .addExternalSymbol("__morestack");
+  }
+
+  // Restore the return address of the original function.
+  if (Thumb) {
+    // pop {SR0}
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::tPOP)))
+      .addReg(ScratchReg0);
+
+    // mov lr, SR0
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::tMOVr), ARM::LR)
+                   .addReg(ScratchReg0));
+  } else {
+    // pop {lr}
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::LDMIA_UPD))
+                   .addReg(ARM::SP, RegState::Define)
+                   .addReg(ARM::SP))
+      .addReg(ARM::LR);
+  }
+
+  // Restore SR0 and SR1 in the case where __morestack() was called.
+  // __morestack() skips the PostStackMBB block, so the scratch
+  // registers have to be restored here.
+  // pop {SR0, SR1}
+  if (Thumb) {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::tPOP)))
+      .addReg(ScratchReg0)
+      .addReg(ScratchReg1);
+  } else {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::LDMIA_UPD))
+                   .addReg(ARM::SP, RegState::Define)
+                   .addReg(ARM::SP))
+      .addReg(ScratchReg0)
+      .addReg(ScratchReg1);
+  }
+
+  // Return from this function.
+  if (Thumb) {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::tMOVr), ARM::PC)
+                   .addReg(ARM::LR));
+  } else {
+    AddDefaultPred(BuildMI(AllocMBB, DL, TII.get(ARM::MOVr), ARM::PC)
+                   .addReg(ARM::LR)).addReg(0);
+  }
+
+  // Restore SR0 and SR1 in the case where __morestack() was not called.
+  // pop {SR0, SR1}
+  if (Thumb) {
+    AddDefaultPred(BuildMI(PostStackMBB, DL, TII.get(ARM::tPOP)))
+      .addReg(ScratchReg0)
+      .addReg(ScratchReg1);
+  } else {
+    AddDefaultPred(BuildMI(PostStackMBB, DL, TII.get(ARM::LDMIA_UPD))
+                   .addReg(ARM::SP, RegState::Define)
+                   .addReg(ARM::SP))
+      .addReg(ScratchReg0)
+      .addReg(ScratchReg1);
+  }
+
+  // Organize the MBB successor lists.
+  PostStackMBB->addSuccessor(&prologueMBB);
+
+  AllocMBB->addSuccessor(PostStackMBB);
+
+  GetMBB->addSuccessor(PostStackMBB);
+  GetMBB->addSuccessor(AllocMBB);
+
+  McrMBB->addSuccessor(GetMBB);
+
+  PrevStackMBB->addSuccessor(McrMBB);
+
+#ifdef XDEBUG
+  MF.verify();
+#endif
+}
diff --git a/lib/Target/ARM/ARMFrameLowering.h b/lib/Target/ARM/ARMFrameLowering.h
index efa255a..cc84815 100644
--- a/lib/Target/ARM/ARMFrameLowering.h
+++ b/lib/Target/ARM/ARMFrameLowering.h
@@ -59,6 +59,8 @@ public:
   void processFunctionBeforeCalleeSavedScan(MachineFunction &MF,
                                             RegScavenger *RS) const;
 
+  void adjustForSegmentedStacks(MachineFunction &MF) const;
+
  private:
   void emitPushInst(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
                     const std::vector<CalleeSavedInfo> &CSI, unsigned StmOpc,
diff --git a/lib/Target/ARM/ARMISelLowering.cpp b/lib/Target/ARM/ARMISelLowering.cpp
index abf229b..bfb6414 100644
--- a/lib/Target/ARM/ARMISelLowering.cpp
+++ b/lib/Target/ARM/ARMISelLowering.cpp
@@ -3060,6 +3060,8 @@ ARMTargetLowering::LowerFormalArguments(SDValue Chain,
     VarArgStyleRegisters(CCInfo, DAG, dl, Chain,
                          CCInfo.getNextStackOffset());
 
+  AFI->setArgumentStackSize(CCInfo.getNextStackOffset());
+
   return Chain;
 }
 
diff --git a/lib/Target/ARM/ARMMachineFunctionInfo.h b/lib/Target/ARM/ARMMachineFunctionInfo.h
index 216430b..d7ec6eb 100644
--- a/lib/Target/ARM/ARMMachineFunctionInfo.h
+++ b/lib/Target/ARM/ARMMachineFunctionInfo.h
@@ -114,6 +114,10 @@ class ARMFunctionInfo : public MachineFunctionInfo {
   /// relocation models.
   unsigned GlobalBaseReg;
 
+  /// ArgumentStackSize - the number of bytes consumed by arguments that
+  /// are passed on the stack
+  unsigned ArgumentStackSize;
+
 public:
   ARMFunctionInfo() :
     isThumb(false),
@@ -182,6 +186,9 @@ public:
   void setGPRCalleeSavedArea2Size(unsigned s) { GPRCS2Size = s; }
   void setDPRCalleeSavedAreaSize(unsigned s)  { DPRCSSize = s; }
 
+  unsigned getArgumentStackSize() const { return ArgumentStackSize; }
+  void setArgumentStackSize(unsigned size) { ArgumentStackSize = size; }
+
   unsigned createJumpTableUId() {
     return JumpTableUId++;
   }
diff --git a/lib/Target/ARM/ARMSubtarget.h b/lib/Target/ARM/ARMSubtarget.h
index 1d80d1b..fc00d0f 100644
--- a/lib/Target/ARM/ARMSubtarget.h
+++ b/lib/Target/ARM/ARMSubtarget.h
@@ -346,6 +346,9 @@ public:
     return TargetTriple.getEnvironment() == Triple::GNUEABIHF ||
            TargetTriple.getEnvironment() == Triple::EABIHF;
   }
+  bool isTargetAndroid() const {
+    return TargetTriple.getEnvironment() == Triple::Android;
+  }
 
   bool isAPCS_ABI() const {
     assert(TargetABI != ARM_ABI_UNKNOWN);
diff --git a/test/CodeGen/ARM/segmented-stacks-dynamic.ll b/test/CodeGen/ARM/segmented-stacks-dynamic.ll
new file mode 100644
index 0000000..b2c7c66
--- /dev/null
+++ b/test/CodeGen/ARM/segmented-stacks-dynamic.ll
@@ -0,0 +1,62 @@
+; RUN: llc < %s -mtriple=arm-linux-androideabi -segmented-stacks -verify-machineinstrs | FileCheck %s -check-prefix=ARM-android
+; RUN: llc < %s -mtriple=arm-linux-unknown-gnueabi -segmented-stacks -verify-machineinstrs | FileCheck %s -check-prefix=ARM-linux
+; RUN: llc < %s -mtriple=arm-linux-androideabi -segmented-stacks -filetype=obj
+; RUN: llc < %s -mtriple=arm-linux-unknown-gnueabi -segmented-stacks -filetype=obj
+
+; Just to prevent the alloca from being optimized away
+declare void @dummy_use(i32*, i32)
+
+define i32 @test_basic(i32 %l) {
+        %mem = alloca i32, i32 %l
+        call void @dummy_use (i32* %mem, i32 %l)
+        %terminate = icmp eq i32 %l, 0
+        br i1 %terminate, label %true, label %false
+
+true:
+        ret i32 0
+
+false:
+        %newlen = sub i32 %l, 1
+        %retvalue = call i32 @test_basic(i32 %newlen)
+        ret i32 %retvalue
+
+; ARM-linux:      test_basic:
+
+; ARM-linux:      push    {r4, r5}
+; ARM-linux-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-linux-NEXT: mov     r5, sp
+; ARM-linux-NEXT: ldr     r4, [r4, #4]
+; ARM-linux-NEXT: cmp     r4, r5
+; ARM-linux-NEXT: blo     .LBB0_2
+
+; ARM-linux:      mov     r4, #24
+; ARM-linux-NEXT: mov     r5, #0
+; ARM-linux-NEXT: stmdb   sp!, {lr}
+; ARM-linux-NEXT: bl      __morestack
+; ARM-linux-NEXT: ldm     sp!, {lr}
+; ARM-linux-NEXT: pop     {r4, r5}
+; ARM-linux-NEXT: mov     pc, lr
+
+; ARM-linux:      pop     {r4, r5}
+
+
+; ARM-android:      test_basic:
+
+; ARM-android:      push    {r4, r5}
+; ARM-android-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-android-NEXT: mov     r5, sp
+; ARM-android-NEXT: ldr     r4, [r4, #252]
+; ARM-android-NEXT: cmp     r4, r5
+; ARM-android-NEXT: blo     .LBB0_2
+
+; ARM-android:      mov     r4, #24
+; ARM-android-NEXT: mov     r5, #0
+; ARM-android-NEXT: stmdb   sp!, {lr}
+; ARM-android-NEXT: bl      __morestack
+; ARM-android-NEXT: ldm     sp!, {lr}
+; ARM-android-NEXT: pop     {r4, r5}
+; ARM-android-NEXT: mov     pc, lr
+
+; ARM-android:      pop     {r4, r5}
+
+}
diff --git a/test/CodeGen/ARM/segmented-stacks.ll b/test/CodeGen/ARM/segmented-stacks.ll
new file mode 100644
index 0000000..cf60492
--- /dev/null
+++ b/test/CodeGen/ARM/segmented-stacks.ll
@@ -0,0 +1,235 @@
+; RUN: llc < %s -mtriple=arm-linux-androideabi -segmented-stacks -verify-machineinstrs | FileCheck %s -check-prefix=ARM-android
+; RUN: llc < %s -mtriple=arm-linux-unknown-gnueabi -segmented-stacks -verify-machineinstrs | FileCheck %s -check-prefix=ARM-linux
+
+; We used to crash with filetype=obj
+; RUN: llc < %s -mtriple=arm-linux-androideabi -segmented-stacks -filetype=obj
+; RUN: llc < %s -mtriple=arm-linux-unknown-gnueabi -segmented-stacks -filetype=obj
+
+
+; Just to prevent the alloca from being optimized away
+declare void @dummy_use(i32*, i32)
+
+define void @test_basic() {
+        %mem = alloca i32, i32 10
+        call void @dummy_use (i32* %mem, i32 10)
+	ret void
+
+; ARM-linux:      test_basic:
+
+; ARM-linux:      push    {r4, r5}
+; ARM-linux-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-linux-NEXT: mov     r5, sp
+; ARM-linux-NEXT: ldr     r4, [r4, #4]
+; ARM-linux-NEXT: cmp     r4, r5
+; ARM-linux-NEXT: blo     .LBB0_2
+
+; ARM-linux:      mov     r4, #48
+; ARM-linux-NEXT: mov     r5, #0
+; ARM-linux-NEXT: stmdb   sp!, {lr}
+; ARM-linux-NEXT: bl      __morestack
+; ARM-linux-NEXT: ldm     sp!, {lr}
+; ARM-linux-NEXT: pop     {r4, r5}
+; ARM-linux-NEXT: mov     pc, lr
+
+; ARM-linux:      pop     {r4, r5}
+
+; ARM-android:      test_basic:
+
+; ARM-android:      push    {r4, r5}
+; ARM-android-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-android-NEXT: mov     r5, sp
+; ARM-android-NEXT: ldr     r4, [r4, #252]
+; ARM-android-NEXT: cmp     r4, r5
+; ARM-android-NEXT: blo     .LBB0_2
+
+; ARM-android:      mov     r4, #48
+; ARM-android-NEXT: mov     r5, #0
+; ARM-android-NEXT: stmdb   sp!, {lr}
+; ARM-android-NEXT: bl      __morestack
+; ARM-android-NEXT: ldm     sp!, {lr}
+; ARM-android-NEXT: pop     {r4, r5}
+; ARM-android-NEXT: mov     pc, lr
+
+; ARM-android:      pop     {r4, r5}
+
+}
+
+define i32 @test_nested(i32 * nest %closure, i32 %other) {
+       %addend = load i32 * %closure
+       %result = add i32 %other, %addend
+       ret i32 %result
+
+; ARM-linux:      test_nested:
+
+; ARM-linux:      push    {r4, r5}
+; ARM-linux-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-linux-NEXT: mov     r5, sp
+; ARM-linux-NEXT: ldr     r4, [r4, #4]
+; ARM-linux-NEXT: cmp     r4, r5
+; ARM-linux-NEXT: blo     .LBB1_2
+
+; ARM-linux:      mov     r4, #0
+; ARM-linux-NEXT: mov     r5, #0
+; ARM-linux-NEXT: stmdb   sp!, {lr}
+; ARM-linux-NEXT: bl      __morestack
+; ARM-linux-NEXT: ldm     sp!, {lr}
+; ARM-linux-NEXT: pop     {r4, r5}
+; ARM-linux-NEXT: mov     pc, lr
+
+; ARM-linux:      pop     {r4, r5}
+
+; ARM-android:      test_nested:
+
+; ARM-android:      push    {r4, r5}
+; ARM-android-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-android-NEXT: mov     r5, sp
+; ARM-android-NEXT: ldr     r4, [r4, #252]
+; ARM-android-NEXT: cmp     r4, r5
+; ARM-android-NEXT: blo     .LBB1_2
+
+; ARM-android:      mov     r4, #0
+; ARM-android-NEXT: mov     r5, #0
+; ARM-android-NEXT: stmdb   sp!, {lr}
+; ARM-android-NEXT: bl      __morestack
+; ARM-android-NEXT: ldm     sp!, {lr}
+; ARM-android-NEXT: pop     {r4, r5}
+; ARM-android-NEXT: mov     pc, lr
+
+; ARM-android:      pop     {r4, r5}
+
+}
+
+define void @test_large() {
+        %mem = alloca i32, i32 10000
+        call void @dummy_use (i32* %mem, i32 0)
+        ret void
+
+; ARM-linux:      test_large:
+
+; ARM-linux:      push    {r4, r5}
+; ARM-linux-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-linux-NEXT: sub     r5, sp, #40192
+; ARM-linux-NEXT: ldr     r4, [r4, #4]
+; ARM-linux-NEXT: cmp     r4, r5
+; ARM-linux-NEXT: blo     .LBB2_2
+
+; ARM-linux:      mov     r4, #40192
+; ARM-linux-NEXT: mov     r5, #0
+; ARM-linux-NEXT: stmdb   sp!, {lr}
+; ARM-linux-NEXT: bl      __morestack
+; ARM-linux-NEXT: ldm     sp!, {lr}
+; ARM-linux-NEXT: pop     {r4, r5}
+; ARM-linux-NEXT: mov     pc, lr
+
+; ARM-linux:      pop     {r4, r5}
+
+; ARM-android:      test_large:
+
+; ARM-android:      push    {r4, r5}
+; ARM-android-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-android-NEXT: sub     r5, sp, #40192
+; ARM-android-NEXT: ldr     r4, [r4, #252]
+; ARM-android-NEXT: cmp     r4, r5
+; ARM-android-NEXT: blo     .LBB2_2
+
+; ARM-android:      mov     r4, #40192
+; ARM-android-NEXT: mov     r5, #0
+; ARM-android-NEXT: stmdb   sp!, {lr}
+; ARM-android-NEXT: bl      __morestack
+; ARM-android-NEXT: ldm     sp!, {lr}
+; ARM-android-NEXT: pop     {r4, r5}
+; ARM-android-NEXT: mov     pc, lr
+
+; ARM-android:      pop     {r4, r5}
+
+}
+
+define fastcc void @test_fastcc() {
+        %mem = alloca i32, i32 10
+        call void @dummy_use (i32* %mem, i32 10)
+        ret void
+
+; ARM-linux:      test_fastcc:
+
+; ARM-linux:      push    {r4, r5}
+; ARM-linux-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-linux-NEXT: mov     r5, sp
+; ARM-linux-NEXT: ldr     r4, [r4, #4]
+; ARM-linux-NEXT: cmp     r4, r5
+; ARM-linux-NEXT: blo     .LBB3_2
+
+; ARM-linux:      mov     r4, #48
+; ARM-linux-NEXT: mov     r5, #0
+; ARM-linux-NEXT: stmdb   sp!, {lr}
+; ARM-linux-NEXT: bl      __morestack
+; ARM-linux-NEXT: ldm     sp!, {lr}
+; ARM-linux-NEXT: pop     {r4, r5}
+; ARM-linux-NEXT: mov     pc, lr
+
+; ARM-linux:      pop     {r4, r5}
+
+; ARM-android:      test_fastcc:
+
+; ARM-android:      push    {r4, r5}
+; ARM-android-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-android-NEXT: mov     r5, sp
+; ARM-android-NEXT: ldr     r4, [r4, #252]
+; ARM-android-NEXT: cmp     r4, r5
+; ARM-android-NEXT: blo     .LBB3_2
+
+; ARM-android:      mov     r4, #48
+; ARM-android-NEXT: mov     r5, #0
+; ARM-android-NEXT: stmdb   sp!, {lr}
+; ARM-android-NEXT: bl      __morestack
+; ARM-android-NEXT: ldm     sp!, {lr}
+; ARM-android-NEXT: pop     {r4, r5}
+; ARM-android-NEXT: mov     pc, lr
+
+; ARM-android:      pop     {r4, r5}
+
+}
+
+define fastcc void @test_fastcc_large() {
+        %mem = alloca i32, i32 10000
+        call void @dummy_use (i32* %mem, i32 0)
+        ret void
+
+; ARM-linux:      test_fastcc_large:
+
+; ARM-linux:      push    {r4, r5}
+; ARM-linux-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-linux-NEXT: sub     r5, sp, #40192
+; ARM-linux-NEXT: ldr     r4, [r4, #4]
+; ARM-linux-NEXT: cmp     r4, r5
+; ARM-linux-NEXT: blo     .LBB4_2
+
+; ARM-linux:      mov     r4, #40192
+; ARM-linux-NEXT: mov     r5, #0
+; ARM-linux-NEXT: stmdb   sp!, {lr}
+; ARM-linux-NEXT: bl      __morestack
+; ARM-linux-NEXT: ldm     sp!, {lr}
+; ARM-linux-NEXT: pop     {r4, r5}
+; ARM-linux-NEXT: mov     pc, lr
+
+; ARM-linux:      pop     {r4, r5}
+
+; ARM-android:      test_fastcc_large:
+
+; ARM-android:      push    {r4, r5}
+; ARM-android-NEXT: mrc     p15, #0, r4, c13, c0, #3
+; ARM-android-NEXT: sub     r5, sp, #40192
+; ARM-android-NEXT: ldr     r4, [r4, #252]
+; ARM-android-NEXT: cmp     r4, r5
+; ARM-android-NEXT: blo     .LBB4_2
+
+; ARM-android:      mov     r4, #40192
+; ARM-android-NEXT: mov     r5, #0
+; ARM-android-NEXT: stmdb   sp!, {lr}
+; ARM-android-NEXT: bl      __morestack
+; ARM-android-NEXT: ldm     sp!, {lr}
+; ARM-android-NEXT: pop     {r4, r5}
+; ARM-android-NEXT: mov     pc, lr
+
+; ARM-android:      pop     {r4, r5}
+
+}
diff --git a/test/CodeGen/Thumb/segmented-stacks-dynamic.ll b/test/CodeGen/Thumb/segmented-stacks-dynamic.ll
new file mode 100644
index 0000000..6e61cdf
--- /dev/null
+++ b/test/CodeGen/Thumb/segmented-stacks-dynamic.ll
@@ -0,0 +1,63 @@
+; RUN: llc < %s -mtriple=thumb-linux-unknown-gnueabi -segmented-stacks -verify-machineinstrs | FileCheck %s -check-prefix=Thumb-linux
+; RUN: llc < %s -mtriple=thumb-linux-androideabi -segmented-stacks -verify-machineinstrs | FileCheck %s -check-prefix=Thumb-android
+; RUN: llc < %s -mtriple=thumb-linux-unknown-gnueabi -segmented-stacks -filetype=obj
+; RUN: llc < %s -mtriple=thumb-linux-androideabi -segmented-stacks -filetype=obj
+
+; Just to prevent the alloca from being optimized away
+declare void @dummy_use(i32*, i32)
+
+define i32 @test_basic(i32 %l) {
+        %mem = alloca i32, i32 %l
+        call void @dummy_use (i32* %mem, i32 %l)
+        %terminate = icmp eq i32 %l, 0
+        br i1 %terminate, label %true, label %false
+
+true:
+        ret i32 0
+
+false:
+        %newlen = sub i32 %l, 1
+        %retvalue = call i32 @test_basic(i32 %newlen)
+        ret i32 %retvalue
+
+; Thumb-linux:      test_basic:
+
+; Thumb-linux:      push {r4, r5}
+; Thumb-linux:      mov	r5, sp
+; Thumb-linux-NEXT: ldr r4, .LCPI0_0
+; Thumb-linux-NEXT: ldr r4, [r4]
+; Thumb-linux-NEXT: cmp	r4, r5
+; Thumb-linux-NEXT: blo	.LBB0_2
+
+; Thumb-linux:      mov r4, #16
+; Thumb-linux-NEXT: mov r5, #0
+; Thumb-linux-NEXT: push {lr}
+; Thumb-linux-NEXT: bl	__morestack
+; Thumb-linux-NEXT: pop {r4}
+; Thumb-linux-NEXT: mov lr, r4
+; Thumb-linux-NEXT: pop	{r4, r5}
+; Thumb-linux-NEXT: mov	pc, lr
+
+; Thumb-linux:      pop	{r4, r5}
+
+; Thumb-android:      test_basic:
+
+; Thumb-android:      push {r4, r5}
+; Thumb-android:      mov	r5, sp
+; Thumb-android-NEXT: ldr r4, .LCPI0_0
+; Thumb-android-NEXT: ldr r4, [r4]
+; Thumb-android-NEXT: cmp	r4, r5
+; Thumb-android-NEXT: blo	.LBB0_2
+
+; Thumb-android:      mov r4, #16
+; Thumb-android-NEXT: mov r5, #0
+; Thumb-android-NEXT: push {lr}
+; Thumb-android-NEXT: bl	__morestack
+; Thumb-android-NEXT: pop {r4}
+; Thumb-android-NEXT: mov lr, r4
+; Thumb-android-NEXT: pop	{r4, r5}
+; Thumb-android-NEXT: mov	pc, lr
+
+; Thumb-android:      pop	{r4, r5}
+
+}
diff --git a/test/CodeGen/Thumb/segmented-stacks.ll b/test/CodeGen/Thumb/segmented-stacks.ll
new file mode 100644
index 0000000..ef7fd05
--- /dev/null
+++ b/test/CodeGen/Thumb/segmented-stacks.ll
@@ -0,0 +1,247 @@
+; RUN: llc < %s -mtriple=thumb-linux-androideabi -segmented-stacks -verify-machineinstrs | FileCheck %s -check-prefix=Thumb-android
+; RUN: llc < %s -mtriple=thumb-linux-unknown-gnueabi -segmented-stacks -verify-machineinstrs | FileCheck %s -check-prefix=Thumb-android
+; RUN: llc < %s -mtriple=thumb-linux-androideabi -segmented-stacks -filetype=obj
+; RUN: llc < %s -mtriple=thumb-linux-unknown-gnueabi -segmented-stacks -filetype=obj
+
+
+; Just to prevent the alloca from being optimized away
+declare void @dummy_use(i32*, i32)
+
+define void @test_basic() {
+        %mem = alloca i32, i32 10
+        call void @dummy_use (i32* %mem, i32 10)
+	ret void
+
+; Thumb-android:      test_basic:
+
+; Thumb-android:      push    {r4, r5}
+; Thumb-android-NEXT: mov     r5, sp
+; Thumb-android-NEXT: ldr     r4, .LCPI0_0
+; Thumb-android-NEXT: ldr     r4, [r4]
+; Thumb-android-NEXT: cmp     r4, r5
+; Thumb-android-NEXT: blo     .LBB0_2
+
+; Thumb-android:      mov     r4, #48
+; Thumb-android-NEXT: mov     r5, #0
+; Thumb-android-NEXT: push    {lr}
+; Thumb-android-NEXT: bl      __morestack
+; Thumb-android-NEXT: pop     {r4}
+; Thumb-android-NEXT: mov     lr, r4
+; Thumb-android-NEXT: pop     {r4, r5}
+; Thumb-android-NEXT: mov     pc, lr
+
+; Thumb-android:      pop     {r4, r5}
+
+; Thumb-linux:      test_basic:
+
+; Thumb-linux:      push    {r4, r5}
+; Thumb-linux-NEXT: mov     r5, sp
+; Thumb-linux-NEXT: ldr     r4, .LCPI0_0
+; Thumb-linux-NEXT: ldr     r4, [r4]
+; Thumb-linux-NEXT: cmp     r4, r5
+; Thumb-linux-NEXT: blo     .LBB0_2
+
+; Thumb-linux:      mov     r4, #44
+; Thumb-linux-NEXT: mov     r5, #0
+; Thumb-linux-NEXT: push    {lr}
+; Thumb-linux-NEXT: bl      __morestack
+; Thumb-linux-NEXT: pop     {r4}
+; Thumb-linux-NEXT: mov     lr, r4
+; Thumb-linux-NEXT: pop     {r4, r5}
+; Thumb-linux-NEXT: mov     pc, lr
+
+; Thumb-linux:      pop     {r4, r5}
+
+}
+
+define i32 @test_nested(i32 * nest %closure, i32 %other) {
+       %addend = load i32 * %closure
+       %result = add i32 %other, %addend
+       ret i32 %result
+
+; Thumb-android:      test_nested:
+
+; Thumb-android:      push    {r4, r5}
+; Thumb-android-NEXT: mov     r5, sp
+; Thumb-android-NEXT: ldr     r4, .LCPI1_0
+; Thumb-android-NEXT: ldr     r4, [r4]
+; Thumb-android-NEXT: cmp     r4, r5
+; Thumb-android-NEXT: blo     .LBB1_2
+
+; Thumb-android:      mov     r4, #0
+; Thumb-android-NEXT: mov     r5, #0
+; Thumb-android-NEXT: push    {lr}
+; Thumb-android-NEXT: bl      __morestack
+; Thumb-android-NEXT: pop     {r4}
+; Thumb-android-NEXT: mov     lr, r4
+; Thumb-android-NEXT: pop     {r4, r5}
+; Thumb-android-NEXT: mov     pc, lr
+
+; Thumb-android:      pop     {r4, r5}
+
+; Thumb-linux:      test_nested:
+
+; Thumb-linux:      push    {r4, r5}
+; Thumb-linux-NEXT: mov     r5, sp
+; Thumb-linux-NEXT: ldr     r4, .LCPI1_0
+; Thumb-linux-NEXT: ldr     r4, [r4]
+; Thumb-linux-NEXT: cmp     r4, r5
+; Thumb-linux-NEXT: blo     .LBB1_2
+
+; Thumb-linux:      mov     r4, #0
+; Thumb-linux-NEXT: mov     r5, #0
+; Thumb-linux-NEXT: push    {lr}
+; Thumb-linux-NEXT: bl      __morestack
+; Thumb-linux-NEXT: pop     {r4}
+; Thumb-linux-NEXT: mov     lr, r4
+; Thumb-linux-NEXT: pop     {r4, r5}
+; Thumb-linux-NEXT: mov     pc, lr
+
+; Thumb-linux:      pop     {r4, r5}
+
+}
+
+define void @test_large() {
+        %mem = alloca i32, i32 10000
+        call void @dummy_use (i32* %mem, i32 0)
+        ret void
+
+; Thumb-android:      test_large:
+
+; Thumb-android:      push    {r4, r5}
+; Thumb-android-NEXT: mov     r5, sp
+; Thumb-android-NEXT: sub     r5, #40192
+; Thumb-android-NEXT: ldr     r4, .LCPI2_2
+; Thumb-android-NEXT: ldr     r4, [r4]
+; Thumb-android-NEXT: cmp     r4, r5
+; Thumb-android-NEXT: blo     .LBB2_2
+
+; Thumb-android:      mov     r4, #40192
+; Thumb-android-NEXT: mov     r5, #0
+; Thumb-android-NEXT: push    {lr}
+; Thumb-android-NEXT: bl      __morestack
+; Thumb-android-NEXT: pop     {r4}
+; Thumb-android-NEXT: mov     lr, r4
+; Thumb-android-NEXT: pop     {r4, r5}
+; Thumb-android-NEXT: mov     pc, lr
+
+; Thumb-android:      pop     {r4, r5}
+
+; Thumb-linux:      test_large:
+
+; Thumb-linux:      push    {r4, r5}
+; Thumb-linux-NEXT: mov     r5, sp
+; Thumb-linux-NEXT: sub     r5, #40192
+; Thumb-linux-NEXT: ldr     r4, .LCPI2_2
+; Thumb-linux-NEXT: ldr     r4, [r4]
+; Thumb-linux-NEXT: cmp     r4, r5
+; Thumb-linux-NEXT: blo     .LBB2_2
+
+; Thumb-linux:      mov     r4, #40192
+; Thumb-linux-NEXT: mov     r5, #0
+; Thumb-linux-NEXT: push    {lr}
+; Thumb-linux-NEXT: bl      __morestack
+; Thumb-linux-NEXT: pop     {r4}
+; Thumb-linux-NEXT: mov     lr, r4
+; Thumb-linux-NEXT: pop     {r4, r5}
+; Thumb-linux-NEXT: mov     pc, lr
+
+; Thumb-linux:      pop     {r4, r5}
+
+}
+
+define fastcc void @test_fastcc() {
+        %mem = alloca i32, i32 10
+        call void @dummy_use (i32* %mem, i32 10)
+        ret void
+
+; Thumb-android:      test_fastcc:
+
+; Thumb-android:      push    {r4, r5}
+; Thumb-android-NEXT: mov     r5, sp
+; Thumb-android-NEXT: ldr     r4, .LCPI3_0
+; Thumb-android-NEXT: ldr     r4, [r4]
+; Thumb-android-NEXT: cmp     r4, r5
+; Thumb-android-NEXT: blo     .LBB3_2
+
+; Thumb-android:      mov     r4, #48
+; Thumb-android-NEXT: mov     r5, #0
+; Thumb-android-NEXT: push    {lr}
+; Thumb-android-NEXT: bl      __morestack
+; Thumb-android-NEXT: pop     {r4}
+; Thumb-android-NEXT: mov     lr, r4
+; Thumb-android-NEXT: pop     {r4, r5}
+; Thumb-android-NEXT: mov     pc, lr
+
+; Thumb-android:      pop     {r4, r5}
+
+; Thumb-linux:      test_fastcc:
+
+; Thumb-linux:      push    {r4, r5}
+; Thumb-linux-NEXT: mov     r5, sp
+; Thumb-linux-NEXT: ldr     r4, .LCPI3_0
+; Thumb-linux-NEXT: ldr     r4, [r4]
+; Thumb-linux-NEXT: cmp     r4, r5
+; Thumb-linux-NEXT: blo     .LBB3_2
+
+; Thumb-linux:      mov     r4, #44
+; Thumb-linux-NEXT: mov     r5, #0
+; Thumb-linux-NEXT: push    {lr}
+; Thumb-linux-NEXT: bl      __morestack
+; Thumb-linux-NEXT: pop     {r4}
+; Thumb-linux-NEXT: mov     lr, r4
+; Thumb-linux-NEXT: pop     {r4, r5}
+; Thumb-linux-NEXT: mov     pc, lr
+
+; Thumb-linux:      pop     {r4, r5}
+
+}
+
+define fastcc void @test_fastcc_large() {
+        %mem = alloca i32, i32 10000
+        call void @dummy_use (i32* %mem, i32 0)
+        ret void
+
+; Thumb-android:      test_fastcc_large:
+
+; Thumb-android:      push    {r4, r5}
+; Thumb-android-NEXT: mov     r5, sp
+; Thumb-android-NEXT: sub     r5, #40192
+; Thumb-android-NEXT: ldr     r4, .LCPI4_2
+; Thumb-android-NEXT: ldr     r4, [r4]
+; Thumb-android-NEXT: cmp     r4, r5
+; Thumb-android-NEXT: blo     .LBB4_2
+
+; Thumb-android:      mov     r4, #40192
+; Thumb-android-NEXT: mov     r5, #0
+; Thumb-android-NEXT: push    {lr}
+; Thumb-android-NEXT: bl      __morestack
+; Thumb-android-NEXT: pop     {r4}
+; Thumb-android-NEXT: mov     lr, r4
+; Thumb-android-NEXT: pop     {r4, r5}
+; Thumb-android-NEXT: mov     pc, lr
+
+; Thumb-android:      pop     {r4, r5}
+
+; Thumb-linux:      test_fastcc_large:
+
+; Thumb-linux:      push    {r4, r5}
+; Thumb-linux-NEXT: mov     r5, sp
+; Thumb-linux-NEXT: sub     r5, #40192
+; Thumb-linux-NEXT: ldr     r4, .LCPI4_2
+; Thumb-linux-NEXT: ldr     r4, [r4]
+; Thumb-linux-NEXT: cmp     r4, r5
+; Thumb-linux-NEXT: blo     .LBB4_2
+
+; Thumb-linux:      mov     r4, #40192
+; Thumb-linux-NEXT: mov     r5, #0
+; Thumb-linux-NEXT: push    {lr}
+; Thumb-linux-NEXT: bl      __morestack
+; Thumb-linux-NEXT: pop     {r4}
+; Thumb-linux-NEXT: mov     lr, r4
+; Thumb-linux-NEXT: pop     {r4, r5}
+; Thumb-linux-NEXT: mov     pc, lr
+
+; Thumb-linux:      pop     {r4, r5}
+
+}

