[llvm-commits] [llvm] r158087 - in /llvm/trunk: lib/Target/X86/X86FrameLowering.cpp lib/Target/X86/X86RegisterInfo.cpp lib/Target/X86/X86RegisterInfo.h test/CodeGen/X86/alloca-align-rounding-32.ll test/CodeGen/X86/alloca-align-rounding.ll test/Co

Fri Jun 15 17:46:31 PDT 2012

Great, thanks for the update.

On Fri, Jun 15, 2012 at 5:41 PM, Chad Rosier <mcrosier at apple.com> wrote:
> Hi Matt,
> I just wanted to let you know I'm working on a fix for this.  There are actually two issues going on here: (1) the issue you've pointed out and (2) -mstackrealign is forcing realignment when it's not necessary (i.e., the ABI alignment is >= the forced alignment).  I haven't figured out a fix for (1), but I have a solution for (2), and fixing that will solve your problem.  Hopefully, I'll have this done in the next few days.  If you need a fix before then, feel free to revert this commit.
>
>  Chad
>
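
[Editorial sketch] Issue (2) above can be read as a simple predicate: a forced alignment only calls for realignment when it exceeds what the ABI already guarantees. The C++ below is a minimal model of that reading, not the fix itself. The function name and all three parameters are hypothetical and do not correspond to LLVM's API, and the actual change may be shaped differently.

```cpp
#include <cassert>

// Hypothetical model (not LLVM code): decide whether the prologue has to
// realign the stack.
//   abiAlign    - alignment the ABI guarantees at function entry
//   forcedAlign - alignment requested by a force-realign option (0 = none)
//   maxObjAlign - largest alignment required by any stack object
bool needsRealignment(unsigned abiAlign, unsigned forcedAlign,
                      unsigned maxObjAlign) {
    // An over-aligned stack object always requires realignment.
    if (maxObjAlign > abiAlign)
        return true;
    // A forced alignment matters only when it is stricter than the ABI's.
    return forcedAlign > abiAlign;
}
```

Under this model, forcing 16-byte alignment on a target whose ABI already guarantees 16 bytes would not trigger realignment, which is the case the message describes as unnecessary.
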
> On Jun 14, 2012, at 2:27 PM, Matt Beaumont-Gay wrote:
>
>> Attached input is probably not quite minimal, but close enough. On Linux x86_64:
>> clang -mstackrealign -O1 -o /tmp/repro repro.cc && /tmp/repro
>> 776
>> 367
>> zsh: segmentation fault  /tmp/repro
>>
>> The problematic bits look like this:
>> 00000000004005f0 <_Z1gic>:
>>  4005f0:       55                      push   %rbp
>>  4005f1:       48 89 e5                mov    %rsp,%rbp
>>  4005f4:       48 81 e4 f0 ff ff ff    and    $0xfffffffffffffff0,%rsp
>>  4005fb:       41 57                   push   %r15
>>  4005fd:       41 56                   push   %r14
>>  4005ff:       41 55                   push   %r13
>>  400601:       41 54                   push   %r12
>>  400603:       53                      push   %rbx
>>  400604:       48 83 ec 18             sub    $0x18,%rsp  <=== So far, so normal.
>>  400608:       48 89 e3                mov    %rsp,%rbx
>>  40060b:       41 89 f4                mov    %esi,%r12d
>>  40060e:       4c 63 ff                movslq %edi,%r15
>>  400611:       41 83 ff 63             cmp    $0x63,%r15d
>>  400615:       7f 13                   jg     40062a <_Z1gic+0x3a>
>>  400617:       49 8d 47 0f             lea    0xf(%r15),%rax
>>  40061b:       48 83 e0 f0             and    $0xfffffffffffffff0,%rax
>>  40061f:       49 89 e6                mov    %rsp,%r14
>>  400622:       49 29 c6                sub    %rax,%r14
>>  400625:       4c 89 f4                mov    %r14,%rsp  <=== OK, as long as we clean up properly later...
>> <more function body not touching %rsp>
>>  4006d1:       48 83 c4 18             add    $0x18,%rsp  <=== Oops.
>>  4006d5:       5b                      pop    %rbx
>>  4006d6:       41 5c                   pop    %r12
>>  4006d8:       41 5d                   pop    %r13
>>  4006da:       41 5e                   pop    %r14
>>  4006dc:       41 5f                   pop    %r15
>>  4006de:       48 89 ec                mov    %rbp,%rsp
>>  4006e1:       5d                      pop    %rbp
>>  4006e2:       c3                      retq
>>
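
[Editorial sketch] The stack-pointer arithmetic in the listing above can be replayed in a few lines of C++ to show why the constant `add $0x18,%rsp` goes wrong once a dynamic alloca has moved %rsp. This is a toy model: the struct, constants, and function names are hypothetical, and the "expected" address is derived from the `mov %rsp,%rbx` at 400608 purely for illustration. It is not the fix that was eventually committed.

```cpp
#include <cassert>
#include <cstdint>

// Toy register file: only the registers that matter for the epilogue.
struct Frame {
    std::uint64_t rsp;
    std::uint64_t rbp;
    std::uint64_t rbx;  // snapshot of rsp after the fixed-size prologue
};

constexpr std::uint64_t kSlot = 8;       // push/pop width on x86-64
constexpr std::uint64_t kSavedRegs = 5;  // r15, r14, r13, r12, rbx
constexpr std::uint64_t kLocals = 0x18;  // sub $0x18,%rsp

// Replays the prologue at 4005f0..400608.
Frame prologue(std::uint64_t entryRsp) {
    Frame f{entryRsp, 0, 0};
    f.rsp -= kSlot;                    // push %rbp
    f.rbp = f.rsp;                     // mov %rsp,%rbp
    f.rsp &= ~std::uint64_t{15};       // and $-16,%rsp
    f.rsp -= kSavedRegs * kSlot;       // five callee-save pushes
    f.rsp -= kLocals;                  // sub $0x18,%rsp
    f.rbx = f.rsp;                     // mov %rsp,%rbx
    return f;
}

// Replays 400617..400625: round the size up to 16 and lower rsp by it.
void dynamicAlloca(Frame &f, std::uint64_t size) {
    f.rsp -= (size + 15) & ~std::uint64_t{15};
}

// Address the first pop reads from with the epilogue as emitted
// (add $0x18,%rsp applied to whatever rsp currently is).
std::uint64_t firstPopAsEmitted(const Frame &f) { return f.rsp + kLocals; }

// Address where %rbx was actually saved, recomputed from the snapshot.
std::uint64_t firstPopExpected(const Frame &f) { return f.rbx + kLocals; }
```

With no dynamic alloca the two addresses agree. After any nonzero alloca they differ by the rounded allocation size, so the pops load the wrong values into the callee-saved registers.
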
>> On Thu, Jun 14, 2012 at 12:15 PM, Chad Rosier <mcrosier at apple.com> wrote:
>>> Hi Matt,
>>> I'd be happy to investigate given a test case.  I'm at WWDC today, but should be able to take a look tomorrow.
>>>
>>>  Chad
>>>
>>>
>>>
>>> On Jun 14, 2012, at 10:51 AM, Matt Beaumont-Gay <matthewbg at google.com> wrote:
>>>
>>>> Hi Chad,
>>>>
>>>> This is causing some breakage. In functions with stack realignment and
>>>> dynamic allocas (and possibly some other conditions that I don't yet
>>>> fully understand), we generate an epilog that adds a constant to %rsp
>>>> rather than recalculating it relative to %rbp before popping
>>>> callee-save registers. I don't have a small test case yet, but I
>>>> wanted to give you a heads up.
>>>>
>>>> -Matt
>>>>
>>>> On Wed, Jun 6, 2012 at 10:37 AM, Chad Rosier <mcrosier at apple.com> wrote:
>>>>> Author: mcrosier
>>>>> Date: Wed Jun  6 12:37:40 2012
>>>>> New Revision: 158087
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=158087&view=rev
>>>>> Log:
>>>>> Add support for dynamic stack realignment in the presence of dynamic allocas on
>>>>> X86.
>>>>> rdar://11496434
>>>>>
>>>>> Added:
>>>>>    llvm/trunk/test/CodeGen/X86/dynamic-allocas-VLAs.ll
>>>>> Modified:
>>>>>    llvm/trunk/lib/Target/X86/X86FrameLowering.cpp
>>>>>    llvm/trunk/lib/Target/X86/X86RegisterInfo.cpp
>>>>>    llvm/trunk/lib/Target/X86/X86RegisterInfo.h
>>>>>    llvm/trunk/test/CodeGen/X86/alloca-align-rounding-32.ll
>>>>>    llvm/trunk/test/CodeGen/X86/alloca-align-rounding.ll
>>>>>
>>>>> Modified: llvm/trunk/lib/Target/X86/X86FrameLowering.cpp
>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86FrameLowering.cpp?rev=158087&r1=158086&r2=158087&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/Target/X86/X86FrameLowering.cpp (original)
>>>>> +++ llvm/trunk/lib/Target/X86/X86FrameLowering.cpp Wed Jun  6 12:37:40 2012
>>>>> @@ -650,6 +650,7 @@
>>>>>   unsigned SlotSize = RegInfo->getSlotSize();
>>>>>   unsigned FramePtr = RegInfo->getFrameRegister(MF);
>>>>>   unsigned StackPtr = RegInfo->getStackRegister();
>>>>> +  unsigned BasePtr = RegInfo->getBaseRegister();
>>>>>   DebugLoc DL;
>>>>>
>>>>>   // If we're forcing a stack realignment we can't rely on just the frame
>>>>> @@ -913,6 +914,18 @@
>>>>>     emitSPUpdate(MBB, MBBI, StackPtr, -(int64_t)NumBytes, Is64Bit,
>>>>>                  UseLEA, TII, *RegInfo);
>>>>>
>>>>> +  // If we need a base pointer, set it up here. It's whatever the value
>>>>> +  // of the stack pointer is at this point. Any variable size objects
>>>>> +  // will be allocated after this, so we can still use the base pointer
>>>>> +  // to reference locals.
>>>>> +  if (RegInfo->hasBasePointer(MF)) {
>>>>> +    // Update the frame pointer with the current stack pointer.
>>>>> +    unsigned Opc = Is64Bit ? X86::MOV64rr : X86::MOV32rr;
>>>>> +    BuildMI(MBB, MBBI, DL, TII.get(Opc), BasePtr)
>>>>> +      .addReg(StackPtr)
>>>>> +      .setMIFlag(MachineInstr::FrameSetup);
>>>>> +  }
>>>>> +
>>>>>   if (( (!HasFP && NumBytes) || PushedRegs) && needsFrameMoves) {
>>>>>     // Mark end of stack pointer adjustment.
>>>>>     MCSymbol *Label = MMI.getContext().CreateTempSymbol();
>>>>> @@ -1148,7 +1161,16 @@
>>>>>   int Offset = MFI->getObjectOffset(FI) - getOffsetOfLocalArea();
>>>>>   uint64_t StackSize = MFI->getStackSize();
>>>>>
>>>>> -  if (RegInfo->needsStackRealignment(MF)) {
>>>>> +  if (RegInfo->hasBasePointer(MF)) {
>>>>> +    assert (hasFP(MF) && "VLAs and dynamic stack realign, but no FP?!");
>>>>> +    if (FI < 0) {
>>>>> +      // Skip the saved EBP.
>>>>> +      return Offset + RegInfo->getSlotSize();
>>>>> +    } else {
>>>>> +      assert((-(Offset + StackSize)) % MFI->getObjectAlignment(FI) == 0);
>>>>> +      return Offset + StackSize;
>>>>> +    }
>>>>> +  } else if (RegInfo->needsStackRealignment(MF)) {
>>>>>     if (FI < 0) {
>>>>>       // Skip the saved EBP.
>>>>>       return Offset + RegInfo->getSlotSize();
>>>>> @@ -1179,9 +1201,14 @@
>>>>>   const X86RegisterInfo *RegInfo =
>>>>>       static_cast<const X86RegisterInfo*>(MF.getTarget().getRegisterInfo());
>>>>>   // We can't calculate offset from frame pointer if the stack is realigned,
>>>>> -  // so enforce usage of stack pointer.
>>>>> -  FrameReg = (RegInfo->needsStackRealignment(MF)) ?
>>>>> -    RegInfo->getStackRegister() : RegInfo->getFrameRegister(MF);
>>>>> +  // so enforce usage of stack/base pointer.  The base pointer is used when we
>>>>> +  // have dynamic allocas in addition to dynamic realignment.
>>>>> +  if (RegInfo->hasBasePointer(MF))
>>>>> +    FrameReg = RegInfo->getBaseRegister();
>>>>> +  else if (RegInfo->needsStackRealignment(MF))
>>>>> +    FrameReg = RegInfo->getStackRegister();
>>>>> +  else
>>>>> +    FrameReg = RegInfo->getFrameRegister(MF);
>>>>>   return getFrameIndexOffset(MF, FI);
>>>>>  }
>>>>>
>>>>> @@ -1318,6 +1345,10 @@
>>>>>            "Slot for EBP register must be last in order to be found!");
>>>>>     (void)FrameIdx;
>>>>>   }
>>>>> +
>>>>> +  // Spill the BasePtr if it's used.
>>>>> +  if (RegInfo->hasBasePointer(MF))
>>>>> +    MF.getRegInfo().setPhysRegUsed(RegInfo->getBaseRegister());
>>>>>  }
>>>>>
>>>>>  static bool
>>>>>
>>>>> Modified: llvm/trunk/lib/Target/X86/X86RegisterInfo.cpp
>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86RegisterInfo.cpp?rev=158087&r1=158086&r2=158087&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/Target/X86/X86RegisterInfo.cpp (original)
>>>>> +++ llvm/trunk/lib/Target/X86/X86RegisterInfo.cpp Wed Jun  6 12:37:40 2012
>>>>> @@ -50,6 +50,10 @@
>>>>>                            " needed for the function."),
>>>>>                  cl::init(false), cl::Hidden);
>>>>>
>>>>> +cl::opt<bool>
>>>>> +EnableBasePointer("x86-use-base-pointer", cl::Hidden, cl::init(true),
>>>>> +          cl::desc("Enable use of a base pointer for complex stack frames"));
>>>>> +
>>>>>  X86RegisterInfo::X86RegisterInfo(X86TargetMachine &tm,
>>>>>                                  const TargetInstrInfo &tii)
>>>>>   : X86GenRegisterInfo(tm.getSubtarget<X86Subtarget>().is64Bit()
>>>>> @@ -68,10 +72,12 @@
>>>>>     SlotSize = 8;
>>>>>     StackPtr = X86::RSP;
>>>>>     FramePtr = X86::RBP;
>>>>> +    BasePtr = X86::RBX;
>>>>>   } else {
>>>>>     SlotSize = 4;
>>>>>     StackPtr = X86::ESP;
>>>>>     FramePtr = X86::EBP;
>>>>> +    BasePtr = X86::EBX;
>>>>>   }
>>>>>  }
>>>>>
>>>>> @@ -290,6 +296,20 @@
>>>>>       Reserved.set(*I);
>>>>>   }
>>>>>
>>>>> +  // Set the base-pointer register and its aliases as reserved if needed.
>>>>> +  if (hasBasePointer(MF)) {
>>>>> +    CallingConv::ID CC = MF.getFunction()->getCallingConv();
>>>>> +    const uint32_t* RegMask = getCallPreservedMask(CC);
>>>>> +    if (MachineOperand::clobbersPhysReg(RegMask, getBaseRegister()))
>>>>> +      report_fatal_error(
>>>>> +        "Stack realignment in presence of dynamic allocas is not supported with"
>>>>> +        "this calling convention.");
>>>>> +
>>>>> +    Reserved.set(getBaseRegister());
>>>>> +    for (MCSubRegIterator I(getBaseRegister(), this); I.isValid(); ++I)
>>>>> +      Reserved.set(*I);
>>>>> +  }
>>>>> +
>>>>>   // Mark the segment registers as reserved.
>>>>>   Reserved.set(X86::CS);
>>>>>   Reserved.set(X86::SS);
>>>>> @@ -340,10 +360,35 @@
>>>>>  // Stack Frame Processing methods
>>>>>  //===----------------------------------------------------------------------===//
>>>>>
>>>>> +bool X86RegisterInfo::hasBasePointer(const MachineFunction &MF) const {
>>>>> +   const MachineFrameInfo *MFI = MF.getFrameInfo();
>>>>> +
>>>>> +   if (!EnableBasePointer)
>>>>> +     return false;
>>>>> +
>>>>> +   // When we need stack realignment and there are dynamic allocas, we can't
>>>>> +   // reference off of the stack pointer, so we reserve a base pointer.
>>>>> +   if (needsStackRealignment(MF) && MFI->hasVarSizedObjects())
>>>>> +     return true;
>>>>> +
>>>>> +   return false;
>>>>> +}
>>>>> +
>>>>>  bool X86RegisterInfo::canRealignStack(const MachineFunction &MF) const {
>>>>>   const MachineFrameInfo *MFI = MF.getFrameInfo();
>>>>> -  return (MF.getTarget().Options.RealignStack &&
>>>>> -          !MFI->hasVarSizedObjects());
>>>>> +  const MachineRegisterInfo *MRI = &MF.getRegInfo();
>>>>> +  if (!MF.getTarget().Options.RealignStack)
>>>>> +    return false;
>>>>> +
>>>>> +  // Stack realignment requires a frame pointer.  If we already started
>>>>> +  // register allocation with frame pointer elimination, it is too late now.
>>>>> +  if (!MRI->canReserveReg(FramePtr))
>>>>> +    return false;
>>>>> +
>>>>> +  // If base pointer is necessary.  Check that it isn't too late to reserve it.
>>>>> +  if (MFI->hasVarSizedObjects())
>>>>> +    return MRI->canReserveReg(BasePtr);
>>>>> +  return true;
>>>>>  }
>>>>>
>>>>>  bool X86RegisterInfo::needsStackRealignment(const MachineFunction &MF) const {
>>>>> @@ -353,13 +398,6 @@
>>>>>   bool requiresRealignment = ((MFI->getMaxAlignment() > StackAlign) ||
>>>>>                                F->hasFnAttr(Attribute::StackAlignment));
>>>>>
>>>>> -  // FIXME: Currently we don't support stack realignment for functions with
>>>>> -  //        variable-sized allocas.
>>>>> -  // FIXME: It's more complicated than this...
>>>>> -  if (0 && requiresRealignment && MFI->hasVarSizedObjects())
>>>>> -    report_fatal_error(
>>>>> -      "Stack realignment in presence of dynamic allocas is not supported");
>>>>> -
>>>>>   // If we've requested that we force align the stack do so now.
>>>>>   if (ForceStackAlign)
>>>>>     return canRealignStack(MF);
>>>>> @@ -499,7 +537,9 @@
>>>>>
>>>>>   unsigned Opc = MI.getOpcode();
>>>>>   bool AfterFPPop = Opc == X86::TAILJMPm64 || Opc == X86::TAILJMPm;
>>>>> -  if (needsStackRealignment(MF))
>>>>> +  if (hasBasePointer(MF))
>>>>> +    BasePtr = getBaseRegister();
>>>>> +  else if (needsStackRealignment(MF))
>>>>>     BasePtr = (FrameIndex < 0 ? FramePtr : StackPtr);
>>>>>   else if (AfterFPPop)
>>>>>     BasePtr = StackPtr;
>>>>>
>>>>> Modified: llvm/trunk/lib/Target/X86/X86RegisterInfo.h
>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86RegisterInfo.h?rev=158087&r1=158086&r2=158087&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/lib/Target/X86/X86RegisterInfo.h (original)
>>>>> +++ llvm/trunk/lib/Target/X86/X86RegisterInfo.h Wed Jun  6 12:37:40 2012
>>>>> @@ -50,6 +50,11 @@
>>>>>   ///
>>>>>   unsigned FramePtr;
>>>>>
>>>>> +  /// BasePtr - X86 physical register used as a base ptr in complex stack
>>>>> +  /// frames. I.e., when we need a 3rd base, not just SP and FP, due to
>>>>> +  /// variable size stack objects.
>>>>> +  unsigned BasePtr;
>>>>> +
>>>>>  public:
>>>>>   X86RegisterInfo(X86TargetMachine &tm, const TargetInstrInfo &tii);
>>>>>
>>>>> @@ -106,6 +111,8 @@
>>>>>   /// register scavenger to determine what registers are free.
>>>>>   BitVector getReservedRegs(const MachineFunction &MF) const;
>>>>>
>>>>> +  bool hasBasePointer(const MachineFunction &MF) const;
>>>>> +
>>>>>   bool canRealignStack(const MachineFunction &MF) const;
>>>>>
>>>>>   bool needsStackRealignment(const MachineFunction &MF) const;
>>>>> @@ -123,6 +130,7 @@
>>>>>   // Debug information queries.
>>>>>   unsigned getFrameRegister(const MachineFunction &MF) const;
>>>>>   unsigned getStackRegister() const { return StackPtr; }
>>>>> +  unsigned getBaseRegister() const { return BasePtr; }
>>>>>   // FIXME: Move to FrameInfok
>>>>>   unsigned getSlotSize() const { return SlotSize; }
>>>>>
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/X86/alloca-align-rounding-32.ll
>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/alloca-align-rounding-32.ll?rev=158087&r1=158086&r2=158087&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/X86/alloca-align-rounding-32.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/X86/alloca-align-rounding-32.ll Wed Jun  6 12:37:40 2012
>>>>> @@ -1,4 +1,4 @@
>>>>> -; RUN: llc < %s -march=x86 -mtriple=i686-apple-darwin | grep and | count 1
>>>>> +; RUN: llc < %s -march=x86 -mtriple=i686-apple-darwin | FileCheck %s
>>>>>
>>>>>  declare void @bar(<2 x i64>* %n)
>>>>>
>>>>> @@ -6,10 +6,15 @@
>>>>>   %p = alloca <2 x i64>, i32 %h
>>>>>   call void @bar(<2 x i64>* %p)
>>>>>   ret void
>>>>> +; CHECK: foo
>>>>> +; CHECK-NOT: andl $-32, %eax
>>>>>  }
>>>>>
>>>>>  define void @foo2(i32 %h) {
>>>>>   %p = alloca <2 x i64>, i32 %h, align 32
>>>>>   call void @bar(<2 x i64>* %p)
>>>>>   ret void
>>>>> +; CHECK: foo2
>>>>> +; CHECK: andl $-32, %esp
>>>>> +; CHECK: andl $-32, %eax
>>>>>  }
>>>>>
>>>>> Modified: llvm/trunk/test/CodeGen/X86/alloca-align-rounding.ll
>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/alloca-align-rounding.ll?rev=158087&r1=158086&r2=158087&view=diff
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/X86/alloca-align-rounding.ll (original)
>>>>> +++ llvm/trunk/test/CodeGen/X86/alloca-align-rounding.ll Wed Jun  6 12:37:40 2012
>>>>> @@ -1,4 +1,4 @@
>>>>> -; RUN: llc < %s -march=x86-64 -mtriple=i686-pc-linux | grep and | count 1
>>>>> +; RUN: llc < %s -march=x86-64 -mtriple=i686-pc-linux | FileCheck %s
>>>>>
>>>>>  declare void @bar(<2 x i64>* %n)
>>>>>
>>>>> @@ -6,10 +6,15 @@
>>>>>   %p = alloca <2 x i64>, i64 %h
>>>>>   call void @bar(<2 x i64>* %p)
>>>>>   ret void
>>>>> +; CHECK: foo
>>>>> +; CHECK-NOT: andq $-32, %rax
>>>>>  }
>>>>>
>>>>>  define void @foo2(i64 %h) {
>>>>>   %p = alloca <2 x i64>, i64 %h, align 32
>>>>>   call void @bar(<2 x i64>* %p)
>>>>>   ret void
>>>>> +; CHECK: foo2
>>>>> +; CHECK: andq $-32, %rsp
>>>>> +; CHECK: andq $-32, %rax
>>>>>  }
>>>>>
>>>>> Added: llvm/trunk/test/CodeGen/X86/dynamic-allocas-VLAs.ll
>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/dynamic-allocas-VLAs.ll?rev=158087&view=auto
>>>>> ==============================================================================
>>>>> --- llvm/trunk/test/CodeGen/X86/dynamic-allocas-VLAs.ll (added)
>>>>> +++ llvm/trunk/test/CodeGen/X86/dynamic-allocas-VLAs.ll Wed Jun  6 12:37:40 2012
>>>>> @@ -0,0 +1,158 @@
>>>>> +; RUN: llc < %s -march=x86-64 -mattr=+avx -mtriple=i686-apple-darwin10 | FileCheck %s
>>>>> +; rdar://11496434
>>>>> +
>>>>> +; no VLAs or dynamic alignment
>>>>> +define i32 @t1() nounwind uwtable ssp {
>>>>> +entry:
>>>>> +  %a = alloca i32, align 4
>>>>> +  call void @t1_helper(i32* %a) nounwind
>>>>> +  %0 = load i32* %a, align 4
>>>>> +  %add = add nsw i32 %0, 13
>>>>> +  ret i32 %add
>>>>> +
>>>>> +; CHECK: _t1
>>>>> +; CHECK-NOT: andq $-{{[0-9]+}}, %rsp
>>>>> +; CHECK: leaq [[OFFSET:[0-9]*]](%rsp), %rdi
>>>>> +; CHECK: callq _t1_helper
>>>>> +; CHECK: movl [[OFFSET]](%rsp), %eax
>>>>> +; CHECK: addl $13, %eax
>>>>> +}
>>>>> +
>>>>> +declare void @t1_helper(i32*)
>>>>> +
>>>>> +; dynamic realignment
>>>>> +define i32 @t2() nounwind uwtable ssp {
>>>>> +entry:
>>>>> +  %a = alloca i32, align 4
>>>>> +  %v = alloca <8 x float>, align 32
>>>>> +  call void @t2_helper(i32* %a, <8 x float>* %v) nounwind
>>>>> +  %0 = load i32* %a, align 4
>>>>> +  %add = add nsw i32 %0, 13
>>>>> +  ret i32 %add
>>>>> +
>>>>> +; CHECK: _t2
>>>>> +; CHECK: pushq %rbp
>>>>> +; CHECK: movq %rsp, %rbp
>>>>> +; CHECK: andq $-32, %rsp
>>>>> +; CHECK: subq ${{[0-9]+}}, %rsp
>>>>> +;
>>>>> +; CHECK: leaq {{[0-9]*}}(%rsp), %rdi
>>>>> +; CHECK: leaq {{[0-9]*}}(%rsp), %rsi
>>>>> +; CHECK: callq _t2_helper
>>>>> +;
>>>>> +; CHECK: movq %rbp, %rsp
>>>>> +; CHECK: popq %rbp
>>>>> +}
>>>>> +
>>>>> +declare void @t2_helper(i32*, <8 x float>*)
>>>>> +
>>>>> +; VLAs
>>>>> +define i32 @t3(i64 %sz) nounwind uwtable ssp {
>>>>> +entry:
>>>>> +  %a = alloca i32, align 4
>>>>> +  %vla = alloca i32, i64 %sz, align 16
>>>>> +  call void @t3_helper(i32* %a, i32* %vla) nounwind
>>>>> +  %0 = load i32* %a, align 4
>>>>> +  %add = add nsw i32 %0, 13
>>>>> +  ret i32 %add
>>>>> +
>>>>> +; CHECK: _t3
>>>>> +; CHECK: pushq %rbp
>>>>> +; CHECK: movq %rsp, %rbp
>>>>> +; CHECK: pushq %rbx
>>>>> +; CHECK-NOT: andq $-{{[0-9]+}}, %rsp
>>>>> +; CHECK: subq ${{[0-9]+}}, %rsp
>>>>> +;
>>>>> +; CHECK: leaq -{{[0-9]+}}(%rbp), %rsp
>>>>> +; CHECK: popq %rbx
>>>>> +; CHECK: popq %rbp
>>>>> +}
>>>>> +
>>>>> +declare void @t3_helper(i32*, i32*)
>>>>> +
>>>>> +; VLAs + Dynamic realignment
>>>>> +define i32 @t4(i64 %sz) nounwind uwtable ssp {
>>>>> +entry:
>>>>> +  %a = alloca i32, align 4
>>>>> +  %v = alloca <8 x float>, align 32
>>>>> +  %vla = alloca i32, i64 %sz, align 16
>>>>> +  call void @t4_helper(i32* %a, i32* %vla, <8 x float>* %v) nounwind
>>>>> +  %0 = load i32* %a, align 4
>>>>> +  %add = add nsw i32 %0, 13
>>>>> +  ret i32 %add
>>>>> +
>>>>> +; CHECK: _t4
>>>>> +; CHECK: pushq %rbp
>>>>> +; CHECK: movq %rsp, %rbp
>>>>> +; CHECK: andq $-32, %rsp
>>>>> +; CHECK: pushq %r14
>>>>> +; CHECK: pushq %rbx
>>>>> +; CHECK: subq $[[STACKADJ:[0-9]+]], %rsp
>>>>> +; CHECK: movq %rsp, %rbx
>>>>> +;
>>>>> +; CHECK: leaq {{[0-9]*}}(%rbx), %rdi
>>>>> +; CHECK: leaq {{[0-9]*}}(%rbx), %rdx
>>>>> +; CHECK: callq   _t4_helper
>>>>> +;
>>>>> +; CHECK: addq $[[STACKADJ]], %rsp
>>>>> +; CHECK: popq %rbx
>>>>> +; CHECK: popq %r14
>>>>> +; CHECK: movq %rbp, %rsp
>>>>> +; CHECK: popq %rbp
>>>>> +}
>>>>> +
>>>>> +declare void @t4_helper(i32*, i32*, <8 x float>*)
>>>>> +
>>>>> +; Dynamic realignment + Spill
>>>>> +define i32 @t5(float* nocapture %f) nounwind uwtable ssp {
>>>>> +entry:
>>>>> +  %a = alloca i32, align 4
>>>>> +  %0 = bitcast float* %f to <8 x float>*
>>>>> +  %1 = load <8 x float>* %0, align 32
>>>>> +  call void @t5_helper1(i32* %a) nounwind
>>>>> +  call void @t5_helper2(<8 x float> %1) nounwind
>>>>> +  %2 = load i32* %a, align 4
>>>>> +  %add = add nsw i32 %2, 13
>>>>> +  ret i32 %add
>>>>> +
>>>>> +; CHECK: _t5
>>>>> +; CHECK: pushq %rbp
>>>>> +; CHECK: movq %rsp, %rbp
>>>>> +; CHECK: andq $-32, %rsp
>>>>> +; CHECK: subq ${{[0-9]+}}, %rsp
>>>>> +;
>>>>> +; CHECK: vmovaps (%rdi), [[AVXREG:%ymm[0-9]+]]
>>>>> +; CHECK: vmovaps [[AVXREG]], (%rsp)
>>>>> +; CHECK: leaq {{[0-9]+}}(%rsp), %rdi
>>>>> +; CHECK: callq   _t5_helper1
>>>>> +; CHECK: vmovaps (%rsp), %ymm0
>>>>> +; CHECK: callq   _t5_helper2
>>>>> +; CHECK: movl {{[0-9]+}}(%rsp), %eax
>>>>> +;
>>>>> +; CHECK: movq %rbp, %rsp
>>>>> +; CHECK: popq %rbp
>>>>> +}
>>>>> +
>>>>> +declare void @t5_helper1(i32*)
>>>>> +
>>>>> +declare void @t5_helper2(<8 x float>)
>>>>> +
>>>>> +; VLAs + Dynamic realignment + Spill
>>>>> +; FIXME: RA has already reserved RBX, so we can't do dynamic realignment.
>>>>> +define i32 @t6(i64 %sz, float* nocapture %f) nounwind uwtable ssp {
>>>>> +entry:
>>>>> +; CHECK: _t6
>>>>> +  %a = alloca i32, align 4
>>>>> +  %0 = bitcast float* %f to <8 x float>*
>>>>> +  %1 = load <8 x float>* %0, align 32
>>>>> +  %vla = alloca i32, i64 %sz, align 16
>>>>> +  call void @t6_helper1(i32* %a, i32* %vla) nounwind
>>>>> +  call void @t6_helper2(<8 x float> %1) nounwind
>>>>> +  %2 = load i32* %a, align 4
>>>>> +  %add = add nsw i32 %2, 13
>>>>> +  ret i32 %add
>>>>> +}
>>>>> +
>>>>> +declare void @t6_helper1(i32*, i32*)
>>>>> +
>>>>> +declare void @t6_helper2(<8 x float>)
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> <repro.cc>
>
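
[Editorial sketch] The register-selection logic that the patch adds for locals (base pointer when realignment and variable-sized objects coexist, otherwise stack pointer under realignment, otherwise the frame register) can be modeled outside LLVM as follows. The enum, struct, and function `localsReg` are hypothetical names for illustration. The fallback from frame pointer to stack pointer when no frame pointer exists is an assumption about `getFrameRegister`, which the quoted diff does not show.

```cpp
#include <cassert>

enum class Reg { StackPtr, FramePtr, BasePtr };

// Hypothetical summary of the per-function facts the patch consults.
struct FrameTraits {
    bool needsRealign;  // stack must be realigned in the prologue
    bool hasVarSized;   // function contains dynamic allocas / VLAs
    bool hasFP;         // a frame pointer is in use
};

// Mirrors the shape of hasBasePointer() in the patch: the base pointer is
// needed only when realignment and variable-sized objects are combined.
bool hasBasePointer(const FrameTraits &t, bool enableBasePointer = true) {
    return enableBasePointer && t.needsRealign && t.hasVarSized;
}

// Which register local stack objects are addressed from.
Reg localsReg(const FrameTraits &t) {
    if (hasBasePointer(t))
        return Reg::BasePtr;   // rsp moves at run time, rbp offset unknown
    if (t.needsRealign)
        return Reg::StackPtr;  // rbp offset unknown, rsp is stable
    // Assumption: without a frame pointer, locals fall back to rsp.
    return t.hasFP ? Reg::FramePtr : Reg::StackPtr;
}
```

The four combinations line up with the addressing seen in the CHECK lines for t1 through t4 of dynamic-allocas-VLAs.ll: %rsp for t1 and t2, %rbp for t3, and %rbx for t4.
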