[llvm] r221320 - ARM/Dwarf: correctly align stack before callee-saved VPRs

Oliver Stannard oliver.stannard at arm.com
Fri Nov 14 08:24:42 PST 2014


Hi Tim,

I think this patch, or the related one, is causing miscompilations. These
manifest as either an argument being loaded from the wrong location on the
stack, or the final pop being from the wrong address, resulting in the PC
being restored incorrectly.

The following code demonstrates this:

  void failed_i(int actual) {
    printf("failed_i %d\n", actual); 
  }
  
  double blah(int a, int b, int c, int d, double e) {
    if (1 != a)
      failed_i(a);
    asm("nop"
        :
        :
        : "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10",
          "r11", "r12", "lr");
    return e;
  }

$ clang --target=armv7a--none-eabi -mcpu=cortex-a9 -c test.c -o test.o -Oz
-fno-inline

    blah
        0x00008190:    e92d4ff8    .O-.    PUSH     {r3-r11,lr}
        0x00008194:    e28db01c    ....    ADD      r11,sp,#0x1c
        0x00008198:    ed2d7b04    .{-.    VPUSH    {d7-d8}
        0x0000819c:    ed9b8b02    ....    VLDR     d8,[r11,#8]
        0x000081a0:    e3500001    ..P.    CMP      r0,#1
        0x000081a4:    0a000000    ....    BEQ      0x81ac ; blah + 28
        0x000081a8:    ebfffff1    ....    BL       failed_i ; 0x8174
        0x000081ac:    ec501b18    ..P.    VMOV     r1,r0,d8
        0x000081b0:    e58d0004    ....    STR      r0,[sp,#4]
        0x000081b4:    e58d1000    ....    STR      r1,[sp,#0]
        0x000081b8:    e320f000    .. .    NOP      
        0x000081bc:    e89d0003    ....    LDM      sp,{r0,r1}
        0x000081c0:    e24bd028    (.K.    SUB      sp,r11,#0x28
        0x000081c4:    ecbd8b02    ....    VPOP     {d8}
        0x000081c8:    e28dd004    ....    ADD      sp,sp,#4
        0x000081cc:    e8bd8ff0    ....    POP      {r4-r11,pc}

In this case, the stack pointer is reset 4 bytes too low by the SUB at
0x000081c0, resulting in the wrong value being loaded into the PC. If I
reduce the number of registers clobbered by the inline asm, the function
returns correctly, but the return value is incorrect.

Oliver

> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> bounces at cs.uiuc.edu] On Behalf Of Tim Northover
> Sent: 05 November 2014 00:27
> To: llvm-commits at cs.uiuc.edu
> Subject: [llvm] r221320 - ARM/Dwarf: correctly align stack before
> callee-saved VPRs
> 
> Author: tnorthover
> Date: Tue Nov  4 18:27:13 2014
> New Revision: 221320
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=221320&view=rev
> Log:
> ARM/Dwarf: correctly align stack before callee-saved VPRs
> 
> We were making an attempt to do this by adding an extra callee-saved
> GPR (so
> that there was an even number in the list), but when that failed we
> went ahead
> and pushed anyway.
> 
> This had a couple of potential issues:
>   + The .cfi directives we emit misplaced dN because they were based on
>     PrologEpilogInserter's calculation.
>   + Unaligned stores can be less efficient.
>   + Unaligned stores can actually fault (likely only an issue in niche
> cases,
>     but possible).
> 
> This adds a final explicit stack adjustment if all other options fail,
> so that
> the actual locations of the registers match up with where they should
> be.
> 
> Added:
>     llvm/trunk/test/CodeGen/ARM/dwarf-unwind.ll
> Modified:
>     llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp
>     llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h
> 
> Modified: llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp
> URL: http://llvm.org/viewvc/llvm-
> project/llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp?rev=221320&r1=22
> 1319&r2=221320&view=diff
> =======================================================================
> =======
> --- llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp (original)
> +++ llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp Tue Nov  4 18:27:13
> 2014
> @@ -260,10 +260,11 @@ void ARMFrameLowering::emitPrologue(Mach
> 
>    // Determine starting offsets of spill areas.
>    bool HasFP = hasFP(MF);
> -  unsigned DPRCSOffset  = NumBytes - (ArgRegsSaveSize + GPRCS1Size
> -                                      + GPRCS2Size + DPRCSSize);
> -  unsigned GPRCS2Offset = DPRCSOffset + DPRCSSize;
> -  unsigned GPRCS1Offset = GPRCS2Offset + GPRCS2Size;
> +  unsigned GPRCS1Offset = NumBytes - ArgRegsSaveSize - GPRCS1Size;
> +  unsigned GPRCS2Offset = GPRCS1Offset - GPRCS2Size;
> +  unsigned DPRAlign = DPRCSSize ? std::min(8U, Align) : 4U;
> +  unsigned DPRGapSize = (GPRCS1Size + GPRCS2Size + ArgRegsSaveSize) %
> DPRAlign;
> +  unsigned DPRCSOffset = GPRCS2Offset - DPRGapSize - DPRCSSize;
>    int FramePtrOffsetInPush = 0;
>    if (HasFP) {
>      FramePtrOffsetInPush = MFI->getObjectOffset(FramePtrSpillFI)
> @@ -279,6 +280,15 @@ void ARMFrameLowering::emitPrologue(Mach
>    if (GPRCS2Size > 0)
>      GPRCS2Push = LastPush = MBBI++;
> 
> +  // Prolog/epilog inserter assumes we correctly align DPRs on the
> stack, so our
> +  // .cfi_offset operations will reflect that.
> +  if (DPRGapSize) {
> +    assert(DPRGapSize == 4 && "unexpected alignment requirements for
> DPRs");
> +    if (!tryFoldSPUpdateIntoPushPop(STI, MF, LastPush, DPRGapSize))
> +      emitSPUpdate(isARM, MBB, MBBI, dl, TII, -DPRGapSize,
> +                   MachineInstr::FrameSetup);
> +  }
> +
>    // Move past area 3.
>    if (DPRCSSize > 0) {
>      DPRCSPush = MBBI;
> @@ -508,6 +518,7 @@ void ARMFrameLowering::emitPrologue(Mach
> 
>    AFI->setGPRCalleeSavedArea1Size(GPRCS1Size);
>    AFI->setGPRCalleeSavedArea2Size(GPRCS2Size);
> +  AFI->setDPRCalleeSavedGapSize(DPRGapSize);
>    AFI->setDPRCalleeSavedAreaSize(DPRCSSize);
> 
>    // If we need dynamic stack realignment, do it here. Be paranoid and
> make
> @@ -613,6 +624,7 @@ void ARMFrameLowering::emitEpilogue(Mach
>      NumBytes -= (ArgRegsSaveSize +
>                   AFI->getGPRCalleeSavedArea1Size() +
>                   AFI->getGPRCalleeSavedArea2Size() +
> +                 AFI->getDPRCalleeSavedGapSize() +
>                   AFI->getDPRCalleeSavedAreaSize());
> 
>      // Reset SP based on frame pointer only if the stack frame extends
> beyond
> @@ -661,6 +673,12 @@ void ARMFrameLowering::emitEpilogue(Mach
>        while (MBBI->getOpcode() == ARM::VLDMDIA_UPD)
>          MBBI++;
>      }
> +    if (AFI->getDPRCalleeSavedGapSize()) {
> +      assert(AFI->getDPRCalleeSavedGapSize() == 4 &&
> +             "unexpected DPR alignment gap");
> +      emitSPUpdate(isARM, MBB, MBBI, dl, TII, AFI-
> >getDPRCalleeSavedGapSize());
> +    }
> +
>      if (AFI->getGPRCalleeSavedArea2Size()) MBBI++;
>      if (AFI->getGPRCalleeSavedArea1Size()) MBBI++;
>    }
> 
> Modified: llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h
> URL: http://llvm.org/viewvc/llvm-
> project/llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h?rev=221320&r
> 1=221319&r2=221320&view=diff
> =======================================================================
> =======
> --- llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h (original)
> +++ llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h Tue Nov  4
> 18:27:13 2014
> @@ -86,6 +86,7 @@ class ARMFunctionInfo : public MachineFu
>    /// areas.
>    unsigned GPRCS1Size;
>    unsigned GPRCS2Size;
> +  unsigned DPRCSAlignGapSize;
>    unsigned DPRCSSize;
> 
>    /// NumAlignedDPRCS2Regs - The number of callee-saved DPRs that are
> saved in
> @@ -134,7 +135,7 @@ public:
>      RestoreSPFromFP(false),
>      LRSpilledForFarJump(false),
>      FramePtrSpillOffset(0), GPRCS1Offset(0), GPRCS2Offset(0),
> DPRCSOffset(0),
> -    GPRCS1Size(0), GPRCS2Size(0), DPRCSSize(0),
> +    GPRCS1Size(0), GPRCS2Size(0), DPRCSAlignGapSize(0), DPRCSSize(0),
>      NumAlignedDPRCS2Regs(0),
>      JumpTableUId(0), PICLabelUId(0),
>      VarArgsFrameIndex(0), HasITBlocks(false), GlobalBaseReg(0) {}
> @@ -183,10 +184,12 @@ public:
> 
>    unsigned getGPRCalleeSavedArea1Size() const { return GPRCS1Size; }
>    unsigned getGPRCalleeSavedArea2Size() const { return GPRCS2Size; }
> +  unsigned getDPRCalleeSavedGapSize() const   { return
> DPRCSAlignGapSize; }
>    unsigned getDPRCalleeSavedAreaSize()  const { return DPRCSSize; }
> 
>    void setGPRCalleeSavedArea1Size(unsigned s) { GPRCS1Size = s; }
>    void setGPRCalleeSavedArea2Size(unsigned s) { GPRCS2Size = s; }
> +  void setDPRCalleeSavedGapSize(unsigned s)   { DPRCSAlignGapSize = s;
> }
>    void setDPRCalleeSavedAreaSize(unsigned s)  { DPRCSSize = s; }
> 
>    unsigned getArgumentStackSize() const { return ArgumentStackSize; }
> 
> Added: llvm/trunk/test/CodeGen/ARM/dwarf-unwind.ll
> URL: http://llvm.org/viewvc/llvm-
> project/llvm/trunk/test/CodeGen/ARM/dwarf-
> unwind.ll?rev=221320&view=auto
> =======================================================================
> =======
> --- llvm/trunk/test/CodeGen/ARM/dwarf-unwind.ll (added)
> +++ llvm/trunk/test/CodeGen/ARM/dwarf-unwind.ll Tue Nov  4 18:27:13
> 2014
> @@ -0,0 +1,68 @@
> +; RUN: llc -mtriple=thumbv7-netbsd-eabi -o - %s | FileCheck %s
> +declare void @bar()
> +
> +; ARM's frame lowering attempts to tack another callee-saved register
> onto the
> +; list when it detects a potential misaligned VFP store. However, if
> there are
> +; none available it used to just vpush anyway and misreport the
> location of the
> +; registers in unwind info. Since there are benefits to aligned
> stores, it's
> +; better to correct the code than the .cfi_offset directive.
> +
> +define void @test_dpr_align(i8 %l, i8 %r) {
> +; CHECK-LABEL: test_dpr_align:
> +; CHECK: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
> +; CHECK: sub sp, #4
> +; CHECK: vpush {d8}
> +; CHECK: .cfi_offset d8, -48
> +; CHECK-NOT: sub sp
> +; [...]
> +; CHECK: bl bar
> +; CHECK-NOT: add sp
> +; CHECK: vpop {d8}
> +; CHECK: add sp, #4
> +; CHECK: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
> +  call void asm sideeffect "",
> "~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11},~{d8}"()
> +  call void @bar()
> +  ret void
> +}
> +
> +; The prologue (but not the epilogue) can be made more space efficient
> by
> +; chucking an argument register into the list. Not worth it in general
> though,
> +; "sub sp, #4" is likely faster.
> +define void @test_dpr_align_tiny(i8 %l, i8 %r) minsize {
> +; CHECK-LABEL: test_dpr_align_tiny:
> +; CHECK: push.w {r3, r4, r5, r6, r7, r8, r9, r10, r11, lr}
> +; CHECK-NOT: sub sp
> +; CHECK: vpush {d8}
> +; CHECK: .cfi_offset d8, -48
> +; CHECK-NOT: sub sp
> +; [...]
> +; CHECK: bl bar
> +; CHECK-NOT: add sp
> +; CHECK: vpop {d8}
> +; CHECK: add sp, #4
> +; CHECK: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
> +  call void asm sideeffect "",
> "~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11},~{d8}"()
> +  call void @bar()
> +  ret void
> +}
> +
> +
> +; However, we shouldn't do a 2-step align/adjust if there are no DPRs
> to be
> +; saved.
> +define void @test_nodpr_noalign(i8 %l, i8 %r) {
> +; CHECK-LABEL: test_nodpr_noalign:
> +; CHECK: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
> +; CHECK-NOT: sub sp
> +; CHECK: sub sp, #12
> +; CHECK-NOT: sub sp
> +; [...]
> +; CHECK: bl bar
> +; CHECK-NOT: add sp
> +; CHECK: add sp, #12
> +; CHECK-NOT: add sp
> +; CHECK: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
> +  alloca i64
> +  call void asm sideeffect "",
> "~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"()
> +  call void @bar()
> +  ret void
> +}
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits








More information about the llvm-commits mailing list