[llvm] r221320 - ARM/Dwarf: correctly align stack before callee-saved VPRs
Oliver Stannard
oliver.stannard at arm.com
Fri Nov 14 08:24:42 PST 2014
Hi Tim,
I think this patch, or the related one, is causing miscompilations. These
manifest as either an argument being loaded from the wrong location on the
stack, or the final pop being from the wrong address, resulting in the PC
being restored incorrectly.
The following code demonstrates this:
void failed_i(int actual) {
printf("failed_i %d\n", actual);
}
double blah(int a, int b, int c, int d, double e) {
if (1 != a)
failed_i(a);
asm("nop"
:
:
: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10",
"r11", "r12", "lr");
return e;
}
$ clang --target=armv7a--none-eabi -mcpu=cortex-a9 -c test.c -o test.o -Oz
-fno-inline
blah
0x00008190: e92d4ff8 .O-. PUSH {r3-r11,lr}
0x00008194: e28db01c .... ADD r11,sp,#0x1c
0x00008198: ed2d7b04 .{-. VPUSH {d7-d8}
0x0000819c: ed9b8b02 .... VLDR d8,[r11,#8]
0x000081a0: e3500001 ..P. CMP r0,#1
0x000081a4: 0a000000 .... BEQ 0x81ac ; blah + 28
0x000081a8: ebfffff1 .... BL failed_i ; 0x8174
0x000081ac: ec501b18 ..P. VMOV r1,r0,d8
0x000081b0: e58d0004 .... STR r0,[sp,#4]
0x000081b4: e58d1000 .... STR r1,[sp,#0]
0x000081b8: e320f000 .. . NOP
0x000081bc: e89d0003 .... LDM sp,{r0,r1}
0x000081c0: e24bd028 (.K. SUB sp,r11,#0x28
0x000081c4: ecbd8b02 .... VPOP {d8}
0x000081c8: e28dd004 .... ADD sp,sp,#4
0x000081cc: e8bd8ff0 .... POP {r4-r11,pc}
In this case, the stack pointer is reset 4 bytes too low by the SUB at
0x000081c0, resulting in the wrong value being loaded into the PC. If I
reduce the number of registers clobbered by the inline asm, the function
returns correctly, but the return value is incorrect.
Oliver
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> bounces at cs.uiuc.edu] On Behalf Of Tim Northover
> Sent: 05 November 2014 00:27
> To: llvm-commits at cs.uiuc.edu
> Subject: [llvm] r221320 - ARM/Dwarf: correctly align stack before
> callee-saved VPRs
>
> Author: tnorthover
> Date: Tue Nov 4 18:27:13 2014
> New Revision: 221320
>
> URL: http://llvm.org/viewvc/llvm-project?rev=221320&view=rev
> Log:
> ARM/Dwarf: correctly align stack before callee-saved VPRs
>
> We were making an attempt to do this by adding an extra callee-saved
> GPR (so
> that there was an even number in the list), but when that failed we
> went ahead
> and pushed anyway.
>
> This had a couple of potential issues:
> + The .cfi directives we emit misplaced dN because they were based on
> PrologEpilogInserter's calculation.
> + Unaligned stores can be less efficient.
> + Unaligned stores can actually fault (likely only an issue in niche
> cases,
> but possible).
>
> This adds a final explicit stack adjustment if all other options fail,
> so that
> the actual locations of the registers match up with where they should
> be.
>
> Added:
> llvm/trunk/test/CodeGen/ARM/dwarf-unwind.ll
> Modified:
> llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp
> llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h
>
> Modified: llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp
> URL: http://llvm.org/viewvc/llvm-
> project/llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp?rev=221320&r1=22
> 1319&r2=221320&view=diff
> =======================================================================
> =======
> --- llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp (original)
> +++ llvm/trunk/lib/Target/ARM/ARMFrameLowering.cpp Tue Nov 4 18:27:13
> 2014
> @@ -260,10 +260,11 @@ void ARMFrameLowering::emitPrologue(Mach
>
> // Determine starting offsets of spill areas.
> bool HasFP = hasFP(MF);
> - unsigned DPRCSOffset = NumBytes - (ArgRegsSaveSize + GPRCS1Size
> - + GPRCS2Size + DPRCSSize);
> - unsigned GPRCS2Offset = DPRCSOffset + DPRCSSize;
> - unsigned GPRCS1Offset = GPRCS2Offset + GPRCS2Size;
> + unsigned GPRCS1Offset = NumBytes - ArgRegsSaveSize - GPRCS1Size;
> + unsigned GPRCS2Offset = GPRCS1Offset - GPRCS2Size;
> + unsigned DPRAlign = DPRCSSize ? std::min(8U, Align) : 4U;
> + unsigned DPRGapSize = (GPRCS1Size + GPRCS2Size + ArgRegsSaveSize) %
> DPRAlign;
> + unsigned DPRCSOffset = GPRCS2Offset - DPRGapSize - DPRCSSize;
> int FramePtrOffsetInPush = 0;
> if (HasFP) {
> FramePtrOffsetInPush = MFI->getObjectOffset(FramePtrSpillFI)
> @@ -279,6 +280,15 @@ void ARMFrameLowering::emitPrologue(Mach
> if (GPRCS2Size > 0)
> GPRCS2Push = LastPush = MBBI++;
>
> + // Prolog/epilog inserter assumes we correctly align DPRs on the
> stack, so our
> + // .cfi_offset operations will reflect that.
> + if (DPRGapSize) {
> + assert(DPRGapSize == 4 && "unexpected alignment requirements for
> DPRs");
> + if (!tryFoldSPUpdateIntoPushPop(STI, MF, LastPush, DPRGapSize))
> + emitSPUpdate(isARM, MBB, MBBI, dl, TII, -DPRGapSize,
> + MachineInstr::FrameSetup);
> + }
> +
> // Move past area 3.
> if (DPRCSSize > 0) {
> DPRCSPush = MBBI;
> @@ -508,6 +518,7 @@ void ARMFrameLowering::emitPrologue(Mach
>
> AFI->setGPRCalleeSavedArea1Size(GPRCS1Size);
> AFI->setGPRCalleeSavedArea2Size(GPRCS2Size);
> + AFI->setDPRCalleeSavedGapSize(DPRGapSize);
> AFI->setDPRCalleeSavedAreaSize(DPRCSSize);
>
> // If we need dynamic stack realignment, do it here. Be paranoid and
> make
> @@ -613,6 +624,7 @@ void ARMFrameLowering::emitEpilogue(Mach
> NumBytes -= (ArgRegsSaveSize +
> AFI->getGPRCalleeSavedArea1Size() +
> AFI->getGPRCalleeSavedArea2Size() +
> + AFI->getDPRCalleeSavedGapSize() +
> AFI->getDPRCalleeSavedAreaSize());
>
> // Reset SP based on frame pointer only if the stack frame extends
> beyond
> @@ -661,6 +673,12 @@ void ARMFrameLowering::emitEpilogue(Mach
> while (MBBI->getOpcode() == ARM::VLDMDIA_UPD)
> MBBI++;
> }
> + if (AFI->getDPRCalleeSavedGapSize()) {
> + assert(AFI->getDPRCalleeSavedGapSize() == 4 &&
> + "unexpected DPR alignment gap");
> + emitSPUpdate(isARM, MBB, MBBI, dl, TII, AFI-
> >getDPRCalleeSavedGapSize());
> + }
> +
> if (AFI->getGPRCalleeSavedArea2Size()) MBBI++;
> if (AFI->getGPRCalleeSavedArea1Size()) MBBI++;
> }
>
> Modified: llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h
> URL: http://llvm.org/viewvc/llvm-
> project/llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h?rev=221320&r
> 1=221319&r2=221320&view=diff
> =======================================================================
> =======
> --- llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h (original)
> +++ llvm/trunk/lib/Target/ARM/ARMMachineFunctionInfo.h Tue Nov 4
> 18:27:13 2014
> @@ -86,6 +86,7 @@ class ARMFunctionInfo : public MachineFu
> /// areas.
> unsigned GPRCS1Size;
> unsigned GPRCS2Size;
> + unsigned DPRCSAlignGapSize;
> unsigned DPRCSSize;
>
> /// NumAlignedDPRCS2Regs - The number of callee-saved DPRs that are
> saved in
> @@ -134,7 +135,7 @@ public:
> RestoreSPFromFP(false),
> LRSpilledForFarJump(false),
> FramePtrSpillOffset(0), GPRCS1Offset(0), GPRCS2Offset(0),
> DPRCSOffset(0),
> - GPRCS1Size(0), GPRCS2Size(0), DPRCSSize(0),
> + GPRCS1Size(0), GPRCS2Size(0), DPRCSAlignGapSize(0), DPRCSSize(0),
> NumAlignedDPRCS2Regs(0),
> JumpTableUId(0), PICLabelUId(0),
> VarArgsFrameIndex(0), HasITBlocks(false), GlobalBaseReg(0) {}
> @@ -183,10 +184,12 @@ public:
>
> unsigned getGPRCalleeSavedArea1Size() const { return GPRCS1Size; }
> unsigned getGPRCalleeSavedArea2Size() const { return GPRCS2Size; }
> + unsigned getDPRCalleeSavedGapSize() const { return
> DPRCSAlignGapSize; }
> unsigned getDPRCalleeSavedAreaSize() const { return DPRCSSize; }
>
> void setGPRCalleeSavedArea1Size(unsigned s) { GPRCS1Size = s; }
> void setGPRCalleeSavedArea2Size(unsigned s) { GPRCS2Size = s; }
> + void setDPRCalleeSavedGapSize(unsigned s) { DPRCSAlignGapSize = s;
> }
> void setDPRCalleeSavedAreaSize(unsigned s) { DPRCSSize = s; }
>
> unsigned getArgumentStackSize() const { return ArgumentStackSize; }
>
> Added: llvm/trunk/test/CodeGen/ARM/dwarf-unwind.ll
> URL: http://llvm.org/viewvc/llvm-
> project/llvm/trunk/test/CodeGen/ARM/dwarf-
> unwind.ll?rev=221320&view=auto
> =======================================================================
> =======
> --- llvm/trunk/test/CodeGen/ARM/dwarf-unwind.ll (added)
> +++ llvm/trunk/test/CodeGen/ARM/dwarf-unwind.ll Tue Nov 4 18:27:13
> 2014
> @@ -0,0 +1,68 @@
> +; RUN: llc -mtriple=thumbv7-netbsd-eabi -o - %s | FileCheck %s
> +declare void @bar()
> +
> +; ARM's frame lowering attempts to tack another callee-saved register
> onto the
> +; list when it detects a potential misaligned VFP store. However, if
> there are
> +; none available it used to just vpush anyway and misreport the
> location of the
> +; registers in unwind info. Since there are benefits to aligned
> stores, it's
> +; better to correct the code than the .cfi_offset directive.
> +
> +define void @test_dpr_align(i8 %l, i8 %r) {
> +; CHECK-LABEL: test_dpr_align:
> +; CHECK: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
> +; CHECK: sub sp, #4
> +; CHECK: vpush {d8}
> +; CHECK: .cfi_offset d8, -48
> +; CHECK-NOT: sub sp
> +; [...]
> +; CHECK: bl bar
> +; CHECK-NOT: add sp
> +; CHECK: vpop {d8}
> +; CHECK: add sp, #4
> +; CHECK: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
> + call void asm sideeffect "",
> "~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11},~{d8}"()
> + call void @bar()
> + ret void
> +}
> +
> +; The prologue (but not the epilogue) can be made more space efficient
> by
> +; chucking an argument register into the list. Not worth it in general
> though,
> +; "sub sp, #4" is likely faster.
> +define void @test_dpr_align_tiny(i8 %l, i8 %r) minsize {
> +; CHECK-LABEL: test_dpr_align_tiny:
> +; CHECK: push.w {r3, r4, r5, r6, r7, r8, r9, r10, r11, lr}
> +; CHECK-NOT: sub sp
> +; CHECK: vpush {d8}
> +; CHECK: .cfi_offset d8, -48
> +; CHECK-NOT: sub sp
> +; [...]
> +; CHECK: bl bar
> +; CHECK-NOT: add sp
> +; CHECK: vpop {d8}
> +; CHECK: add sp, #4
> +; CHECK: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
> + call void asm sideeffect "",
> "~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11},~{d8}"()
> + call void @bar()
> + ret void
> +}
> +
> +
> +; However, we shouldn't do a 2-step align/adjust if there are no DPRs
> to be
> +; saved.
> +define void @test_nodpr_noalign(i8 %l, i8 %r) {
> +; CHECK-LABEL: test_nodpr_noalign:
> +; CHECK: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
> +; CHECK-NOT: sub sp
> +; CHECK: sub sp, #12
> +; CHECK-NOT: sub sp
> +; [...]
> +; CHECK: bl bar
> +; CHECK-NOT: add sp
> +; CHECK: add sp, #12
> +; CHECK-NOT: add sp
> +; CHECK: pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
> + alloca i64
> + call void asm sideeffect "",
> "~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"()
> + call void @bar()
> + ret void
> +}
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list