[PATCH][ARM] Prevent stack guard address from being spilled
Akira Hatanaka
ahatanak at gmail.com
Fri Jul 25 12:03:30 PDT 2014
Alternatively, we can prioritize the live interval of the stack guard value
to keep it in a register as much as possible and avoid having to spill to
the stack or do remat.
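
Something like this (a rough, untested sketch using LiveIntervals; LIS and
GuardReg here stand for the analysis and whatever vreg ends up holding the
guard value):

  // Mark the guard's live interval as not spillable so the register
  // allocator keeps it in a register whenever possible.
  LiveInterval &Interval = LIS->getInterval(GuardReg);
  Interval.markNotSpillable(); // sets the spill weight to huge_valf
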
On Fri, Jul 25, 2014 at 11:55 AM, Akira Hatanaka <ahatanak at gmail.com> wrote:
> Yes, I agree that is expensive, but register allocation will spill the
> address to the stack if we don't fix anything.
>
> calll L0$pb
> L0$pb:
> popl %eax
> movl L___stack_chk_guard$non_lazy_ptr-L0$pb(%eax), %eax
> movl %eax, -1052(%ebp) ## 4-byte Spill
> ...
> ## InlineAsm End
> movl -1052(%ebp), %eax ## 4-byte Reload
>
> On Fri, Jul 25, 2014 at 11:50 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>
>>
>> On Jul 25, 2014, at 11:45 AM, Akira Hatanaka <ahatanak at gmail.com> wrote:
>>
>> I realized that it might be possible to use this scheme for 32-bit x86
>> too. It looks like 32-bit x86 uses a call+pop sequence to get the PC and
>> emulate PC-relative loads, in which case we can remat the stack guard
>> address or the value.
>>
>> calll L0$pb
>> L0$pb:
>> popl %eax
>> movl L___stack_chk_guard$non_lazy_ptr-L0$pb(%eax), %eax
>> movl %eax, -1052(%ebp) ## 4-byte Spill
>> movl (%eax), %eax
>>
>> I'll add support for 32-bit x86 and update comments in a follow-up patch.
>>
>>
>> That’s going to be relatively expensive in terms of runtime performance.
>> I’m not sure it makes sense to do that.
>>
>> On Fri, Jul 25, 2014 at 9:20 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>>
>>>
>>> On Jul 18, 2014, at 4:33 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
>>>
>>> Updated patch which includes the following changes is attached.
>>>
>>> - Disabled using LOAD_STACK_GUARD if target is 32-bit x86. 32-bit x86
>>> needs an extra register to load the stack guard value, which prevents it
>>> from being rematted.
>>> - Added test cases for x86 and thumb1.
>>> - Refactored and simplified functions in ARMBaseInstrInfo which expand
>>> the LOAD_STACK_GUARD pseudo.
>>> - Renamed function parameters and fixed indentation.
>>>
>>>
>>> Thanks. This looks good to commit now.
>>>
>>> Can you add comments to ARMTargetLowering::useLoadStackGuardNode
>>> and X86TargetLowering::useLoadStackGuardNode to explain why this is only
>>> enabled for certain targets? I.e., it provides better security but has so
>>> far only been implemented for MachO targets on ARM and x86_64, and it
>>> doesn’t work for 32-bit x86 because we can’t remat without an extra
>>> register. You can add those comments in a follow-up patch if you prefer.
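>>>
>>> E.g., something along these lines for the ARM side (a sketch assuming the
>>> ARM override mirrors the AArch64 one; the exact wording is up to you):
>>>
>>> bool ARMTargetLowering::useLoadStackGuardNode() const {
>>>   // Loading the guard through LOAD_STACK_GUARD keeps its address out of
>>>   // spill slots, which is better for security, but the expansion has so
>>>   // far only been implemented for MachO targets.
>>>   return Subtarget->isTargetMachO();
>>> }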
>>>
>>>
>>>
>>> On Mon, Jul 14, 2014 at 10:49 PM, Akira Hatanaka <ahatanak at gmail.com>
>>> wrote:
>>>
>>>> Updated patch attached.
>>>>
>>>> I made the following changes in addition to fixing the problems Bob
>>>> pointed out:
>>>>
>>>> - In SelectionDAGBuilder::visitIntrinsicCall, replaced the call to
>>>> LoadInst::getAlignment (which was returning 0) with
>>>> SelectionDAG::getEVTAlignment to get the load instruction's alignment
>>>> (see the sketch after this list).
>>>> - Changed STATIC to STATIC-NEXT in test case
>>>> CodeGen/ARM/stack_guard_remat.ll.
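>>>>
>>>> For the getEVTAlignment change, the updated call would look roughly like
>>>> this (a sketch, not copied verbatim from the new patch):
>>>>
>>>>   // Ask the DAG for PtrTy's alignment rather than the LoadInst, whose
>>>>   // getAlignment() was returning 0 here.
>>>>   *MemRefs = MF.getMachineMemOperand(MPInfo, Flags,
>>>>                                      PtrTy.getSizeInBits() / 8,
>>>>                                      DAG.getEVTAlignment(PtrTy));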
>>>>
>>>> On Mon, Jul 14, 2014 at 12:43 PM, Bob Wilson <bob.wilson at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> > On Jul 11, 2014, at 3:34 PM, Akira Hatanaka <ahatanak at gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > The attached patch fixes a bug where the address of the stack guard
>>>>> > was being spilled to the stack, which is a potential security
>>>>> > vulnerability that attackers can exploit.
>>>>>
>>>>> I have a few comments inline below….
>>>>>
>>>>> > diff --git a/include/llvm/Target/Target.td b/include/llvm/Target/Target.td
>>>>> > index f77cc7a..26e1242 100644
>>>>> > --- a/include/llvm/Target/Target.td
>>>>> > +++ b/include/llvm/Target/Target.td
>>>>> > @@ -841,6 +841,14 @@ def PATCHPOINT : Instruction {
>>>>> >    let mayLoad = 1;
>>>>> >    let usesCustomInserter = 1;
>>>>> >  }
>>>>> > +def LOAD_STACK_GUARD : Instruction {
>>>>> > +  let OutOperandList = (outs ptr_rc:$dst);
>>>>> > +  let InOperandList = (ins);
>>>>> > +  let mayLoad = 1;
>>>>> > +  bit isReMaterializable = 1;
>>>>> > +  let neverHasSideEffects = 1;
>>>>>
>>>>> neverHasSideEffects is deprecated. This should be "hasSideEffects = 0".
>>>>>
>>>>> > +  bit isPseudo = 1;
>>>>> > +}
>>>>> >  }
>>>>> >
>>>>> >  //===----------------------------------------------------------------------===//
>>>>> > diff --git a/include/llvm/Target/TargetLowering.h b/include/llvm/Target/TargetLowering.h
>>>>> > index 180f0f2..ef0fc27 100644
>>>>> > --- a/include/llvm/Target/TargetLowering.h
>>>>> > +++ b/include/llvm/Target/TargetLowering.h
>>>>> > @@ -2589,6 +2589,13 @@ public:
>>>>> >    /// ARM 's' setting instructions.
>>>>> >    virtual void
>>>>> >    AdjustInstrPostInstrSelection(MachineInstr *MI, SDNode *Node) const;
>>>>> > +
>>>>> > +  /// If this function returns true, SelectionDAGBuilder emits a
>>>>> > +  /// LOAD_STACK_GUARD node when it is lowering Intrinsic::stackprotector.
>>>>> > +  virtual bool emitLoadStackGuardNode() const {
>>>>>
>>>>> The name of this function implies that it emits something. It doesn’t.
>>>>> It just returns a bool to say whether the target should use the new
>>>>> LOAD_STACK_GUARD node. The name should reflect that. How about
>>>>> useLoadStackGuardNode?
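>>>>>
>>>>> E.g., the declaration would then read (sketch with the new name):
>>>>>
>>>>>   virtual bool useLoadStackGuardNode() const { return false; }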
>>>>>
>>>>> > +    return false;
>>>>> > +  }
>>>>> > +
>>>>>
>>>>> There’s no need to add a blank line here.
>>>>>
>>>>> >  };
>>>>> >
>>>>> >  /// Given an LLVM IR type and return type attributes, compute the return value
>>>>> >
>>>>> > diff --git a/include/llvm/Target/TargetOpcodes.h b/include/llvm/Target/TargetOpcodes.h
>>>>> > index abb3eca..1fbd2ae 100644
>>>>> > --- a/include/llvm/Target/TargetOpcodes.h
>>>>> > +++ b/include/llvm/Target/TargetOpcodes.h
>>>>> > @@ -104,7 +104,13 @@ enum {
>>>>> >    /// support optimizations for dynamic languages (such as javascript) that
>>>>> >    /// rewrite calls to runtimes with more efficient code sequences.
>>>>> >    /// This also implies a stack map.
>>>>> > -  PATCHPOINT = 18
>>>>> > +  PATCHPOINT = 18,
>>>>> > +
>>>>> > +  /// This pseudo-instruction loads the stack guard value. Targets which need
>>>>> > +  /// to prevent the stack guard value or address from being spilled to the
>>>>> > +  /// stack should override TargetLowering::emitLoadStackGuardNode and
>>>>> > +  /// additionally expand this pseudo after register allocation.
>>>>> > +  LOAD_STACK_GUARD = 19
>>>>> >  };
>>>>> >  } // end namespace TargetOpcode
>>>>> >  } // end namespace llvm
>>>>> >
>>>>> > diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>>>> > index 28d8e98..0d89776 100644
>>>>> > --- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>>>> > +++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>>>> > @@ -1811,9 +1811,19 @@ void SelectionDAGBuilder::visitSPDescriptorParent(StackProtectorDescriptor &SPD,
>>>>> >
>>>>> >    unsigned Align =
>>>>> >      TLI->getDataLayout()->getPrefTypeAlignment(IRGuard->getType());
>>>>> > -  SDValue Guard = DAG.getLoad(PtrTy, getCurSDLoc(), DAG.getEntryNode(),
>>>>> > -                              GuardPtr, MachinePointerInfo(IRGuard, 0),
>>>>> > -                              true, false, false, Align);
>>>>> > +
>>>>> > +  SDValue Guard;
>>>>> > +
>>>>> > +  // If emitLoadStackGuardNode returns true, retrieve the guard value from
>>>>> > +  // the virtual register holding the value. Otherwise, emit a volatile load
>>>>> > +  // to retrieve the stack guard value.
>>>>> > +  if (TLI->emitLoadStackGuardNode())
>>>>> > +    Guard = DAG.getCopyFromReg(DAG.getEntryNode(), getCurSDLoc(),
>>>>> > +                               SPD.getGuardReg(), PtrTy);
>>>>> > +  else
>>>>> > +    Guard = DAG.getLoad(PtrTy, getCurSDLoc(), DAG.getEntryNode(),
>>>>> > +                        GuardPtr, MachinePointerInfo(IRGuard, 0),
>>>>> > +                        true, false, false, Align);
>>>>> >
>>>>> >    SDValue StackSlot = DAG.getLoad(PtrTy, getCurSDLoc(), DAG.getEntryNode(),
>>>>> >                                    StackSlotPtr,
>>>>> > @@ -5220,8 +5230,35 @@ SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
>>>>> >    MachineFunction &MF = DAG.getMachineFunction();
>>>>> >    MachineFrameInfo *MFI = MF.getFrameInfo();
>>>>> >    EVT PtrTy = TLI->getPointerTy();
>>>>> > +  SDValue Src, Chain = getRoot();
>>>>> > +
>>>>> > +  if (TLI->emitLoadStackGuardNode()) {
>>>>> > +    // Emit a LOAD_STACK_GUARD node.
>>>>> > +    MachineSDNode *Node = DAG.getMachineNode(TargetOpcode::LOAD_STACK_GUARD,
>>>>> > +                                             sdl, PtrTy, Chain);
>>>>> > +    LoadInst *LI = cast<LoadInst>(I.getArgOperand(0));
>>>>> > +    MachinePointerInfo MPInfo(LI->getPointerOperand());
>>>>> > +    MachineInstr::mmo_iterator MemRefs = MF.allocateMemRefsArray(1);
>>>>> > +    unsigned Flags = MachineMemOperand::MOLoad |
>>>>> > +                     MachineMemOperand::MOInvariant;
>>>>> > +    *MemRefs = MF.getMachineMemOperand(MPInfo, Flags,
>>>>> > +                                       PtrTy.getSizeInBits() / 8,
>>>>> > +                                       LI->getAlignment());
>>>>> > +    Node->setMemRefs(MemRefs, MemRefs + 1);
>>>>> > +
>>>>> > +    // Copy the guard value to a virtual register so that it can be
>>>>> > +    // retrieved in the epilogue.
>>>>> > +    Src = SDValue(Node, 0);
>>>>> > +    const TargetRegisterClass *RC =
>>>>> > +        TLI->getRegClassFor(Src.getSimpleValueType());
>>>>> > +    unsigned Reg = MF.getRegInfo().createVirtualRegister(RC);
>>>>> > +
>>>>> > +    SPDescriptor.setGuardReg(Reg);
>>>>> > +    Chain = DAG.getCopyToReg(Chain, sdl, Reg, Src);
>>>>> > +  } else {
>>>>> > +    Src = getValue(I.getArgOperand(0));   // The guard's value.
>>>>> > +  }
>>>>> >
>>>>> > -  SDValue Src = getValue(I.getArgOperand(0));   // The guard's value.
>>>>> >    AllocaInst *Slot = cast<AllocaInst>(I.getArgOperand(1));
>>>>> >
>>>>> >    int FI = FuncInfo.StaticAllocaMap[Slot];
>>>>> > @@ -5230,7 +5267,7 @@ SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
>>>>> >    SDValue FIN = DAG.getFrameIndex(FI, PtrTy);
>>>>> >
>>>>> >    // Store the stack protector onto the stack.
>>>>> > -  Res = DAG.getStore(getRoot(), sdl, Src, FIN,
>>>>> > +  Res = DAG.getStore(Chain, sdl, Src, FIN,
>>>>> >                       MachinePointerInfo::getFixedStack(FI),
>>>>> >                       true, false, 0);
>>>>> >    setValue(&I, Res);
>>>>> > diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
>>>>> > index 84679f9..cf0846f 100644
>>>>> > --- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
>>>>> > +++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
>>>>> > @@ -397,7 +397,8 @@ private:
>>>>> >  class StackProtectorDescriptor {
>>>>> >  public:
>>>>> >    StackProtectorDescriptor() : ParentMBB(nullptr), SuccessMBB(nullptr),
>>>>> > -                               FailureMBB(nullptr), Guard(nullptr) { }
>>>>> > +                               FailureMBB(nullptr), Guard(nullptr),
>>>>> > +                               GuardReg(0) { }
>>>>> >    ~StackProtectorDescriptor() { }
>>>>> >
>>>>> >    /// Returns true if all fields of the stack protector descriptor are
>>>>> > @@ -455,6 +456,9 @@ private:
>>>>> >    MachineBasicBlock *getFailureMBB() { return FailureMBB; }
>>>>> >    const Value *getGuard() { return Guard; }
>>>>> >
>>>>> > +  unsigned getGuardReg() const { return GuardReg; }
>>>>> > +  void setGuardReg(unsigned R) { GuardReg = R; }
>>>>> > +
>>>>> >  private:
>>>>> >    /// The basic block for which we are generating the stack protector.
>>>>> >    ///
>>>>> > @@ -477,6 +481,9 @@ private:
>>>>> >    /// stack protector stack slot.
>>>>> >    const Value *Guard;
>>>>> >
>>>>> > +  /// The virtual register holding the stack guard value.
>>>>> > +  unsigned GuardReg;
>>>>> > +
>>>>> >    /// Add a successor machine basic block to ParentMBB. If the successor mbb
>>>>> >    /// has not been created yet (i.e. if SuccMBB = 0), then the machine basic
>>>>> >    /// block will be created.
>>>>> > diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
>>>>> > index 07ff093..97bec0d 100644
>>>>> > --- a/lib/Target/AArch64/AArch64ISelLowering.cpp
>>>>> > +++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
>>>>> > @@ -7914,6 +7914,10 @@ bool AArch64TargetLowering::shouldExpandAtomicInIR(Instruction *Inst) const {
>>>>> >    return Inst->getType()->getPrimitiveSizeInBits() <= 128;
>>>>> >  }
>>>>> >
>>>>> > +bool AArch64TargetLowering::emitLoadStackGuardNode() const {
>>>>> > +  return Subtarget->isTargetMachO();
>>>>> > +}
>>>>> > +
>>>>> >  TargetLoweringBase::LegalizeTypeAction
>>>>> >  AArch64TargetLowering::getPreferredVectorAction(EVT VT) const {
>>>>> >    MVT SVT = VT.getSimpleVT();
>>>>> > diff --git a/lib/Target/AArch64/AArch64ISelLowering.h b/lib/Target/AArch64/AArch64ISelLowering.h
>>>>> > index cb0b9ef..7500074 100644
>>>>> > --- a/lib/Target/AArch64/AArch64ISelLowering.h
>>>>> > +++ b/lib/Target/AArch64/AArch64ISelLowering.h
>>>>> > @@ -324,6 +324,7 @@ public:
>>>>> >
>>>>> >    bool shouldExpandAtomicInIR(Instruction *Inst) const override;
>>>>> >
>>>>> > +  bool emitLoadStackGuardNode() const override;
>>>>> >    TargetLoweringBase::LegalizeTypeAction
>>>>> >    getPreferredVectorAction(EVT VT) const override;
>>>>> > diff --git a/lib/Target/AArch64/AArch64InstrInfo.cpp b/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>>> > index ce85b2c..bfb32d6 100644
>>>>> > --- a/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>>> > +++ b/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>>> > @@ -848,6 +848,29 @@ bool AArch64InstrInfo::optimizeCompareInstr(
>>>>> >    return true;
>>>>> >  }
>>>>> >
>>>>> > +bool
>>>>> > +AArch64InstrInfo::expandPostRAPseudo(MachineBasicBlock::iterator MI) const {
>>>>> > +  if (MI->getOpcode() != TargetOpcode::LOAD_STACK_GUARD)
>>>>> > +    return false;
>>>>> > +
>>>>> > +  assert(Subtarget.isTargetMachO() &&
>>>>> > +         "LOAD_STACK_GUARD currently supported only for MachO.");
>>>>>
>>>>> At least for AArch64, this expansion doesn’t look very specific to
>>>>> MachO. How much harder would it be to make it general? Or are you just
>>>>> being cautious here because you can’t test the other platforms?
>>>>>
>>>>> > +
>>>>> > +  MachineBasicBlock &MBB = *MI->getParent();
>>>>> > +  DebugLoc DL = MI->getDebugLoc();
>>>>> > +  unsigned Reg = MI->getOperand(0).getReg();
>>>>> > +  const GlobalValue *GV =
>>>>> > +      cast<GlobalValue>((*MI->memoperands_begin())->getValue());
>>>>> > +  MachineOperand GA = MachineOperand::CreateGA(GV, 0, AArch64II::MO_GOT);
>>>>>
>>>>> I noticed that you’re using MachineOperand::CreateGA throughout this
>>>>> patch, but almost everywhere else we use MIB’s addGlobalAddress method
>>>>> instead. I don’t see much difference but unless there’s some reason to
>>>>> prefer direct access to CreateGA, it would be better to follow the usual
>>>>> convention.
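>>>>>
>>>>> E.g., the equivalent MIB form would be something like (untested sketch):
>>>>>
>>>>>   BuildMI(MBB, MI, DL, get(AArch64::LOADgot), Reg)
>>>>>       .addGlobalAddress(GV, 0, AArch64II::MO_GOT);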
>>>>>
>>>>> > +  BuildMI(MBB, MI, DL, get(AArch64::LOADgot), Reg).addOperand(GA);
>>>>> > +  BuildMI(MBB, MI, DL, get(AArch64::LDRXui), Reg)
>>>>> > +      .addReg(Reg, RegState::Kill).addImm(0)
>>>>> > +      .addMemOperand(*MI->memoperands_begin());
>>>>> > +  MBB.erase(MI);
>>>>> > +
>>>>> > +  return true;
>>>>> > +}
>>>>> > +
>>>>> >  /// Return true if this is this instruction has a non-zero immediate
>>>>> >  bool AArch64InstrInfo::hasShiftedReg(const MachineInstr *MI) const {
>>>>> >    switch (MI->getOpcode()) {
>>>>> >
>>>>>
>>>>>
>>>>> The rest of this looks fine.
>>>>
>>>>
>>>>
>>> <stack_guard3.patch>
>>>
>>>
>>>
>>
>>
>