[PATCH][ARM] Prevent stack guard address from being spilled

Akira Hatanaka ahatanak at gmail.com
Fri Jul 25 11:55:52 PDT 2014


Yes, I agree that it is expensive, but if we do nothing, register allocation
will spill the guard's address to the stack:

calll L0$pb
L0$pb:
popl %eax
movl L___stack_chk_guard$non_lazy_ptr-L0$pb(%eax), %eax
movl %eax, -1052(%ebp)       ## 4-byte Spill
...
## InlineAsm End
movl -1052(%ebp), %eax       ## 4-byte Reload
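
For reference, here is how the patch avoids that spill: the prologue copies
the LOAD_STACK_GUARD result into a virtual register, and the epilogue compare
reads the guard back through that register, so the allocator can
rematerialize the invariant load instead of spilling it. A condensed sketch
of the two sides (simplified from the patch quoted further down):

// Prologue side (visitIntrinsicCall): stash the loaded guard in a vreg.
SDValue Src(Node, 0);  // Node is the LOAD_STACK_GUARD MachineSDNode.
const TargetRegisterClass *RC =
    TLI->getRegClassFor(Src.getSimpleValueType());
unsigned Reg = MF.getRegInfo().createVirtualRegister(RC);
SPDescriptor.setGuardReg(Reg);
Chain = DAG.getCopyToReg(Chain, sdl, Reg, Src);

// Epilogue side (visitSPDescriptorParent): read the guard back for the
// compare against the value in the stack slot.
SDValue Guard = DAG.getCopyFromReg(DAG.getEntryNode(), getCurSDLoc(),
                                   SPD.getGuardReg(), PtrTy);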


On Fri, Jul 25, 2014 at 11:50 AM, Bob Wilson <bob.wilson at apple.com> wrote:

>
> On Jul 25, 2014, at 11:45 AM, Akira Hatanaka <ahatanak at gmail.com> wrote:
>
> I realized that it might be possible to use this scheme for 32-bit x86
> too. It looks like 32-bit x86 uses call+pop to get the PC and emulate a
> PC-relative load, in which case we could rematerialize the stack guard
> address or value:
>
> calll L0$pb
> L0$pb:
> popl %eax
> movl L___stack_chk_guard$non_lazy_ptr-L0$pb(%eax), %eax
> movl %eax, -1052(%ebp)       ## 4-byte Spill
> movl (%eax), %eax
>
> I'll add support for 32-bit x86 and update comments in a follow-up patch.
>
>
> That’s going to be relatively expensive in terms of runtime performance.
> I’m not sure it makes sense to do that.
>
>
>
>
> On Fri, Jul 25, 2014 at 9:20 AM, Bob Wilson <bob.wilson at apple.com> wrote:
>
>>
>> On Jul 18, 2014, at 4:33 PM, Akira Hatanaka <ahatanak at gmail.com> wrote:
>>
>> An updated patch that includes the following changes is attached:
>>
>> - Disabled using LOAD_STACK_GUARD when the target is 32-bit x86. 32-bit
>> x86 needs an extra register to load the stack guard value, which prevents
>> the load from being rematerialized (see the sketch after this list).
>> - Added test cases for x86 and thumb1.
>> - Refactored and simplified functions in ARMBaseInstrInfo which expand
>> the LOAD_STACK_GUARD pseudo.
>> - Renamed function parameters and fixed indentations.
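>>
>> A minimal sketch of the first bullet's gating (assuming the hook keeps the
>> reviewed name useLoadStackGuardNode; the exact predicate is up to the
>> final patch):
>>
>> bool X86TargetLowering::useLoadStackGuardNode() const {
>>   // 32-bit x86 needs the PIC base register to form the guard's address,
>>   // so the load sequence cannot be rematerialized; those targets keep
>>   // the old volatile-load path.
>>   return Subtarget->isTargetMachO() && Subtarget->is64Bit();
>> }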
>>
>>
>> Thanks. This looks good to commit now.
>>
>> Can you add comments to ARMTargetLowering::useLoadStackGuardNode
>> and X86TargetLowering::useLoadStackGuardNode to explain why this is only
>> enabled for certain targets? I.e., it provides better security but has so
>> far only been implemented for MachO targets on ARM and x86_64, and for
>> 32-bit x86 it doesn’t work because we can’t rematerialize the load without
>> an extra register. You can add those comments in a follow-up patch if you
>> prefer.
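>>
>> For the ARM side, something along these lines would do (just a sketch;
>> exact wording up to you):
>>
>> bool ARMTargetLowering::useLoadStackGuardNode() const {
>>   // Loading the guard through LOAD_STACK_GUARD keeps the guard's address
>>   // out of spill slots, which is better for security, but the post-RA
>>   // expansion of the pseudo has so far only been implemented for MachO
>>   // targets.
>>   return Subtarget->isTargetMachO();
>> }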
>>
>>
>>
>> On Mon, Jul 14, 2014 at 10:49 PM, Akira Hatanaka <ahatanak at gmail.com>
>> wrote:
>>
>>> Updated patch attached.
>>>
>>> I made the following changes in addition to fixing the problems Bob
>>> pointed out:
>>>
>>> - In SelectionDAGBuilder::visitIntrinsicCall, replaced the call to
>>> LoadInst::getAlignment (which was returning 0) with
>>>  SelectionDAG::getEVTAlignment to get the load instruction's alignment.
>>> - Changed STATIC to STATIC-NEXT in test case
>>> CodeGen/ARM/stack_guard_remat.ll.
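>>>
>>> In code, the first change amounts to this (a sketch of the fix described
>>> above):
>>>
>>>   // LoadInst::getAlignment() returned 0 for this load, so derive the
>>>   // alignment from the pointer type instead.
>>>   *MemRefs = MF.getMachineMemOperand(MPInfo, Flags,
>>>                                      PtrTy.getSizeInBits() / 8,
>>>                                      DAG.getEVTAlignment(PtrTy));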
>>>
>>>
>>>
>>> On Mon, Jul 14, 2014 at 12:43 PM, Bob Wilson <bob.wilson at apple.com>
>>> wrote:
>>>
>>>>
>>>> > On Jul 11, 2014, at 3:34 PM, Akira Hatanaka <ahatanak at gmail.com>
>>>> wrote:
>>>> >
>>>> > The attached patch fixes a bug where the address of the stack guard
>>>> was being spilled to the stack, which is a potential security
>>>> vulnerability that attackers can exploit.
>>>>
>>>> I have a few comments inline below….
>>>>
>>>> > diff --git a/include/llvm/Target/Target.td
>>>> b/include/llvm/Target/Target.td
>>>> > index f77cc7a..26e1242 100644
>>>> > --- a/include/llvm/Target/Target.td
>>>> > +++ b/include/llvm/Target/Target.td
>>>> > @@ -841,6 +841,14 @@ def PATCHPOINT : Instruction {
>>>> >    let mayLoad = 1;
>>>> >    let usesCustomInserter = 1;
>>>> >  }
>>>> > +def LOAD_STACK_GUARD : Instruction {
>>>> > +  let OutOperandList = (outs ptr_rc:$dst);
>>>> > +  let InOperandList = (ins);
>>>> > +  let mayLoad = 1;
>>>> > +  bit isReMaterializable = 1;
>>>> > +  let neverHasSideEffects = 1;
>>>>
>>>> neverHasSideEffects is deprecated. This should be "hasSideEffects = 0".
>>>>
>>>> > +  bit isPseudo = 1;
>>>> > +}
>>>> >  }
>>>> >
>>>> >
>>>>  //===----------------------------------------------------------------------===//
>>>> > diff --git a/include/llvm/Target/TargetLowering.h
>>>> b/include/llvm/Target/TargetLowering.h
>>>> > index 180f0f2..ef0fc27 100644
>>>> > --- a/include/llvm/Target/TargetLowering.h
>>>> > +++ b/include/llvm/Target/TargetLowering.h
>>>> > @@ -2589,6 +2589,13 @@ public:
>>>> >    /// ARM 's' setting instructions.
>>>> >    virtual void
>>>> >    AdjustInstrPostInstrSelection(MachineInstr *MI, SDNode *Node)
>>>> const;
>>>> > +
>>>> > +  /// If this function returns true, SelectionDAGBuilder emits a
>>>> > +  /// LOAD_STACK_GUARD node when it is lowering
>>>> Intrinsic::stackprotector.
>>>> > +  virtual bool emitLoadStackGuardNode() const {
>>>>
>>>> The name of this function implies that it emits something. It doesn’t.
>>>> It just returns a bool to say whether the target should use the new
>>>> LOAD_STACK_GUARD node. The name should reflect that. How about
>>>> useLoadStackGuardNode?
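>>>>
>>>> I.e., something like (a sketch with the suggested name):
>>>>
>>>>   /// Returns true if the target wants SelectionDAGBuilder to emit a
>>>>   /// LOAD_STACK_GUARD node when lowering Intrinsic::stackprotector.
>>>>   virtual bool useLoadStackGuardNode() const { return false; }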
>>>>
>>>> > +    return false;
>>>> > +  }
>>>> > +
>>>>
>>>> There’s no need to add a blank line here.
>>>>
>>>> >  };
>>>> >
>>>> >  /// Given an LLVM IR type and return type attributes, compute the
>>>> return value
>>>> >
>>>> > diff --git a/include/llvm/Target/TargetOpcodes.h
>>>> b/include/llvm/Target/TargetOpcodes.h
>>>> > index abb3eca..1fbd2ae 100644
>>>> > --- a/include/llvm/Target/TargetOpcodes.h
>>>> > +++ b/include/llvm/Target/TargetOpcodes.h
>>>> > @@ -104,7 +104,13 @@ enum {
>>>> >    /// support optimizations for dynamic languages (such as
>>>> javascript) that
>>>> >    /// rewrite calls to runtimes with more efficient code sequences.
>>>> >    /// This also implies a stack map.
>>>> > -  PATCHPOINT = 18
>>>> > +  PATCHPOINT = 18,
>>>> > +
>>>> > +  /// This pseudo-instruction loads the stack guard value. Targets
>>>> which need
>>>> > +  /// to prevent the stack guard value or address from being spilled
>>>> to the
>>>> > +  /// stack should override TargetLowering::emitLoadStackGuardNode
>>>> and
>>>> > +  /// additionally expand this pseudo after register allocation.
>>>> > +  LOAD_STACK_GUARD = 19
>>>> >  };
>>>> >  } // end namespace TargetOpcode
>>>> >  } // end namespace llvm
>>>> >
>>>> > diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>>> b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>>> > index 28d8e98..0d89776 100644
>>>> > --- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>>> > +++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>>>> > @@ -1811,9 +1811,19 @@ void
>>>> SelectionDAGBuilder::visitSPDescriptorParent(StackProtectorDescriptor &SPD,
>>>> >
>>>> >    unsigned Align =
>>>> >      TLI->getDataLayout()->getPrefTypeAlignment(IRGuard->getType());
>>>> > -  SDValue Guard = DAG.getLoad(PtrTy, getCurSDLoc(),
>>>> DAG.getEntryNode(),
>>>> > -                              GuardPtr, MachinePointerInfo(IRGuard,
>>>> 0),
>>>> > -                              true, false, false, Align);
>>>> > +
>>>> > +  SDValue Guard;
>>>> > +
>>>> > +  // If emitLoadStackGuardNode returns true, retrieve the guard
>>>> value from
>>>> > +  // the virtual register holding the value. Otherwise, emit a
>>>> volatile load
>>>> > +  // to retrieve the stack guard value.
>>>> > +  if (TLI->emitLoadStackGuardNode())
>>>> > +    Guard = DAG.getCopyFromReg(DAG.getEntryNode(), getCurSDLoc(),
>>>> > +                               SPD.getGuardReg(), PtrTy);
>>>> > +  else
>>>> > +    Guard = DAG.getLoad(PtrTy, getCurSDLoc(), DAG.getEntryNode(),
>>>> > +                        GuardPtr, MachinePointerInfo(IRGuard, 0),
>>>> > +                        true, false, false, Align);
>>>> >
>>>> >    SDValue StackSlot = DAG.getLoad(PtrTy, getCurSDLoc(),
>>>> DAG.getEntryNode(),
>>>> >                                    StackSlotPtr,
>>>> > @@ -5220,8 +5230,35 @@ SelectionDAGBuilder::visitIntrinsicCall(const
>>>> CallInst &I, unsigned Intrinsic) {
>>>> >      MachineFunction &MF = DAG.getMachineFunction();
>>>> >      MachineFrameInfo *MFI = MF.getFrameInfo();
>>>> >      EVT PtrTy = TLI->getPointerTy();
>>>> > +    SDValue Src, Chain = getRoot();
>>>>
>>>> > +
>>>> > +    if (TLI->emitLoadStackGuardNode()) {
>>>> > +      // Emit a LOAD_STACK_GUARD node.
>>>> > +      MachineSDNode *Node =
>>>> DAG.getMachineNode(TargetOpcode::LOAD_STACK_GUARD,
>>>> > +                                               sdl, PtrTy, Chain);
>>>> > +      LoadInst *LI = cast<LoadInst>(I.getArgOperand(0));
>>>> > +      MachinePointerInfo MPInfo(LI->getPointerOperand());
>>>> > +      MachineInstr::mmo_iterator MemRefs =
>>>> MF.allocateMemRefsArray(1);
>>>> > +      unsigned Flags = MachineMemOperand::MOLoad |
>>>> > +                       MachineMemOperand::MOInvariant;
>>>> > +      *MemRefs = MF.getMachineMemOperand(MPInfo, Flags,
>>>> > +                                         PtrTy.getSizeInBits() / 8,
>>>> > +                                         LI->getAlignment());
>>>> > +      Node->setMemRefs(MemRefs, MemRefs + 1);
>>>> > +
>>>> > +      // Copy the guard value to a virtual register so that it can be
>>>> > +      // retrieved in the epilogue.
>>>> > +      Src = SDValue(Node, 0);
>>>> > +      const TargetRegisterClass *RC =
>>>> > +          TLI->getRegClassFor(Src.getSimpleValueType());
>>>> > +      unsigned Reg = MF.getRegInfo().createVirtualRegister(RC);
>>>> > +
>>>> > +      SPDescriptor.setGuardReg(Reg);
>>>> > +      Chain = DAG.getCopyToReg(Chain, sdl, Reg, Src);
>>>> > +    } else {
>>>> > +      Src = getValue(I.getArgOperand(0));   // The guard's value.
>>>> > +    }
>>>> >
>>>> > -    SDValue Src = getValue(I.getArgOperand(0));   // The guard's
>>>> value.
>>>> >      AllocaInst *Slot = cast<AllocaInst>(I.getArgOperand(1));
>>>> >
>>>> >      int FI = FuncInfo.StaticAllocaMap[Slot];
>>>> > @@ -5230,7 +5267,7 @@ SelectionDAGBuilder::visitIntrinsicCall(const
>>>> CallInst &I, unsigned Intrinsic) {
>>>> >      SDValue FIN = DAG.getFrameIndex(FI, PtrTy);
>>>> >
>>>> >      // Store the stack protector onto the stack.
>>>> > -    Res = DAG.getStore(getRoot(), sdl, Src, FIN,
>>>> > +    Res = DAG.getStore(Chain, sdl, Src, FIN,
>>>> >                         MachinePointerInfo::getFixedStack(FI),
>>>> >                         true, false, 0);
>>>> >      setValue(&I, Res);
>>>> > diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
>>>> b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
>>>> > index 84679f9..cf0846f 100644
>>>> > --- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
>>>> > +++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
>>>> > @@ -397,7 +397,8 @@ private:
>>>> >    class StackProtectorDescriptor {
>>>> >    public:
>>>> >      StackProtectorDescriptor() : ParentMBB(nullptr),
>>>> SuccessMBB(nullptr),
>>>> > -                                 FailureMBB(nullptr), Guard(nullptr)
>>>> { }
>>>> > +                                 FailureMBB(nullptr), Guard(nullptr),
>>>> > +                                 GuardReg(0) { }
>>>> >      ~StackProtectorDescriptor() { }
>>>> >
>>>> >      /// Returns true if all fields of the stack protector descriptor
>>>> are
>>>> > @@ -455,6 +456,9 @@ private:
>>>> >      MachineBasicBlock *getFailureMBB() { return FailureMBB; }
>>>> >      const Value *getGuard() { return Guard; }
>>>> >
>>>> > +    unsigned getGuardReg() const { return GuardReg; }
>>>> > +    void setGuardReg(unsigned R) { GuardReg = R; }
>>>> > +
>>>> >    private:
>>>> >      /// The basic block for which we are generating the stack
>>>> protector.
>>>> >      ///
>>>> > @@ -477,6 +481,9 @@ private:
>>>> >      /// stack protector stack slot.
>>>> >      const Value *Guard;
>>>> >
>>>> > +    /// The virtual register holding the stack guard value.
>>>> > +    unsigned GuardReg;
>>>> > +
>>>> >      /// Add a successor machine basic block to ParentMBB. If the
>>>> successor mbb
>>>> >      /// has not been created yet (i.e. if SuccMBB = 0), then the
>>>> machine basic
>>>> >      /// block will be created.
>>>> > diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp
>>>> b/lib/Target/AArch64/AArch64ISelLowering.cpp
>>>> > index 07ff093..97bec0d 100644
>>>> > --- a/lib/Target/AArch64/AArch64ISelLowering.cpp
>>>> > +++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
>>>> > @@ -7914,6 +7914,10 @@ bool
>>>> AArch64TargetLowering::shouldExpandAtomicInIR(Instruction *Inst) const {
>>>> >    return Inst->getType()->getPrimitiveSizeInBits() <= 128;
>>>> >  }
>>>> >
>>>> > +bool AArch64TargetLowering::emitLoadStackGuardNode() const {
>>>> > +  return Subtarget->isTargetMachO();
>>>> > +}
>>>> > +
>>>> >  TargetLoweringBase::LegalizeTypeAction
>>>> >  AArch64TargetLowering::getPreferredVectorAction(EVT VT) const {
>>>> >    MVT SVT = VT.getSimpleVT();
>>>> > diff --git a/lib/Target/AArch64/AArch64ISelLowering.h
>>>> b/lib/Target/AArch64/AArch64ISelLowering.h
>>>> > index cb0b9ef..7500074 100644
>>>> > --- a/lib/Target/AArch64/AArch64ISelLowering.h
>>>> > +++ b/lib/Target/AArch64/AArch64ISelLowering.h
>>>> > @@ -324,6 +324,7 @@ public:
>>>> >
>>>> >    bool shouldExpandAtomicInIR(Instruction *Inst) const override;
>>>> >
>>>> > +  bool emitLoadStackGuardNode() const override;
>>>> >    TargetLoweringBase::LegalizeTypeAction
>>>> >    getPreferredVectorAction(EVT VT) const override;
>>>> > diff --git a/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>> b/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>> > index ce85b2c..bfb32d6 100644
>>>> > --- a/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>> > +++ b/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>> > @@ -848,6 +848,29 @@ bool AArch64InstrInfo::optimizeCompareInstr(
>>>> >    return true;
>>>> >  }
>>>> >
>>>> > +bool
>>>> > +AArch64InstrInfo::expandPostRAPseudo(MachineBasicBlock::iterator MI)
>>>> const {
>>>> > +  if (MI->getOpcode() != TargetOpcode::LOAD_STACK_GUARD)
>>>> > +    return false;
>>>> > +
>>>> > +  assert(Subtarget.isTargetMachO() &&
>>>> > +         "LOAD_STACK_GUARD currently supported only for MachO.");
>>>>
>>>> At least for AArch64, this expansion doesn’t look very specific to
>>>> MachO. How much harder would it be to make it general? Or are you just
>>>> being cautious here because you can’t test the other platforms?
>>>>
>>>> > +
>>>> > +  MachineBasicBlock &MBB = *MI->getParent();
>>>> > +  DebugLoc DL = MI->getDebugLoc();
>>>> > +  unsigned Reg = MI->getOperand(0).getReg();
>>>> > +  const GlobalValue *GV =
>>>> > +      cast<GlobalValue>((*MI->memoperands_begin())->getValue());
>>>> > +  MachineOperand GA = MachineOperand::CreateGA(GV, 0,
>>>> AArch64II::MO_GOT);
>>>>
>>>> I noticed that you’re using MachineOperand::CreateGA throughout this
>>>> patch, but almost everywhere else we use MIB’s addGlobalAddress method
>>>> instead. I don’t see much difference, but unless there’s some reason to
>>>> prefer direct access to CreateGA, it would be better to follow the usual
>>>> convention.
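>>>>
>>>> E.g., the LOADgot instruction below could be built as (sketch):
>>>>
>>>>   BuildMI(MBB, MI, DL, get(AArch64::LOADgot), Reg)
>>>>       .addGlobalAddress(GV, 0, AArch64II::MO_GOT);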
>>>>
>>>> > +  BuildMI(MBB, MI, DL, get(AArch64::LOADgot), Reg).addOperand(GA);
>>>> > +  BuildMI(MBB, MI, DL, get(AArch64::LDRXui), Reg)
>>>> > +      .addReg(Reg, RegState::Kill).addImm(0)
>>>> > +      .addMemOperand(*MI->memoperands_begin());
>>>> > +  MBB.erase(MI);
>>>> > +
>>>> > +  return true;
>>>> > +}
>>>> > +
>>>> >  /// Return true if this instruction has a non-zero immediate
>>>> >  bool AArch64InstrInfo::hasShiftedReg(const MachineInstr *MI) const {
>>>> >    switch (MI->getOpcode()) {
>>>> >
>>>>
>>>>
>>>> The rest of this looks fine.
>>>
>>>
>>>
>> <stack_guard3.patch>
>>
>>
>>
>
>