[llvm] r177666 - Implement builtin_{setjmp/longjmp} on PPC

Hal Finkel hfinkel at anl.gov
Mon Mar 25 21:54:58 PDT 2013


----- Original Message -----
> From: "Bill Schmidt" <wschmidt at linux.vnet.ibm.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-commits at cs.uiuc.edu
> Sent: Monday, March 25, 2013 9:17:07 PM
> Subject: Re: [llvm] r177666 - Implement builtin_{setjmp/longjmp} on PPC
> 
> Hi Hal,
> 
> Sorry it took me so long to get to this.  I have a few comments on
> this
> one, embedded below:
> 
> On Thu, 2013-03-21 at 21:37 +0000, Hal Finkel wrote:
> > Author: hfinkel
> > Date: Thu Mar 21 16:37:52 2013
> > New Revision: 177666
> > 
> > URL: http://llvm.org/viewvc/llvm-project?rev=177666&view=rev
> > Log:
> > Implement builtin_{setjmp/longjmp} on PPC
> > 
> > This implements SJLJ lowering on PPC, making the Clang functions
> > __builtin_{setjmp/longjmp} functional on PPC platforms. The
> > implementation
> > strategy is similar to that on X86, with the exception that a
> > branch-and-link
> > variant is used to get the right jump address. Credit goes to Bill
> > Schmidt for
> > suggesting the use of the unconditional bcl form (instead of the
> > regular bl
> > instruction) to limit return-address-cache pollution.
> > 
> > Benchmarking the speed at -O3 of:
> > 
> > static jmp_buf env_sigill;
> > 
> > void foo() {
> >                 __builtin_longjmp(env_sigill,1);
> > }
> > 
> > main() {
> > 	...
> > 
> >         for (int i = 0; i < c; ++i) {
> >                 if (__builtin_setjmp(env_sigill)) {
> >                         goto done;
> >                 } else {
> >                         foo();
> >                 }
> > 
> > done:;
> >         }
> > 
> > 	...
> > }
> > 
> > vs. the same code using the libc setjmp/longjmp functions on a P7
> > shows that
> > this builtin implementation is ~4x faster with Altivec enabled and
> > ~7.25x
> > faster with Altivec disabled. This comparison is somewhat unfair
> > because the
> > libc version must also save/restore the VSX registers which we
> > don't yet
> > support.
> > 
> > Added:
> >     llvm/trunk/test/CodeGen/PowerPC/sjlj.ll
> > Modified:
> >     llvm/trunk/lib/Target/PowerPC/PPCCallingConv.td
> >     llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
> >     llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
> >     llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h
> >     llvm/trunk/lib/Target/PowerPC/PPCInstr64Bit.td
> >     llvm/trunk/lib/Target/PowerPC/PPCInstrFormats.td
> >     llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td
> >     llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.cpp
> >     llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.h
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCCallingConv.td
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCCallingConv.td?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCCallingConv.td (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCCallingConv.td Thu Mar 21
> > 16:37:52 2013
> > @@ -136,3 +136,8 @@ def CSR_SVR464   : CalleeSavedRegs<(add
> >                                          F27, F28, F29, F30, F31,
> >                                          CR2, CR3, CR4,
> >                                          V20, V21, V22, V23, V24,
> >                                          V25, V26, V27,
> >                                          V28, V29, V30, V31)>;
> > +
> > +def CSR_NoRegs : CalleeSavedRegs<(add)>;
> > +
> > +def CSR_NoRegs_Altivec : CalleeSavedRegs<(add (sequence "V%u", 0,
> > 31), VRSAVE)>;
> > +
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp Thu Mar 21
> > 16:37:52 2013
> > @@ -164,6 +164,12 @@ namespace {
> >        return PPCLowering.SelectAddressRegImmShift(N, Disp, Base,
> >        *CurDAG);
> >      }
> > 
> > +    // Select an address into a single register.
> > +    bool SelectAddr(SDValue N, SDValue &Base) {
> > +      Base = N;
> > +      return true;
> > +    }
> > +
> >      /// SelectInlineAsmMemoryOperand - Implement addressing mode
> >      selection for
> >      /// inline asm expressions.  It is always correct to compute
> >      the value into
> >      /// a register.  The case of adding a (possibly relocatable)
> >      constant to a
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp Thu Mar 21
> > 16:37:52 2013
> > @@ -70,6 +70,7 @@ static TargetLoweringObjectFile *CreateT
> >  PPCTargetLowering::PPCTargetLowering(PPCTargetMachine &TM)
> >    : TargetLowering(TM, CreateTLOF(TM)),
> >    PPCSubTarget(*TM.getSubtargetImpl()) {
> >    const PPCSubtarget *Subtarget =
> >    &TM.getSubtarget<PPCSubtarget>();
> > +  PPCRegInfo = TM.getRegisterInfo();
> > 
> >    setPow2DivIsCheap();
> > 
> > @@ -211,6 +212,14 @@ PPCTargetLowering::PPCTargetLowering(PPC
> >    setOperationAction(ISD::EXCEPTIONADDR, MVT::i32, Expand);
> >    setOperationAction(ISD::EHSELECTION,   MVT::i32, Expand);
> > 
> > +  // NOTE: EH_SJLJ_SETJMP/_LONGJMP supported here is NOT intened
> > to support
> 
> typo: intended.  (Thanks for the good explanatory note here.)

I should start off by pointing out that this entire implementation started out as a copy-and-paste from the X86 backend (including this comment, so I can't really take credit for its good quality). Looking through your comments, there are still a number of places where the comments no longer match the code; I'll clean them up.

> 
> > +  // SjLj exception handling but a light-weight setjmp/longjmp
> > replacement to
> > +  // support continuation, user-level threading, and etc.. As a
> > result, no
> > +  // other SjLj exception interfaces are implemented and please
> > don't build
> > +  // your own exception handling based on them.
> > +  // LLVM/Clang supports zero-cost DWARF exception handling.
> > +  setOperationAction(ISD::EH_SJLJ_SETJMP, MVT::i32, Custom);
> > +  setOperationAction(ISD::EH_SJLJ_LONGJMP, MVT::Other, Custom);
> > 
> >    // We want to legalize GlobalAddress and ConstantPool nodes into
> >    the
> >    // appropriate instructions to materialize the address.
> > @@ -567,6 +576,8 @@ const char *PPCTargetLowering::getTarget
> >    case PPCISD::BCTRL_Darwin:    return "PPCISD::BCTRL_Darwin";
> >    case PPCISD::BCTRL_SVR4:      return "PPCISD::BCTRL_SVR4";
> >    case PPCISD::RET_FLAG:        return "PPCISD::RET_FLAG";
> > +  case PPCISD::EH_SJLJ_SETJMP:  return "PPCISD::EH_SJLJ_SETJMP";
> > +  case PPCISD::EH_SJLJ_LONGJMP: return "PPCISD::EH_SJLJ_LONGJMP";
> >    case PPCISD::MFCR:            return "PPCISD::MFCR";
> >    case PPCISD::VCMP:            return "PPCISD::VCMP";
> >    case PPCISD::VCMPo:           return "PPCISD::VCMPo";
> > @@ -4561,6 +4572,21 @@ SDValue PPCTargetLowering::LowerDYNAMIC_
> >    return DAG.getNode(PPCISD::DYNALLOC, dl, VTs, Ops, 3);
> >  }
> > 
> > +SDValue PPCTargetLowering::lowerEH_SJLJ_SETJMP(SDValue Op,
> > +                                               SelectionDAG &DAG)
> > const {
> > +  DebugLoc DL = Op.getDebugLoc();
> > +  return DAG.getNode(PPCISD::EH_SJLJ_SETJMP, DL,
> > +                     DAG.getVTList(MVT::i32, MVT::Other),
> > +                     Op.getOperand(0), Op.getOperand(1));
> > +}
> > +
> > +SDValue PPCTargetLowering::lowerEH_SJLJ_LONGJMP(SDValue Op,
> > +                                                SelectionDAG &DAG)
> > const {
> > +  DebugLoc DL = Op.getDebugLoc();
> > +  return DAG.getNode(PPCISD::EH_SJLJ_LONGJMP, DL, MVT::Other,
> > +                     Op.getOperand(0), Op.getOperand(1));
> > +}
> > +
> >  /// LowerSELECT_CC - Lower floating point select_cc's into fsel
> >  instruction when
> >  /// possible.
> >  SDValue PPCTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG
> >  &DAG) const {
> > @@ -5557,6 +5583,9 @@ SDValue PPCTargetLowering::LowerOperatio
> >    case ISD::DYNAMIC_STACKALLOC:
> >      return LowerDYNAMIC_STACKALLOC(Op, DAG, PPCSubTarget);
> > 
> > +  case ISD::EH_SJLJ_SETJMP:     return lowerEH_SJLJ_SETJMP(Op,
> > DAG);
> > +  case ISD::EH_SJLJ_LONGJMP:    return lowerEH_SJLJ_LONGJMP(Op,
> > DAG);
> > +
> >    case ISD::SELECT_CC:          return LowerSELECT_CC(Op, DAG);
> >    case ISD::FP_TO_UINT:
> >    case ISD::FP_TO_SINT:         return LowerFP_TO_INT(Op, DAG,
> > @@ -5869,9 +5898,238 @@ PPCTargetLowering::EmitPartwordAtomicBin
> >    return BB;
> >  }
> > 
> > +llvm::MachineBasicBlock*
> > +PPCTargetLowering::emitEHSjLjSetJmp(MachineInstr *MI,
> > +                                    MachineBasicBlock *MBB) const
> > {
> > +  DebugLoc DL = MI->getDebugLoc();
> > +  const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
> > +
> > +  MachineFunction *MF = MBB->getParent();
> > +  MachineRegisterInfo &MRI = MF->getRegInfo();
> > +
> > +  const BasicBlock *BB = MBB->getBasicBlock();
> > +  MachineFunction::iterator I = MBB;
> > +  ++I;
> > +
> > +  // Memory Reference
> > +  MachineInstr::mmo_iterator MMOBegin = MI->memoperands_begin();
> > +  MachineInstr::mmo_iterator MMOEnd = MI->memoperands_end();
> > +
> > +  unsigned DstReg = MI->getOperand(0).getReg();
> > +  const TargetRegisterClass *RC = MRI.getRegClass(DstReg);
> > +  assert(RC->hasType(MVT::i32) && "Invalid destination!");
> > +  unsigned mainDstReg = MRI.createVirtualRegister(RC);
> > +  unsigned restoreDstReg = MRI.createVirtualRegister(RC);
> > +
> > +  MVT PVT = getPointerTy();
> > +  assert((PVT == MVT::i64 || PVT == MVT::i32) &&
> > +         "Invalid Pointer Size!");
> > +  // For v = setjmp(buf), we generate
> > +  //
> > +  // thisMBB:
> > +  //  SjLjSetup mainMBB
> > +  //  bl mainMBB
> > +  //  v_restore = 1
> > +  //  b sinkMBB
> 
> This comment does not match the order in which you generate the code,
> and in which you check for it in the test case.  More on this
> later...
> 
> > +  //
> > +  // mainMBB:
> > +  //  buf[LabelOffset] = LR
> > +  //  v_main = 0
> > +  //
> > +  // sinkMBB:
> > +  //  v = phi(main, restore)
> > +  //
> > +
> > +  MachineBasicBlock *thisMBB = MBB;
> > +  MachineBasicBlock *mainMBB = MF->CreateMachineBasicBlock(BB);
> > +  MachineBasicBlock *sinkMBB = MF->CreateMachineBasicBlock(BB);
> > +  MF->insert(I, mainMBB);
> > +  MF->insert(I, sinkMBB);
> > +
> > +  MachineInstrBuilder MIB;
> > +
> > +  // Transfer the remainder of BB and its successor edges to
> > sinkMBB.
> > +  sinkMBB->splice(sinkMBB->begin(), MBB,
> > +                  llvm::next(MachineBasicBlock::iterator(MI)),
> > MBB->end());
> > +  sinkMBB->transferSuccessorsAndUpdatePHIs(MBB);
> > +
> > +  // Note that the structure of the jmp_buf used here is not
> > compatible
> > +  // with that used by libc, and is not designed to be.
> > Specifically, it
> > +  // stores only those 'reserved' registers that LLVM does not
> > otherwise
> > +  // understand how to spill. Also, by convention, by the time
> > this
> > +  // intrinsic is called, Clang has already stored the frame
> > address in the
> > +  // first slot of the buffer and stack address in the third.
> > Following the
> > +  // X86 target code, we'll store the jump address in the second
> > slot. We also
> > +  // need to save the TOC pointer (R2) to handle jumps between
> > shared
> > +  // libraries, and that will be stored in the fourth slot. The
> > thread
> > +  // identifier (R13) is not affected.
> > +
> > +  // thisMBB:
> > +  const int64_t LabelOffset = 1 * PVT.getStoreSize();
> > +  const int64_t TOCOffset   = 3 * PVT.getStoreSize();
> > +
> > +  // Prepare IP either in reg.
> 
> Sorry, but I can't parse that comment. ;)
> 
> > +  const TargetRegisterClass *PtrRC = getRegClassFor(PVT);
> > +  unsigned LabelReg = MRI.createVirtualRegister(PtrRC);
> > +  unsigned BufReg = MI->getOperand(1).getReg();
> > +
> > +  if (PPCSubTarget.isPPC64() && PPCSubTarget.isSVR4ABI()) {
> > +    MIB = BuildMI(*thisMBB, MI, DL, TII->get(PPC::STD))
> > +            .addReg(PPC::X2)
> > +            .addImm(TOCOffset / 4)
> > +            .addReg(BufReg);
> > +
> > +    MIB.setMemRefs(MMOBegin, MMOEnd);
> > +  }
> > +
> > +  // Setup
> > +  MIB = BuildMI(*thisMBB, MI, DL,
> > TII->get(PPC::BCL)).addMBB(mainMBB);
> 
> Your use of the "bcl" isn't actually the one that avoids polluting
> the
> link stack.  Sorry for any confusion on this point during our IRC
> discussion.  The only form of this instruction that does this is:
> 
>   bcl 20,31,$+4
> 
> which is treated as a not-taken branch rather than a subroutine call.
> If you use any other address than $+4, the processor implementation
> is
> likely to treat it as a subroutine call and modify the link stack.
> 
> The best reference on this subject I've seen so far is the Freescale
> document that Gabor Greif pointed me to the other day:
> 
>   http://www.freescale.com/files/32bit/doc/app_note/AN2203.pdf#page=23
> 
> Please have a look at section 4.5 for the basic description, and then
> the example in 4.5.2 for how to generate code that branches to a
> nearby
> address without polluting the link stack.  As I was describing
> (poorly)
> on IRC, you need to do arithmetic on the next-address retrieved from
> the
> link register.
>
>  A complete example is shown there which I think you
> will
> find very helpful.

I'll look at it, thanks!

> 
> > +  MIB.addRegMask(PPCRegInfo->getNoPreservedMask());
> 
> I'm too thick to understand this without some more explanation.  What
> appears to be happening here is that you're putting a marker in the
> instruction stream that says, "Spill and reload all callee-save
> registers (that you've modified locally) around this marker."  Is
> that
> the intent, with the register allocator automatically inserting the
> necessary spill code?

Yes, that it what happens.

> 
> With the tentative assumption that that's so, the placement of
> SjLjSetup
> following the jump to mainMBB perplexes me. 

This is somewhat confusing because of how it was derived from the X86 code and my insufficient use of renaming. In the X86 code, the register mask is actually applied to the SjLjSetup pseudo-instruction (which justifies its name). I've kept the name, but apply the mask to the bcl instruction (as it must be). In our case, the only real purpose of the SjLjSetup instruction is to make LLVM understand that the block pointed to by the SjLjSetup is not dead (because LLVM thinks that SjLjSetup is some kind of conditional branch).

On the other hand, if we go to using the bcl, ..., $+4 form along with arithmetic, which would be very close to what X86 does, than we might also apply the register mask to the SjLjSetup and it would again deserve its name.

> This code won't be
> executed
> until after a longjmp returns, so the callee-saved registers won't be
> saved in advance and will be possibly overwritten between "now" and
> the
> longjmp execution.  It seems to me that the location you specified in
> your earlier comment is the correct place to force the state to be
> saved.  However, if SjLjSetup is placed there, then on return from
> the
> longjmp there doesn't seem to be anything that will cause the state
> to
> be restored.  From our earlier discussions I got the impression you
> wanted all the save/restore to be done in the setjmp code, but I
> don't
> understand how what you have here accomplishes that.  No doubt I'm
> just
> not seeing the whole picture.
> 
> In any event, this needs Extreme Commentary ;) to allow others to
> understand and maintain the code...

:) Indeed.

> 
> > +
> > +  BuildMI(*thisMBB, MI, DL, TII->get(PPC::LI),
> > restoreDstReg).addImm(1);
> > +
> > +  MIB = BuildMI(*thisMBB, MI, DL, TII->get(PPC::EH_SjLj_Setup))
> > +          .addMBB(mainMBB);
> > +  MIB = BuildMI(*thisMBB, MI, DL,
> > TII->get(PPC::B)).addMBB(sinkMBB);
> > +
> > +  thisMBB->addSuccessor(mainMBB, /* weight */ 0);
> > +  thisMBB->addSuccessor(sinkMBB, /* weight */ 1);
> > +
> > +  // mainMBB:
> > +  //  mainDstReg = 0
> 
> The above comment belongs after the MFLR and IP store code...
> 
> > +  MIB = BuildMI(mainMBB, DL,
> > +    TII->get(PPCSubTarget.isPPC64() ? PPC::MFLR8 : PPC::MFLR),
> > LabelReg);
> > +
> > +  // Store IP
> > +  if (PPCSubTarget.isPPC64()) {
> > +    MIB = BuildMI(mainMBB, DL, TII->get(PPC::STD))
> > +            .addReg(LabelReg)
> > +            .addImm(LabelOffset / 4)
> > +            .addReg(BufReg);
> > +  } else {
> > +    MIB = BuildMI(mainMBB, DL, TII->get(PPC::STW))
> > +            .addReg(LabelReg)
> > +            .addImm(LabelOffset)
> > +            .addReg(BufReg);
> > +  }
> > +
> > +  MIB.setMemRefs(MMOBegin, MMOEnd);
> 
> ...here.
> 
> > +
> > +  BuildMI(mainMBB, DL, TII->get(PPC::LI), mainDstReg).addImm(0);
> > +  mainMBB->addSuccessor(sinkMBB);
> > +
> > +  // sinkMBB:
> > +  BuildMI(*sinkMBB, sinkMBB->begin(), DL,
> > +          TII->get(PPC::PHI), DstReg)
> > +    .addReg(mainDstReg).addMBB(mainMBB)
> > +    .addReg(restoreDstReg).addMBB(thisMBB);
> > +
> > +  MI->eraseFromParent();
> > +  return sinkMBB;
> > +}
> > +
> > +MachineBasicBlock *
> > +PPCTargetLowering::emitEHSjLjLongJmp(MachineInstr *MI,
> > +                                     MachineBasicBlock *MBB) const
> > {
> > +  DebugLoc DL = MI->getDebugLoc();
> > +  const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
> > +
> > +  MachineFunction *MF = MBB->getParent();
> > +  MachineRegisterInfo &MRI = MF->getRegInfo();
> > +
> > +  // Memory Reference
> > +  MachineInstr::mmo_iterator MMOBegin = MI->memoperands_begin();
> > +  MachineInstr::mmo_iterator MMOEnd = MI->memoperands_end();
> > +
> > +  MVT PVT = getPointerTy();
> > +  assert((PVT == MVT::i64 || PVT == MVT::i32) &&
> > +         "Invalid Pointer Size!");
> > +
> > +  const TargetRegisterClass *RC =
> > +    (PVT == MVT::i64) ? &PPC::G8RCRegClass : &PPC::GPRCRegClass;
> > +  unsigned Tmp = MRI.createVirtualRegister(RC);
> > +  // Since FP is only updated here but NOT referenced, it's
> > treated as GPR.
> 
> Sorry, I don't comprehend this comment either.  Can you elaborate?

I don't either ;) It was copied from the X86 code and I'll delete it.

> 
> > +  unsigned FP  = (PVT == MVT::i64) ? PPC::X31 : PPC::R31;
> > +  unsigned SP  = (PVT == MVT::i64) ? PPC::X1 : PPC::R1;
> > +
> > +  MachineInstrBuilder MIB;
> > +
> > +  const int64_t LabelOffset = 1 * PVT.getStoreSize();
> > +  const int64_t SPOffset    = 2 * PVT.getStoreSize();
> > +  const int64_t TOCOffset   = 3 * PVT.getStoreSize();
> > +
> > +  unsigned BufReg = MI->getOperand(0).getReg();
> > +
> > +  // Reload FP (the jumped-to function may not have had a
> > +  // frame pointer, and if so, then its r31 will be restored
> > +  // as necessary).
> > +  if (PVT == MVT::i64) {
> > +    MIB = BuildMI(*MBB, MI, DL, TII->get(PPC::LD), FP)
> > +            .addImm(0)
> > +            .addReg(BufReg);
> > +  } else {
> > +    MIB = BuildMI(*MBB, MI, DL, TII->get(PPC::LWZ), FP)
> > +            .addImm(0)
> > +            .addReg(BufReg);
> > +  }
> > +  MIB.setMemRefs(MMOBegin, MMOEnd);
> > +
> > +  // Reload IP
> > +  if (PVT == MVT::i64) {
> > +    MIB = BuildMI(*MBB, MI, DL, TII->get(PPC::LD), Tmp)
> > +            .addImm(LabelOffset / 4)
> > +            .addReg(BufReg);
> > +  } else {
> > +    MIB = BuildMI(*MBB, MI, DL, TII->get(PPC::LWZ), Tmp)
> > +            .addImm(LabelOffset)
> > +            .addReg(BufReg);
> > +  }
> > +  MIB.setMemRefs(MMOBegin, MMOEnd);
> > +
> > +  // Reload SP
> > +  if (PVT == MVT::i64) {
> > +    MIB = BuildMI(*MBB, MI, DL, TII->get(PPC::LD), SP)
> > +            .addImm(SPOffset / 4)
> > +            .addReg(BufReg);
> > +  } else {
> > +    MIB = BuildMI(*MBB, MI, DL, TII->get(PPC::LWZ), SP)
> > +            .addImm(SPOffset)
> > +            .addReg(BufReg);
> > +  }
> > +  MIB.setMemRefs(MMOBegin, MMOEnd);
> > +
> > +  // FIXME: When we also support base pointers, that register must
> > also be
> > +  // restored here.
> 
> Once again, I don't understand.  I'm starting to feel pretty dumb. ;)
> To what sort of base pointers are you referring, and what register is
> this?

When I add code to support dynamic stack-frame realignment, we might end up with three frame-related registers: a base pointer (from which we pull things with defined offsets relative to the callers stack frame), a frame pointer (which we might need if we have dynamic stack allocations), and the stack pointer. And in the case where we have all three, we need to make sure that the base pointer is also saved and restored properly.

> 
> > +
> > +  // Reload TOC
> > +  if (PVT == MVT::i64 && PPCSubTarget.isSVR4ABI()) {
> > +    MIB = BuildMI(*MBB, MI, DL, TII->get(PPC::LD), PPC::X2)
> > +            .addImm(TOCOffset / 4)
> > +            .addReg(BufReg);
> > +
> > +    MIB.setMemRefs(MMOBegin, MMOEnd);
> > +  }
> > +
> > +  // Jump
> > +  BuildMI(*MBB, MI, DL,
> > +          TII->get(PVT == MVT::i64 ? PPC::MTCTR8 :
> > PPC::MTCTR)).addReg(Tmp);
> > +  BuildMI(*MBB, MI, DL, TII->get(PVT == MVT::i64 ? PPC::BCTR8 :
> > PPC::BCTR));
> > +
> > +  MI->eraseFromParent();
> > +  return MBB;
> > +}
> > +
> >  MachineBasicBlock *
> >  PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr *MI,
> >                                                 MachineBasicBlock
> >                                                 *BB) const {
> > +  if (MI->getOpcode() == PPC::EH_SjLj_SetJmp32 ||
> > +      MI->getOpcode() == PPC::EH_SjLj_SetJmp64) {
> > +    return emitEHSjLjSetJmp(MI, BB);
> > +  } else if (MI->getOpcode() == PPC::EH_SjLj_LongJmp32 ||
> > +             MI->getOpcode() == PPC::EH_SjLj_LongJmp64) {
> > +    return emitEHSjLjLongJmp(MI, BB);
> > +  }
> > +
> >    const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
> > 
> >    // To "insert" these instructions we actually have to insert
> >    their
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h Thu Mar 21
> > 16:37:52 2013
> > @@ -16,6 +16,7 @@
> >  #define LLVM_TARGET_POWERPC_PPC32ISELLOWERING_H
> > 
> >  #include "PPC.h"
> > +#include "PPCRegisterInfo.h"
> >  #include "PPCSubtarget.h"
> >  #include "llvm/CodeGen/SelectionDAG.h"
> >  #include "llvm/Target/TargetLowering.h"
> > @@ -119,6 +120,12 @@ namespace llvm {
> >        /// are undefined.
> >        MFCR,
> > 
> > +      // EH_SJLJ_SETJMP - SjLj exception handling setjmp.
> > +      EH_SJLJ_SETJMP,
> > +
> > +      // EH_SJLJ_LONGJMP - SjLj exception handling longjmp.
> > +      EH_SJLJ_LONGJMP,
> > +
> >        /// RESVEC = VCMP(LHS, RHS, OPC) - Represents one of the
> >        altivec VCMP*
> >        /// instructions.  For lack of better number, we use the
> >        opcode number
> >        /// encoding for the OPC field to identify the compare.  For
> >        example, 838
> > @@ -321,6 +328,7 @@ namespace llvm {
> > 
> >    class PPCTargetLowering : public TargetLowering {
> >      const PPCSubtarget &PPCSubTarget;
> > +    const PPCRegisterInfo *PPCRegInfo;
> > 
> >    public:
> >      explicit PPCTargetLowering(PPCTargetMachine &TM);
> > @@ -395,6 +403,12 @@ namespace llvm {
> >                                                  MachineBasicBlock
> >                                                  *MBB,
> >                                              bool is8bit, unsigned
> >                                              Opcode) const;
> > 
> > +    MachineBasicBlock *emitEHSjLjSetJmp(MachineInstr *MI,
> > +                                        MachineBasicBlock *MBB)
> > const;
> > +
> > +    MachineBasicBlock *emitEHSjLjLongJmp(MachineInstr *MI,
> > +                                         MachineBasicBlock *MBB)
> > const;
> > +
> >      ConstraintType getConstraintType(const std::string
> >      &Constraint) const;
> > 
> >      /// Examine constraint string and operand type and determine a
> >      weight value.
> > @@ -608,6 +622,9 @@ namespace llvm {
> >                       const SmallVectorImpl<ISD::InputArg> &Ins,
> >                       DebugLoc dl, SelectionDAG &DAG,
> >                       SmallVectorImpl<SDValue> &InVals) const;
> > +
> > +    SDValue lowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG)
> > const;
> > +    SDValue lowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG)
> > const;
> >    };
> >  }
> > 
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCInstr64Bit.td
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCInstr64Bit.td?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCInstr64Bit.td (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCInstr64Bit.td Thu Mar 21
> > 16:37:52 2013
> > @@ -273,6 +273,19 @@ def MFCR8 : XFXForm_3<31, 19, (outs G8RC
> >                       "mfcr $rT", SprMFCR>,
> >                       PPC970_MicroCode, PPC970_Unit_CRU;
> > 
> > +let hasSideEffects = 1, isBarrier = 1, isCodeGenOnly = 1,
> > +    usesCustomInserter = 1 in {
> > +  def EH_SjLj_SetJmp64  : Pseudo<(outs GPRC:$dst), (ins
> > memr:$buf),
> > +                            "#EH_SJLJ_SETJMP64",
> > +                            [(set GPRC:$dst, (PPCeh_sjlj_setjmp
> > addr:$buf))]>,
> > +                          Requires<[In64BitMode]>;
> > +  let isTerminator = 1 in
> > +  def EH_SjLj_LongJmp64 : Pseudo<(outs), (ins memr:$buf),
> > +                            "#EH_SJLJ_LONGJMP64",
> > +                            [(PPCeh_sjlj_longjmp addr:$buf)]>,
> > +                          Requires<[In64BitMode]>;
> > +}
> > +
> >  //===----------------------------------------------------------------------===//
> >  // 64-bit SPR manipulation instrs.
> > 
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCInstrFormats.td
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCInstrFormats.td?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCInstrFormats.td (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCInstrFormats.td Thu Mar 21
> > 16:37:52 2013
> > @@ -120,6 +120,18 @@ class BForm_1<bits<6> opcode, bits<5> bo
> >    let CR = 0;
> >  }
> > 
> > +class BForm_2<bits<6> opcode, bits<5> bo, bits<5> bi, bit aa, bit
> > lk,
> > +              dag OOL, dag IOL, string asmstr>
> > +  : I<opcode, OOL, IOL, asmstr, BrB> {
> > +  bits<14> BD;
> > +
> > +  let Inst{6-10}  = bo;
> > +  let Inst{11-15} = bi;
> > +  let Inst{16-29} = BD;
> > +  let Inst{30}    = aa;
> > +  let Inst{31}    = lk;
> > +}
> > +
> >  // 1.7.4 D-Form
> >  class DForm_base<bits<6> opcode, dag OOL, dag IOL, string asmstr,
> >                   InstrItinClass itin, list<dag> pattern>
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td Thu Mar 21
> > 16:37:52 2013
> > @@ -158,6 +158,14 @@ def retflag       : SDNode<"PPCISD::RET_
> >  def PPCtc_return : SDNode<"PPCISD::TC_RETURN", SDT_PPCTC_ret,
> >                          [SDNPHasChain,  SDNPOptInGlue,
> >                          SDNPVariadic]>;
> > 
> > +def PPCeh_sjlj_setjmp  : SDNode<"PPCISD::EH_SJLJ_SETJMP",
> > +                                SDTypeProfile<1, 1, [SDTCisInt<0>,
> > +
> >                                                     SDTCisPtrTy<1>]>,
> > +                                [SDNPHasChain, SDNPSideEffect]>;
> > +def PPCeh_sjlj_longjmp : SDNode<"PPCISD::EH_SJLJ_LONGJMP",
> > +                                SDTypeProfile<0, 1,
> > [SDTCisPtrTy<0>]>,
> > +                                [SDNPHasChain, SDNPSideEffect]>;
> > +
> >  def PPCvcmp       : SDNode<"PPCISD::VCMP" , SDT_PPCvcmp, []>;
> >  def PPCvcmp_o     : SDNode<"PPCISD::VCMPo", SDT_PPCvcmp,
> >  [SDNPOutGlue]>;
> > 
> > @@ -391,6 +399,12 @@ def memrix : Operand<iPTR> {   // memri
> >    let EncoderMethod = "getMemRIXEncoding";
> >  }
> > 
> > +// A single-register address. This is used with the SjLj
> > +// pseudo-instructions.
> > +def memr : Operand<iPTR> {
> > +  let MIOperandInfo = (ops ptr_rc:$ptrreg);
> > +}
> > +
> >  // PowerPC Predicate operand.  20 = (0<<5)|20 = always, CR0 is a
> >  dummy reg
> >  // that doesn't matter.
> >  def pred : PredicateOperand<OtherVT, (ops imm, CRRC),
> > @@ -404,6 +418,10 @@ def xaddr  : ComplexPattern<iPTR, 2, "Se
> >  def xoaddr : ComplexPattern<iPTR, 2, "SelectAddrIdxOnly",[], []>;
> >  def ixaddr : ComplexPattern<iPTR, 2, "SelectAddrImmShift", [],
> >  []>; // "std"
> > 
> > +// The address in a single register. This is used with the SjLj
> > +// pseudo-instructions.
> > +def addr   : ComplexPattern<iPTR, 1, "SelectAddr",[], []>;
> > +
> >  /// This is just the offset part of iaddr, used for preinc.
> >  def iaddroff : ComplexPattern<iPTR, 1, "SelectAddrImmOffs", [],
> >  []>;
> >  def xaddroff : ComplexPattern<iPTR, 1, "SelectAddrIdxOffs", [],
> >  []>;
> > @@ -505,6 +523,14 @@ let isBranch = 1, isTerminator = 1, hasC
> >    }
> >  }
> > 
> > +// The direct BCL used by the SjLj setjmp code.
> > +let isCall = 1, hasCtrlDep = 1, PPC970_Unit = 7 in {
> > +  let Defs = [LR], Uses = [RM] in {
> > +    def BCL  : BForm_2<16, 20, 31, 0, 1, (outs), (ins
> > condbrtarget:$dst),
> > +                       "bcl 20, 31, $dst">;
> > +  }
> > +}
> > +
> 
> I don't think this should be so specific to the particular use.
>  Please
> make the "20" and "31" into variables that will be bound to the BO
> and
> BI fields so we have a single BCL instruction for general use.
>  Ulrich
> has pointed out that we need a single form of each instruction for
> the
> assembly parser and disassembler to work best, and the above would
> encourage somebody needing a different form of BCL to create their
> own
> version.  I'd rather just see the fully flexible version done up
> front.
> We need it anyway for optimizing conditional branches that target
> calls
> to become conditional calls.

Yep, already on my TODO list :)

> 
> >  // Darwin ABI Calls.
> >  let isCall = 1, PPC970_Unit = 7, Defs = [LR] in {
> >    // Convenient aliases for call instructions
> > @@ -583,6 +609,23 @@ def TAILBA   : IForm<18, 0, 0, (outs), (
> >                    "ba $dst", BrB,
> >                    []>;
> > 
> > +let hasSideEffects = 1, isBarrier = 1, isCodeGenOnly = 1,
> > +    usesCustomInserter = 1 in {
> > +  def EH_SjLj_SetJmp32  : Pseudo<(outs GPRC:$dst), (ins
> > memr:$buf),
> > +                            "#EH_SJLJ_SETJMP32",
> > +                            [(set GPRC:$dst, (PPCeh_sjlj_setjmp
> > addr:$buf))]>,
> > +                          Requires<[In32BitMode]>;
> > +  let isTerminator = 1 in
> > +  def EH_SjLj_LongJmp32 : Pseudo<(outs), (ins memr:$buf),
> > +                            "#EH_SJLJ_LONGJMP32",
> > +                            [(PPCeh_sjlj_longjmp addr:$buf)]>,
> > +                          Requires<[In32BitMode]>;
> > +}
> > +
> > +let isBranch = 1, isTerminator = 1, isCodeGenOnly = 1 in {
> > +  def EH_SjLj_Setup : Pseudo<(outs), (ins directbrtarget:$dst),
> > +                        "#EH_SjLj_Setup\t$dst", []>;
> > +}
> > 
> >  // DCB* instructions.
> >  def DCBA   : DCB_Form<758, 0, (outs), (ins memrr:$dst),
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.cpp
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.cpp?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.cpp (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.cpp Thu Mar 21
> > 16:37:52 2013
> > @@ -105,6 +105,18 @@ PPCRegisterInfo::getCallPreservedMask(Ca
> >    return Subtarget.isPPC64() ? CSR_SVR464_RegMask :
> >    CSR_SVR432_RegMask;
> >  }
> > 
> > +const uint32_t*
> > +PPCRegisterInfo::getNoPreservedMask() const {
> > +  // The naming here is inverted: The CSR_NoRegs_Altivec has the
> > +  // Altivec registers masked so that they're not saved and
> > restored around
> > +  // instructions with this preserved mask.
> > +
> > +  if (!Subtarget.hasAltivec())
> > +    return CSR_NoRegs_Altivec_RegMask;
> > +
> > +  return CSR_NoRegs_RegMask;
> > +}
> > +
> >  BitVector PPCRegisterInfo::getReservedRegs(const MachineFunction
> >  &MF) const {
> >    BitVector Reserved(getNumRegs());
> >    const PPCFrameLowering *PPCFI =
> > 
> > Modified: llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.h
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.h?rev=177666&r1=177665&r2=177666&view=diff
> > ==============================================================================
> > --- llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.h (original)
> > +++ llvm/trunk/lib/Target/PowerPC/PPCRegisterInfo.h Thu Mar 21
> > 16:37:52 2013
> > @@ -44,6 +44,7 @@ public:
> >    /// Code Generation virtual methods...
> >    const uint16_t *getCalleeSavedRegs(const MachineFunction* MF =
> >    0) const;
> >    const uint32_t *getCallPreservedMask(CallingConv::ID CC) const;
> > +  const uint32_t *getNoPreservedMask() const;
> > 
> >    BitVector getReservedRegs(const MachineFunction &MF) const;
> > 
> > 
> > Added: llvm/trunk/test/CodeGen/PowerPC/sjlj.ll
> > URL:
> > http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/sjlj.ll?rev=177666&view=auto
> > ==============================================================================
> > --- llvm/trunk/test/CodeGen/PowerPC/sjlj.ll (added)
> > +++ llvm/trunk/test/CodeGen/PowerPC/sjlj.ll Thu Mar 21 16:37:52
> > 2013
> > @@ -0,0 +1,108 @@
> > +; RUN: llc < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 |
> > FileCheck %s
> > +; RUN: llc < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2 |
> > FileCheck -check-prefix=CHECK-NOAV %s
> > +target datalayout =
> > "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
> > +target triple = "powerpc64-unknown-linux-gnu"
> > +
> > +%struct.__jmp_buf_tag = type { [64 x i64], i32,
> > %struct.__sigset_t, [8 x i8] }
> > +%struct.__sigset_t = type { [16 x i64] }
> > +
> > + at env_sigill = internal global [1 x %struct.__jmp_buf_tag]
> > zeroinitializer, align 16
> > +
> > +define void @foo() #0 {
> > +entry:
> > +  call void @llvm.eh.sjlj.longjmp(i8* bitcast ([1 x
> > %struct.__jmp_buf_tag]* @env_sigill to i8*))
> > +  unreachable
> > +
> > +; CHECK: @foo
> > +; CHECK: addis [[REG:[0-9]+]], 2, env_sigill at toc@ha
> > +; CHECK: addi [[REG]], [[REG]], env_sigill at toc@l
> 
> This is a nitpick, but the fact that the two registers on the addi
> are
> the same is an artifact of the current register allocation scheme and
> might not always hold.  It would be safer to not assume they're the
> same.

Good point, thanks!

> 
> > +; CHECK: ld 31, 0([[REG]])
> > +; CHECK: ld [[REG2:[0-9]+]], 8([[REG]])
> > +; CHECK: ld 1, 16([[REG]])
> > +; CHECK: mtctr [[REG2]]
> > +; CHECK: ld 2, 24([[REG]])
> > +; CHECK: bctr
> > +
> > +return:                                           ; No
> > predecessors!
> > +  ret void
> > +}
> > +
> > +declare void @llvm.eh.sjlj.longjmp(i8*) #1
> > +
> > +define signext i32 @main() #0 {
> > +entry:
> > +  %retval = alloca i32, align 4
> > +  store i32 0, i32* %retval
> > +  %0 = call i8* @llvm.frameaddress(i32 0)
> > +  store i8* %0, i8** bitcast ([1 x %struct.__jmp_buf_tag]*
> > @env_sigill to i8**)
> > +  %1 = call i8* @llvm.stacksave()
> > +  store i8* %1, i8** getelementptr (i8** bitcast ([1 x
> > %struct.__jmp_buf_tag]* @env_sigill to i8**), i32 2)
> > +  %2 = call i32 @llvm.eh.sjlj.setjmp(i8* bitcast ([1 x
> > %struct.__jmp_buf_tag]* @env_sigill to i8*))
> > +  %tobool = icmp ne i32 %2, 0
> > +  br i1 %tobool, label %if.then, label %if.else
> > +
> > +if.then:                                          ; preds = %entry
> > +  store i32 1, i32* %retval
> > +  br label %return
> > +
> > +if.else:                                          ; preds = %entry
> > +  call void @foo()
> > +  br label %if.end
> > +
> > +if.end:                                           ; preds =
> > %if.else
> > +  store i32 0, i32* %retval
> > +  br label %return
> > +
> > +return:                                           ; preds =
> > %if.end, %if.then
> > +  %3 = load i32* %retval
> > +  ret i32 %3
> > +
> > +; CHECK: @main
> > +; CHECK: std
> > +; CHECK: stfd
> > +; CHECK: stvx
> > +
> > +; CHECK: addis [[REG:[0-9]+]], 2, env_sigill at toc@ha
> > +; CHECK: std 31, env_sigill at toc@l([[REG]])
> > +; CHECK: addi [[REG]], [[REG]], env_sigill at toc@l
> 
> Same nitpick here.

Thanks for looking this over! I appreciate the review.

 -Hal

> 
> Thanks,
> Bill
> 
> > +; CHECK: std [[REG]], [[OFF:[0-9]+]](31)                  # 8-byte
> > Folded Spill
> > +; CHECK: std 1, 16([[REG]])
> > +; CHECK: std 2, 24([[REG]])
> > +; CHECK: bcl 20, 31, .LBB1_1
> > +; CHECK: li 3, 1
> > +; CHECK: #EH_SjLj_Setup	.LBB1_1
> > +; CHECK: b .LBB1_2
> > +
> > +; CHECK: .LBB1_1:
> > +; CHECK: mflr [[REGL:[0-9]+]]
> > +; CHECK: ld [[REG2:[0-9]+]], [[OFF]](31)                   #
> > 8-byte Folded Reload
> > +; CHECK: std [[REGL]], 8([[REG2]])
> > +; CHECK: li 3, 0
> > +
> > +; CHECK: .LBB1_2:
> > +
> > +; CHECK: lfd
> > +; CHECK: lvx
> > +; CHECK: ld
> > +; CHECK: blr
> > +
> > +; CHECK-NOAV: @main
> > +; CHECK-NOAV-NOT: stvx
> > +; CHECK-NOAV: bcl
> > +; CHECK-NOAV: mflr
> > +; CHECK-NOAV: bl foo
> > +; CHECK-NOAV-NOT: lvx
> > +; CHECK-NOAV: blr
> > +}
> > +
> > +declare i8* @llvm.frameaddress(i32) #2
> > +
> > +declare i8* @llvm.stacksave() #3
> > +
> > +declare i32 @llvm.eh.sjlj.setjmp(i8*) #3
> > +
> > +attributes #0 = { nounwind "less-precise-fpmad"="false"
> > "no-frame-pointer-elim"="false"
> > "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false"
> > "no-nans-fp-math"="false" "unsafe-fp-math"="false"
> > "use-soft-float"="false" }
> > +attributes #1 = { noreturn nounwind }
> > +attributes #2 = { nounwind readnone }
> > +attributes #3 = { nounwind }
> > +
> > 
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > 
> 
> 



More information about the llvm-commits mailing list