[llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32

Wed Oct 23 14:35:15 PDT 2013

Hi, Manman and Bob. Thanks for the review.

Manman,

This patch undoes the earlier refactoring that avoided code duplication.
Bob was unhappy with the amount of source code that refactoring produced,
so this latest patch reverts it but keeps the support for thumb1.

Bob,

I had purposely defined the scratch register on each branch to limit the
scope of the variable and keep the definition close to the use. I can hoist
it out of the conditionals if you think that is better.
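
To make the two placements concrete, here is a minimal, self-contained sketch.
It is not the patch itself: MockRegInfo, Mode, Emitted and the opcode strings
are hypothetical stand-ins (not LLVM types) so that the example compiles on its
own, and the thumb1 label in particular is only a placeholder.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for MachineRegisterInfo: hands out fresh virtual
// register numbers. This is not the LLVM class.
struct MockRegInfo {
  unsigned Next = 1;
  unsigned createVirtualRegister() { return Next++; }
};

enum class Mode { ARM, Thumb1, Thumb2 };

// Records which (illustrative) opcode was chosen and which register it defines.
struct Emitted {
  std::string Opcode;
  unsigned Dest;
};

// Placement A: scratch is defined on each branch, next to its only use.
Emitted emitLoadPerBranch(MockRegInfo &MRI, Mode M) {
  if (M == Mode::Thumb1) {
    unsigned scratch = MRI.createVirtualRegister();
    return {"thumb1-load-placeholder", scratch};
  } else if (M == Mode::Thumb2) {
    unsigned scratch = MRI.createVirtualRegister();
    return {"t2LDR_POST", scratch};
  } else {
    unsigned scratch = MRI.createVirtualRegister();
    return {"LDR_POST_IMM", scratch};
  }
}

// Placement B: scratch is hoisted; only the opcode choice stays conditional.
Emitted emitLoadHoisted(MockRegInfo &MRI, Mode M) {
  unsigned scratch = MRI.createVirtualRegister();
  if (M == Mode::Thumb1)
    return {"thumb1-load-placeholder", scratch};
  if (M == Mode::Thumb2)
    return {"t2LDR_POST", scratch};
  return {"LDR_POST_IMM", scratch};
}

// Both placements allocate exactly one register per call and pick the same
// opcode, so the choice between them is about readability only.
void checkPlacementsAgree(Mode M) {
  MockRegInfo A, B;
  Emitted a = emitLoadPerBranch(A, M);
  Emitted b = emitLoadHoisted(B, M);
  assert(a.Opcode == b.Opcode && a.Dest == b.Dest);
  assert(A.Next == B.Next);
}
```

In this toy version the hoisted form is shorter, at the cost of a slightly
wider scope for the variable.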

-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
by The Linux Foundation

From: Bob Wilson [mailto:bob.wilson at apple.com] 
Sent: Wednesday, October 23, 2013 12:47 PM
To: Manman Ren
Cc: David Peixotto; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32

On Oct 23, 2013, at 12:32 PM, Manman Ren <manman.ren at gmail.com> wrote:

Hi David,

Can we use helper functions for emitUnitLoad|Store emitByteLoad|Store to
reduce code duplication?

Thanks,

Manman

Yes, that would be great, if it works.

The only other thing I noticed is that this patch has several places where
it creates a scratch virtual register on every branch of a conditional.
Those could be moved outside the conditionals.

On Tue, Oct 22, 2013 at 3:50 PM, David Peixotto <dpeixott at codeaurora.org>
wrote:

I've attached a patch that replaces the class with inline conditionals.
Please help to review this patch.

> -----Original Message-----
> From: Bob Wilson [mailto:bob.wilson at apple.com]
> Sent: Tuesday, October 22, 2013 9:15 AM
> To: David Peixotto
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32
>
> Thanks.  At least try it out. If it turns out to be unreadable with all the
> conditionals, we can reconsider.
>
> On Oct 22, 2013, at 9:13 AM, David Peixotto <dpeixott at codeaurora.org>
> wrote:
>
> > Ok, I will make a patch to switch to using inline conditionals instead.
> >
> >
> >> -----Original Message-----
> >> From: Bob Wilson [mailto:bob.wilson at apple.com]
> >> Sent: Monday, October 21, 2013 9:18 PM
> >> To: David Peixotto
> >> Cc: llvm-commits at cs.uiuc.edu LLVM
> >> Subject: Re: [llvm] r192915 - Refactor lowering for
> >> COPY_STRUCT_BYVAL_I32
> >>
> >>
> >> On Oct 21, 2013, at 7:20 PM, David Peixotto <dpeixott at codeaurora.org>
> >> wrote:
> >>
> >>> Hi Bob,
> >>>
> >>> I agree that a generic emitter would  be useful, but I'm not sure I
> >>> would get the time to work on such a project at this point.
> >>
> >> In that case, I think it would be better to simplify this code to use
> >> a more direct approach with conditionals.
> >>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Bob Wilson [mailto:bob.wilson at apple.com]
> >>>> Sent: Monday, October 21, 2013 12:12 PM
> >>>> To: David Peixotto
> >>>> Cc: llvm-commits at cs.uiuc.edu
> >>>> Subject: Re: [llvm] r192915 - Refactor lowering for
> >>>> COPY_STRUCT_BYVAL_I32
> >>>>
> >>>> It seems like it would make more sense to make the StructByvalEmitter
> >>>> a generic "emitter" for ARM instructions.  I suspect there are a
> >>>> number of other places in ARMISelLowering that could make good use
> >>>> of it.  Would you be willing to investigate that?
> >>>>
> >>>> As it stands, this really does seem like overkill for the struct
> >>>> byval issue, but if you could make it more generally useful, then
> >>>> the extra code would be worthwhile.
> >>>>
> >>>> On Oct 21, 2013, at 10:02 AM, David Peixotto
> >>>> <dpeixott at codeaurora.org>
> >>>> wrote:
> >>>>
> >>>>> Hi Bob,
> >>>>>
> >>>>> I think your criticism is valid here. I wasn't too happy with how
> >>>>> much code I ended up writing for this change. When I started I
> >>>>> thought the code size would be about equal after implementing the
> >>>>> thumb1 lowering because I was getting rid of some code
> >>>>> duplication, but the code size for abstracting the common parts
> >>>>> was larger than I had anticipated.
> >>>>> There is no fundamental problem with implementing it with
> >>>>> conditionals; I did it this way because I thought it would be
> >>>>> clearer and would be easier to write correctly.
> >>>>>
> >>>>> I think the way it is now has the advantage that the lowering
> >>>>> algorithm is clearly separated from the details of generating
> >>>>> machine instructions for each sub-target. I think it would be
> >>>>> easier to improve the algorithm with the way it is now. For
> >>>>> example, we are always using byte stores to copy any leftover that
> >>>>> does not fit into the "unit" size. So if we have a 31-byte struct
> >>>>> and a target that supports neon we will generate a 16-byte store
> >>>>> and 15 1-byte stores. We could improve this by generating fewer
> >>>>> stores for the leftover (3x4-byte + 1x2-byte + 1x1-byte). I don't
> >>>>> know if we actually care about this kind of change, but I believe
> >>>>> it would be easier to make now.
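
The arithmetic in the 31-byte example can be checked with a small standalone
sketch. It only plans store widths and emits no instructions; planByteTail and
planGreedyTail are hypothetical names, and the greedy variant is one possible
reading of the suggested improvement, assuming leftover stores of at most 4
bytes.

```cpp
#include <cassert>
#include <vector>

// Scheme described above: copy as many whole units as fit, then copy the
// leftover one byte at a time.
std::vector<unsigned> planByteTail(unsigned Size, unsigned Unit) {
  assert(Unit != 0 && "unit size must be non-zero");
  std::vector<unsigned> Stores;
  for (; Size >= Unit; Size -= Unit)
    Stores.push_back(Unit);
  for (; Size > 0; --Size)
    Stores.push_back(1);
  return Stores;
}

// Possible improvement: after the whole units, cover the leftover with the
// widest of 4-, 2- and 1-byte stores that still fits.
std::vector<unsigned> planGreedyTail(unsigned Size, unsigned Unit) {
  assert(Unit != 0 && "unit size must be non-zero");
  std::vector<unsigned> Stores;
  for (; Size >= Unit; Size -= Unit)
    Stores.push_back(Unit);
  for (unsigned W = 4; W >= 1; W /= 2)
    for (; Size >= W; Size -= W)
      Stores.push_back(W);
  return Stores;
}
```

For Size = 31 and Unit = 16 the first plan has 16 stores (one 16-byte store
plus fifteen 1-byte stores) and the second has 6 (16, 4, 4, 4, 2, 1). Going
from wider to narrower stores keeps each leftover store naturally aligned when
the base address is aligned to the unit size.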
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Bob Wilson [mailto:bob.wilson at apple.com]
> >>>>>> Sent: Saturday, October 19, 2013 7:00 PM
> >>>>>> To: David Peixotto
> >>>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>>> Subject: Re: [llvm] r192915 - Refactor lowering for
> >>>>>> COPY_STRUCT_BYVAL_I32
> >>>>>>
> >>>>>> This is very nice and elegant, but it's an awful lot of code for
> >>>>>> something that isn't really that complicated.  It seems like
> >>>>>> overkill to me.  Did you consider implementing the Thumb1 support
> >>>>>> by just adding more conditionals?
> >>>>>> Is there a fundamental problem with that?
> >>>>>>
> >>>>>> On Oct 17, 2013, at 12:49 PM, David Peixotto
> >>>>>> <dpeixott at codeaurora.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Author: dpeixott
> >>>>>>> Date: Thu Oct 17 14:49:22 2013
> >>>>>>> New Revision: 192915
> >>>>>>>
> >>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=192915&view=rev
> >>>>>>> Log:
> >>>>>>> Refactor lowering for COPY_STRUCT_BYVAL_I32
> >>>>>>>
> >>>>>>> This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
> >>>>>>> pseudo-instruction in the ARM backend. We introduce a new helper
> >>>>>>> class that encapsulates all of the operations needed during the
> >>>>>>> lowering. The operations are implemented for each subtarget in
> >>>>>>> different subclasses. Currently only arm and thumb2 subtargets are
> >>>>>>> supported.
> >>>>>>>
> >>>>>>> This refactoring was done to easily implement support for thumb1
> >>>>>>> subtargets. This initial patch does not add support for thumb1,
> >>>>>>> but is only a refactoring. A follow on patch will implement the
> >>>>>>> support for thumb1 subtargets.
> >>>>>>>
> >>>>>>> No intended functionality change.
> >>>>>>>
> >>>>>>> Modified:
> >>>>>>> llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
> >>>>>>>
> >>>>>>> Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
> >>>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp?rev=192915&r1=192914&r2=192915&view=diff
> >>>>>>> ==============================================================================
> >>>>>>> --- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
> >>>>>>> +++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Thu Oct 17 14:49:22 2013
> >>>>>>> @@ -48,6 +48,7 @@
> >>>>>>> #include "llvm/Support/MathExtras.h"
> >>>>>>> #include "llvm/Support/raw_ostream.h"
> >>>>>>> #include "llvm/Target/TargetOptions.h"
> >>>>>>> +#include <utility>
> >>>>>>> using namespace llvm;
> >>>>>>>
> >>>>>>> STATISTIC(NumTailCalls, "Number of tail calls");
> >>>>>>> @@ -7245,8 +7246,430 @@ MachineBasicBlock *OtherSucc(MachineBasi
> >>>>>>>   llvm_unreachable("Expecting a BB with two successors!");
> >>>>>>> }
> >>>>>>>
> >>>>>>> -MachineBasicBlock *ARMTargetLowering::
> >>>>>>> -EmitStructByval(MachineInstr *MI, MachineBasicBlock *BB) const {
> >>>>>>> +namespace {
> >>>>>>> +// This class is a helper for lowering the COPY_STRUCT_BYVAL_I32 instruction.
> >>>>>>> +// It defines the operations needed to lower the byval copy. We use a helper
> >>>>>>> +// class because the opcodes and machine instructions are different for each
> >>>>>>> +// subtarget, but the overall algorithm for the lowering is the same.  The
> >>>>>>> +// implementation of each operation will be defined separately for arm, thumb1,
> >>>>>>> +// and thumb2 targets by subclassing this base class. See
> >>>>>>> +// ARMTargetLowering::EmitStructByval() for how these operations are used.
> >>>>>>> +class TargetStructByvalEmitter {
> >>>>>>> +public:
> >>>>>>> +  TargetStructByvalEmitter(const TargetInstrInfo *TII_,
> >>>>>>> +                           MachineRegisterInfo &MRI_,
> >>>>>>> +                           const TargetRegisterClass *TRC_)
> >>>>>>> +      : TII(TII_), MRI(MRI_), TRC(TRC_) {}
> >>>>>>> +
> >>>>>>> +  // Emit a post-increment load of "unit" size. The unit size is based on the
> >>>>>>> +  // alignment of the struct being copied (4, 2, or 1 bytes). Alignments higher
> >>>>>>> +  // than 4 are handled separately by using NEON instructions.
> >>>>>>> +  //
> >>>>>>> +  // \param baseReg the register holding the address to load.
> >>>>>>> +  // \param baseOut the register to recieve the incremented address.
> >>>>>>> +  // \returns the register holding the loaded value.
> >>>>>>> +  virtual unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                                DebugLoc &dl, unsigned baseReg,
> >>>>>>> +                                unsigned baseOut) = 0;
> >>>>>>> +
> >>>>>>> +  // Emit a post-increment store of "unit" size. The unit size is based on the
> >>>>>>> +  // alignment of the struct being copied (4, 2, or 1 bytes). Alignments higher
> >>>>>>> +  // than 4 are handled separately by using NEON instructions.
> >>>>>>> +  //
> >>>>>>> +  // \param baseReg the register holding the address to store.
> >>>>>>> +  // \param storeReg the register holding the value to store.
> >>>>>>> +  // \param baseOut the register to recieve the incremented address.
> >>>>>>> +  virtual void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                             DebugLoc &dl, unsigned baseReg, unsigned storeReg,
> >>>>>>> +                             unsigned baseOut) = 0;
> >>>>>>> +
> >>>>>>> +  // Emit a post-increment load of one byte.
> >>>>>>> +  //
> >>>>>>> +  // \param baseReg the register holding the address to load.
> >>>>>>> +  // \param baseOut the register to recieve the incremented
> >> address.
> >>>>>>> +  // \returns the register holding the loaded value.
> >>>>>>> +  virtual unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                                DebugLoc &dl, unsigned baseReg,
> >>>>>>> +                                unsigned baseOut) = 0;
> >>>>>>> +
> >>>>>>> +  // Emit a post-increment store of one byte.
> >>>>>>> +  //
> >>>>>>> +  // \param baseReg the register holding the address to store.
> >>>>>>> +  // \param storeReg the register holding the value to store.
> >>>>>>> +  // \param baseOut the register to recieve the incremented
> >> address.
> >>>>>>> +  virtual void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                             DebugLoc &dl, unsigned baseReg, unsigned storeReg,
> >>>>>>> +                             unsigned baseOut) = 0;
> >>>>>>> +
> >>>>>>> +  // Emit a load of a constant value.
> >>>>>>> +  //
> >>>>>>> +  // \param Constant the register holding the address to store.
> >>>>>>> +  // \returns the register holding the loaded value.
> >>>>>>> +  virtual unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                                    DebugLoc &dl, unsigned Constant,
> >>>>>>> +                                    const DataLayout *DL) = 0;
> >>>>>>> +
> >>>>>>> +  // Emit a subtract of a register minus immediate, with the immediate equal to
> >>>>>>> +  // the "unit" size. The unit size is based on the alignment of the struct
> >>>>>>> +  // being copied (16, 8, 4, 2, or 1 bytes).
> >>>>>>> +  //
> >>>>>>> +  // \param InReg the register holding the initial value.
> >>>>>>> +  // \param OutReg the register to recieve the subtracted value.
> >>>>>>> +  virtual void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                          unsigned InReg, unsigned OutReg) = 0;
> >>>>>>> +
> >>>>>>> +  // Emit a branch based on a condition code of not equal.
> >>>>>>> +  //
> >>>>>>> +  // \param TargetBB the destination of the branch.
> >>>>>>> +  virtual void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                            DebugLoc &dl, MachineBasicBlock *TargetBB) = 0;
> >>>>>>> +
> >>>>>>> +  // Find the constant pool index for the given constant. This method is
> >>>>>>> +  // implemented in the base class because it is the same for all subtargets.
> >>>>>>> +  //
> >>>>>>> +  // \param LoopSize the constant value for which the index should be returned.
> >>>>>>> +  // \returns the constant pool index for the constant.
> >>>>>>> +  unsigned getConstantPoolIndex(MachineFunction *MF, const DataLayout *DL,
> >>>>>>> +                                unsigned LoopSize) {
> >>>>>>> +    MachineConstantPool *ConstantPool = MF->getConstantPool();
> >>>>>>> +    Type *Int32Ty = Type::getInt32Ty(MF->getFunction()->getContext());
> >>>>>>> +    const Constant *C = ConstantInt::get(Int32Ty, LoopSize);
> >>>>>>> +
> >>>>>>> +    // MachineConstantPool wants an explicit alignment.
> >>>>>>> +    unsigned Align = DL->getPrefTypeAlignment(Int32Ty);
> >>>>>>> +    if (Align == 0)
> >>>>>>> +      Align = DL->getTypeAllocSize(C->getType());
> >>>>>>> +    return ConstantPool->getConstantPoolIndex(C, Align);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  // Return the register class used by the subtarget.
> >>>>>>> +  //
> >>>>>>> +  // \returns the target register class.
> >>>>>>> +  const TargetRegisterClass *getTRC() const { return TRC; }
> >>>>>>> +
> >>>>>>> +  virtual ~TargetStructByvalEmitter() {};
> >>>>>>> +
> >>>>>>> +protected:
> >>>>>>> +  const TargetInstrInfo *TII;
> >>>>>>> +  MachineRegisterInfo &MRI;
> >>>>>>> +  const TargetRegisterClass *TRC;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +class ARMStructByvalEmitter : public TargetStructByvalEmitter {
> >>>>>>> +public:
> >>>>>>> +  ARMStructByvalEmitter(const TargetInstrInfo *TII, MachineRegisterInfo &MRI,
> >>>>>>> +                        unsigned LoadStoreSize)
> >>>>>>> +      : TargetStructByvalEmitter(
> >>>>>>> +            TII, MRI, (const TargetRegisterClass *)&ARM::GPRRegClass),
> >>>>>>> +        UnitSize(LoadStoreSize),
> >>>>>>> +        UnitLdOpc(LoadStoreSize == 4
> >>>>>>> +                      ? ARM::LDR_POST_IMM
> >>>>>>> +                      : LoadStoreSize == 2
> >>>>>>> +                            ? ARM::LDRH_POST
> >>>>>>> +                            : LoadStoreSize == 1 ? ARM::LDRB_POST_IMM : 0),
> >>>>>>> +        UnitStOpc(LoadStoreSize == 4
> >>>>>>> +                      ? ARM::STR_POST_IMM
> >>>>>>> +                      : LoadStoreSize == 2
> >>>>>>> +                            ? ARM::STRH_POST
> >>>>>>> +                            : LoadStoreSize == 1 ? ARM::STRB_POST_IMM : 0) {}
> >>>>>>> +
> >>>>>>> +  unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                        unsigned baseReg, unsigned baseOut) {
> >>>>>>> +    unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitLdOpc), scratch).addReg(
> >>>>>>> +        baseOut, RegState::Define).addReg(baseReg).addReg(0).addImm(UnitSize));
> >>>>>>> +    return scratch;
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                     unsigned baseReg, unsigned storeReg, unsigned baseOut) {
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitStOpc), baseOut).addReg(
> >>>>>>> +        storeReg).addReg(baseReg).addReg(0).addImm(UnitSize));
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                        unsigned baseReg, unsigned baseOut) {
> >>>>>>> +    unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(ARM::LDRB_POST_IMM), scratch)
> >>>>>>> +                       .addReg(baseOut, RegState::Define).addReg(baseReg)
> >>>>>>> +                       .addReg(0).addImm(1));
> >>>>>>> +    return scratch;
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                     unsigned baseReg, unsigned storeReg, unsigned baseOut) {
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(ARM::STRB_POST_IMM), baseOut)
> >>>>>>> +                       .addReg(storeReg).addReg(baseReg).addReg(0).addImm(1));
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                            DebugLoc &dl, unsigned Constant,
> >>>>>>> +                            const DataLayout *DL) {
> >>>>>>> +    unsigned constReg = MRI.createVirtualRegister(TRC);
> >>>>>>> +    unsigned Idx = getConstantPoolIndex(BB->getParent(), DL, Constant);
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(ARM::LDRcp)).addReg(
> >>>>>>> +        constReg, RegState::Define).addConstantPoolIndex(Idx).addImm(0));
> >>>>>>> +    return constReg;
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                  unsigned InReg, unsigned OutReg) {
> >>>>>>> +    MachineInstrBuilder MIB =
> >>>>>>> +        BuildMI(*BB, MI, dl, TII->get(ARM::SUBri), OutReg);
> >>>>>>> +    AddDefaultCC(AddDefaultPred(MIB.addReg(InReg).addImm(UnitSize)));
> >>>>>>> +    MIB->getOperand(5).setReg(ARM::CPSR);
> >>>>>>> +    MIB->getOperand(5).setIsDef(true);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                    MachineBasicBlock *TargetBB) {
> >>>>>>> +    BuildMI(*BB, MI, dl, TII->get(ARM::Bcc)).addMBB(TargetBB).addImm(ARMCC::NE)
> >>>>>>> +        .addReg(ARM::CPSR);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +private:
> >>>>>>> +  const unsigned UnitSize;
> >>>>>>> +  const unsigned UnitLdOpc;
> >>>>>>> +  const unsigned UnitStOpc;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +class Thumb2StructByvalEmitter : public TargetStructByvalEmitter {
> >>>>>>> +public:
> >>>>>>> +  Thumb2StructByvalEmitter(const TargetInstrInfo *TII, MachineRegisterInfo &MRI,
> >>>>>>> +                           unsigned LoadStoreSize)
> >>>>>>> +      : TargetStructByvalEmitter(
> >>>>>>> +            TII, MRI, (const TargetRegisterClass *)&ARM::tGPRRegClass),
> >>>>>>> +        UnitSize(LoadStoreSize),
> >>>>>>> +        UnitLdOpc(LoadStoreSize == 4
> >>>>>>> +                      ? ARM::t2LDR_POST
> >>>>>>> +                      : LoadStoreSize == 2
> >>>>>>> +                            ? ARM::t2LDRH_POST
> >>>>>>> +                            : LoadStoreSize == 1 ? ARM::t2LDRB_POST : 0),
> >>>>>>> +        UnitStOpc(LoadStoreSize == 4
> >>>>>>> +                      ? ARM::t2STR_POST
> >>>>>>> +                      : LoadStoreSize == 2
> >>>>>>> +                            ? ARM::t2STRH_POST
> >>>>>>> +                            : LoadStoreSize == 1 ? ARM::t2STRB_POST : 0) {}
> >>>>>>> +
> >>>>>>> +  unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                        unsigned baseReg, unsigned baseOut) {
> >>>>>>> +    unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitLdOpc), scratch).addReg(
> >>>>>>> +        baseOut, RegState::Define).addReg(baseReg).addImm(UnitSize));
> >>>>>>> +    return scratch;
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                     unsigned baseReg, unsigned storeReg, unsigned baseOut) {
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitStOpc), baseOut)
> >>>>>>> +                       .addReg(storeReg).addReg(baseReg).addImm(UnitSize));
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                        unsigned baseReg, unsigned baseOut) {
> >>>>>>> +    unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(ARM::t2LDRB_POST), scratch)
> >>>>>>> +                       .addReg(baseOut, RegState::Define).addReg(baseReg)
> >>>>>>> +                       .addImm(1));
> >>>>>>> +    return scratch;
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                     unsigned baseReg, unsigned storeReg, unsigned baseOut) {
> >>>>>>> +    AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(ARM::t2STRB_POST), baseOut)
> >>>>>>> +                       .addReg(storeReg).addReg(baseReg).addImm(1));
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                            DebugLoc &dl, unsigned Constant,
> >>>>>>> +                            const DataLayout *DL) {
> >>>>>>> +    unsigned VConst = MRI.createVirtualRegister(TRC);
> >>>>>>> +    unsigned Vtmp = VConst;
> >>>>>>> +    if ((Constant & 0xFFFF0000) != 0)
> >>>>>>> +      Vtmp = MRI.createVirtualRegister(TRC);
> >>>>>>> +    AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVi16), Vtmp)
> >>>>>>> +                       .addImm(Constant & 0xFFFF));
> >>>>>>> +
> >>>>>>> +    if ((Constant & 0xFFFF0000) != 0)
> >>>>>>> +      AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVTi16), VConst)
> >>>>>>> +                         .addReg(Vtmp).addImm(Constant >> 16));
> >>>>>>> +    return VConst;
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                  unsigned InReg, unsigned OutReg) {
> >>>>>>> +    MachineInstrBuilder MIB =
> >>>>>>> +        BuildMI(*BB, MI, dl, TII->get(ARM::t2SUBri), OutReg);
> >>>>>>> +    AddDefaultCC(AddDefaultPred(MIB.addReg(InReg).addImm(UnitSize)));
> >>>>>>> +    MIB->getOperand(5).setReg(ARM::CPSR);
> >>>>>>> +    MIB->getOperand(5).setIsDef(true);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                    MachineBasicBlock *TargetBB) {
> >>>>>>> +    BuildMI(BB, dl, TII->get(ARM::t2Bcc)).addMBB(TargetBB).addImm(ARMCC::NE)
> >>>>>>> +        .addReg(ARM::CPSR);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +private:
> >>>>>>> +  const unsigned UnitSize;
> >>>>>>> +  const unsigned UnitLdOpc;
> >>>>>>> +  const unsigned UnitStOpc;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +// This class is a thin wrapper that delegates most of the work to the correct
> >>>>>>> +// TargetStructByvalEmitter implementation. It also handles the lowering for
> >>>>>>> +// targets that support neon because the neon implementation is the same for all
> >>>>>>> +// targets that support it.
> >>>>>>> +class StructByvalEmitter {
> >>>>>>> +public:
> >>>>>>> +  StructByvalEmitter(unsigned LoadStoreSize, const ARMSubtarget *Subtarget,
> >>>>>>> +                     const TargetInstrInfo *TII_, MachineRegisterInfo &MRI_,
> >>>>>>> +                     const DataLayout *DL_)
> >>>>>>> +      : UnitSize(LoadStoreSize),
> >>>>>>> +        TargetEmitter(
> >>>>>>> +          Subtarget->isThumb2()
> >>>>>>> +              ? static_cast<TargetStructByvalEmitter *>(
> >>>>>>> +                    new Thumb2StructByvalEmitter(TII_, MRI_,
> >>>>>>> +                                                 LoadStoreSize))
> >>>>>>> +              : static_cast<TargetStructByvalEmitter *>(
> >>>>>>> +                    new ARMStructByvalEmitter(TII_, MRI_,
> >>>>>>> +                                              LoadStoreSize))),
> >>>>>>> +        TII(TII_), MRI(MRI_), DL(DL_),
> >>>>>>> +        VecTRC(UnitSize == 16
> >>>>>>> +                   ? (const TargetRegisterClass *)&ARM::DPairRegClass
> >>>>>>> +                   : UnitSize == 8
> >>>>>>> +                         ? (const TargetRegisterClass *)&ARM::DPRRegClass
> >>>>>>> +                         : 0),
> >>>>>>> +        VecLdOpc(UnitSize == 16 ? ARM::VLD1q32wb_fixed
> >>>>>>> +                                : UnitSize == 8 ? ARM::VLD1d32wb_fixed : 0),
> >>>>>>> +        VecStOpc(UnitSize == 16 ? ARM::VST1q32wb_fixed
> >>>>>>> +                                : UnitSize == 8 ? ARM::VST1d32wb_fixed : 0) {}
> >>>>>>> +
> >>>>>>> +  // Emit a post-increment load of "unit" size. The unit size is based on the
> >>>>>>> +  // alignment of the struct being copied (16, 8, 4, 2, or 1 bytes). Loads of 16
> >>>>>>> +  // or 8 bytes use NEON instructions to load the value.
> >>>>>>> +  //
> >>>>>>> +  // \param baseReg the register holding the address to load.
> >>>>>>> +  // \param baseOut the register to recieve the incremented address. If baseOut
> >>>>>>> +  // is 0 then a new register is created to hold the incremented address.
> >>>>>>> +  // \returns a pair of registers holding the loaded value and the updated
> >>>>>>> +  // address.
> >>>>>>> +  std::pair<unsigned, unsigned> emitUnitLoad(MachineBasicBlock *BB,
> >>>>>>> +                                             MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                                             unsigned baseReg,
> >>>>>>> +                                             unsigned baseOut = 0) {
> >>>>>>> +    unsigned scratch = 0;
> >>>>>>> +    if (baseOut == 0)
> >>>>>>> +      baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
> >>>>>>> +    if (UnitSize >= 8) { // neon
> >>>>>>> +      scratch = MRI.createVirtualRegister(VecTRC);
> >>>>>>> +      AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(VecLdOpc), scratch).addReg(
> >>>>>>> +          baseOut, RegState::Define).addReg(baseReg).addImm(0));
> >>>>>>> +    } else {
> >>>>>>> +      scratch = TargetEmitter->emitUnitLoad(BB, MI, dl, baseReg, baseOut);
> >>>>>>> +    }
> >>>>>>> +    return std::make_pair(scratch, baseOut);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  // Emit a post-increment store of "unit" size. The unit size is based on the
> >>>>>>> +  // alignment of the struct being copied (16, 8, 4, 2, or 1 bytes). Stores of
> >>>>>>> +  // 16 or 8 bytes use NEON instructions to store the value.
> >>>>>>> +  //
> >>>>>>> +  // \param baseReg the register holding the address to store.
> >>>>>>> +  // \param storeReg the register holding the value to store.
> >>>>>>> +  // \param baseOut the register to recieve the incremented address. If baseOut
> >>>>>>> +  // is 0 then a new register is created to hold the incremented address.
> >>>>>>> +  // \returns the register holding the updated address.
> >>>>>>> +  unsigned emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                         unsigned baseReg, unsigned storeReg,
> >>>>>>> +                         unsigned baseOut = 0) {
> >>>>>>> +    if (baseOut == 0)
> >>>>>>> +      baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
> >>>>>>> +    if (UnitSize >= 8) { // neon
> >>>>>>> +      AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(VecStOpc), baseOut)
> >>>>>>> +                         .addReg(baseReg).addImm(0).addReg(storeReg));
> >>>>>>> +    } else {
> >>>>>>> +      TargetEmitter->emitUnitStore(BB, MI, dl, baseReg, storeReg, baseOut);
> >>>>>>> +    }
> >>>>>>> +    return baseOut;
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  // Emit a post-increment load of one byte.
> >>>>>>> +  //
> >>>>>>> +  // \param baseReg the register holding the address to load.
> >>>>>>> +  // \returns a pair of registers holding the loaded value and the updated
> >>>>>>> +  // address.
> >>>>>>> +  std::pair<unsigned, unsigned> emitByteLoad(MachineBasicBlock *BB,
> >>>>>>> +                                             MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                                             unsigned baseReg) {
> >>>>>>> +    unsigned baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
> >>>>>>> +    unsigned scratch =
> >>>>>>> +        TargetEmitter->emitByteLoad(BB, MI, dl, baseReg, baseOut);
> >>>>>>> +    return std::make_pair(scratch, baseOut);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  // Emit a post-increment store of one byte.
> >>>>>>> +  //
> >>>>>>> +  // \param baseReg the register holding the address to store.
> >>>>>>> +  // \param storeReg the register holding the value to store.
> >>>>>>> +  // \returns the register holding the updated address.
> >>>>>>> +  unsigned emitByteStore(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                         unsigned baseReg, unsigned storeReg) {
> >>>>>>> +    unsigned baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
> >>>>>>> +    TargetEmitter->emitByteStore(BB, MI, dl, baseReg, storeReg, baseOut);
> >>>>>>> +    return baseOut;
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  // Emit a load of the constant LoopSize.
> >>>>>>> +  //
> >>>>>>> +  // \param LoopSize the constant to load.
> >>>>>>> +  // \returns the register holding the loaded constant.
> >>>>>>> +  unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr *MI,
> >>>>>>> +                            DebugLoc &dl, unsigned LoopSize) {
> >>>>>>> +    return TargetEmitter->emitConstantLoad(BB, MI, dl, LoopSize, DL);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  // Emit a subtract of a register minus immediate, with the immediate equal to
> >>>>>>> +  // the "unit" size. The unit size is based on the alignment of the struct
> >>>>>>> +  // being copied (16, 8, 4, 2, or 1 bytes).
> >>>>>>> +  //
> >>>>>>> +  // \param InReg the register holding the initial value.
> >>>>>>> +  // \param OutReg the register to recieve the subtracted value.
> >>>>>>> +  void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                  unsigned InReg, unsigned OutReg) {
> >>>>>>> +    TargetEmitter->emitSubImm(BB, MI, dl, InReg, OutReg);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  // Emit a branch based on a condition code of not equal.
> >>>>>>> +  //
> >>>>>>> +  // \param TargetBB the destination of the branch.
> >>>>>>> +  void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI, DebugLoc &dl,
> >>>>>>> +                    MachineBasicBlock *TargetBB) {
> >>>>>>> +    TargetEmitter->emitBranchNE(BB, MI, dl, TargetBB);
> >>>>>>> +  }
> >>>>>>> +
> >>>>>>> +  // Return the register class used by the subtarget.
> >>>>>>> +  //
> >>>>>>> +  // \returns the target register class.
> >>>>>>> +  const TargetRegisterClass *getTRC() const { return TargetEmitter->getTRC(); }
> >>>>>>> +
> >>>>>>> +private:
> >>>>>>> +  const unsigned UnitSize;
> >>>>>>> +  OwningPtr<TargetStructByvalEmitter> TargetEmitter;
> >>>>>>> +  const TargetInstrInfo *TII;
> >>>>>>> +  MachineRegisterInfo &MRI;
> >>>>>>> +  const DataLayout *DL;
> >>>>>>> +
> >>>>>>> +  const TargetRegisterClass *VecTRC;
> >>>>>>> +  const unsigned VecLdOpc;
> >>>>>>> +  const unsigned VecStOpc;
> >>>>>>> +};
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +MachineBasicBlock *
> >>>>>>> +ARMTargetLowering::EmitStructByval(MachineInstr *MI,
> >>>>>>> +                                   MachineBasicBlock *BB) const
> >>>>>>> +{
> >>>>>>> // This pseudo instruction has 3 operands: dst, src, size
> >>>>>>> // We expand it to a loop if size > Subtarget->getMaxInlineSizeThreshold().
> >>>>>>> // Otherwise, we will generate unrolled scalar copies.
> >>>>>>> @@ -7261,23 +7684,13 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>>>> unsigned Align = MI->getOperand(3).getImm();
> >>>>>>> DebugLoc dl = MI->getDebugLoc();
> >>>>>>>
> >>>>>>> -  bool isThumb2 = Subtarget->isThumb2();
> >>>>>>> MachineFunction *MF = BB->getParent();
> >>>>>>> MachineRegisterInfo &MRI = MF->getRegInfo();
> >>>>>>> -  unsigned ldrOpc, strOpc, UnitSize = 0;
> >>>>>>> -
> >>>>>>> -  const TargetRegisterClass *TRC = isThumb2 ?
> >>>>>>> -    (const TargetRegisterClass*)&ARM::tGPRRegClass :
> >>>>>>> -    (const TargetRegisterClass*)&ARM::GPRRegClass;
> >>>>>>> -  const TargetRegisterClass *TRC_Vec = 0;
> >>>>>>> +  unsigned UnitSize = 0;
> >>>>>>>
> >>>>>>> if (Align & 1) {
> >>>>>>> -    ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
> >>>>>>> -    strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
> >>>>>>>  UnitSize = 1;
> >>>>>>> } else if (Align & 2) {
> >>>>>>> -    ldrOpc = isThumb2 ? ARM::t2LDRH_POST : ARM::LDRH_POST;
> >>>>>>> -    strOpc = isThumb2 ? ARM::t2STRH_POST : ARM::STRH_POST;
> >>>>>>>  UnitSize = 2;
> >>>>>>> } else {
> >>>>>>>  // Check whether we can use NEON instructions.
> >>>>>>> @@ -7285,27 +7698,18 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>>>>        hasAttribute(AttributeSet::FunctionIndex,
> >>>>>>>                     Attribute::NoImplicitFloat) &&
> >>>>>>>      Subtarget->hasNEON()) {
> >>>>>>> -      if ((Align % 16 == 0) && SizeVal >= 16) {
> >>>>>>> -        ldrOpc = ARM::VLD1q32wb_fixed;
> >>>>>>> -        strOpc = ARM::VST1q32wb_fixed;
> >>>>>>> +      if ((Align % 16 == 0) && SizeVal >= 16)
> >>>>>>>      UnitSize = 16;
> >>>>>>> -        TRC_Vec = (const TargetRegisterClass*)&ARM::DPairRegClass;
> >>>>>>> -      }
> >>>>>>> -      else if ((Align % 8 == 0) && SizeVal >= 8) {
> >>>>>>> -        ldrOpc = ARM::VLD1d32wb_fixed;
> >>>>>>> -        strOpc = ARM::VST1d32wb_fixed;
> >>>>>>> +      else if ((Align % 8 == 0) && SizeVal >= 8)
> >>>>>>>      UnitSize = 8;
> >>>>>>> -        TRC_Vec = (const TargetRegisterClass*)&ARM::DPRRegClass;
> >>>>>>> -      }
> >>>>>>>  }
> >>>>>>>  // Can't use NEON instructions.
> >>>>>>> -    if (UnitSize == 0) {
> >>>>>>> -      ldrOpc = isThumb2 ? ARM::t2LDR_POST : ARM::LDR_POST_IMM;
> >>>>>>> -      strOpc = isThumb2 ? ARM::t2STR_POST : ARM::STR_POST_IMM;
> >>>>>>> +    if (UnitSize == 0)
> >>>>>>>    UnitSize = 4;
> >>>>>>> -    }
> >>>>>>> }
> >>>>>>>
> >>>>>>> +  StructByvalEmitter ByvalEmitter(UnitSize, Subtarget, TII, MRI,
> >>>>>>> +                                  getDataLayout());
> >>>>>>> unsigned BytesLeft = SizeVal % UnitSize;
> >>>>>>> unsigned LoopSize = SizeVal - BytesLeft;
> >>>>>>>
> >>>>>>> @@ -7316,67 +7720,22 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>>>>  unsigned srcIn = src;
> >>>>>>>  unsigned destIn = dest;
> >>>>>>>  for (unsigned i = 0; i < LoopSize; i+=UnitSize) {
> >>>>>>> -      unsigned scratch = MRI.createVirtualRegister(UnitSize >= 8 ? TRC_Vec:TRC);
> >>>>>>> -      unsigned srcOut = MRI.createVirtualRegister(TRC);
> >>>>>>> -      unsigned destOut = MRI.createVirtualRegister(TRC);
> >>>>>>> -      if (UnitSize >= 8) {
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>>>> -          TII->get(ldrOpc), scratch)
> >>>>>>> -          .addReg(srcOut, RegState::Define).addReg(srcIn).addImm(0));
> >>>>>>> -
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc), destOut)
> >>>>>>> -          .addReg(destIn).addImm(0).addReg(scratch));
> >>>>>>> -      } else if (isThumb2) {
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>>>> -          TII->get(ldrOpc), scratch)
> >>>>>>> -          .addReg(srcOut, RegState::Define).addReg(srcIn).addImm(UnitSize));
> >>>>>>> -
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc), destOut)
> >>>>>>> -          .addReg(scratch).addReg(destIn)
> >>>>>>> -          .addImm(UnitSize));
> >>>>>>> -      } else {
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>>>> -          TII->get(ldrOpc), scratch)
> >>>>>>> -          .addReg(srcOut, RegState::Define).addReg(srcIn).addReg(0)
> >>>>>>> -          .addImm(UnitSize));
> >>>>>>> -
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc), destOut)
> >>>>>>> -          .addReg(scratch).addReg(destIn)
> >>>>>>> -          .addReg(0).addImm(UnitSize));
> >>>>>>> -      }
> >>>>>>> -      srcIn = srcOut;
> >>>>>>> -      destIn = destOut;
> >>>>>>> +      std::pair<unsigned, unsigned> res =
> >>>>>>> +          ByvalEmitter.emitUnitLoad(BB, MI, dl, srcIn);
> >>>>>>> +      unsigned scratch = res.first;
> >>>>>>> +      srcIn = res.second;
> >>>>>>> +      destIn = ByvalEmitter.emitUnitStore(BB, MI, dl, destIn, scratch);
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  // Handle the leftover bytes with LDRB and STRB.
> >>>>>>>  // [scratch, srcOut] = LDRB_POST(srcIn, 1)
> >>>>>>>  // [destOut] = STRB_POST(scratch, destIn, 1)
> >>>>>>> -    ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
> >>>>>>> -    strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
> >>>>>>>  for (unsigned i = 0; i < BytesLeft; i++) {
> >>>>>>> -      unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>>>> -      unsigned srcOut = MRI.createVirtualRegister(TRC);
> >>>>>>> -      unsigned destOut = MRI.createVirtualRegister(TRC);
> >>>>>>> -      if (isThumb2) {
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>>>> -          TII->get(ldrOpc),scratch)
> >>>>>>> -          .addReg(srcOut, RegState::Define).addReg(srcIn).addImm(1));
> >>>>>>> -
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc), destOut)
> >>>>>>> -          .addReg(scratch).addReg(destIn)
> >>>>>>> -          .addImm(1));
> >>>>>>> -      } else {
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl,
> >>>>>>> -          TII->get(ldrOpc),scratch)
> >>>>>>> -          .addReg(srcOut, RegState::Define).addReg(srcIn)
> >>>>>>> -          .addReg(0).addImm(1));
> >>>>>>> -
> >>>>>>> -        AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc), destOut)
> >>>>>>> -          .addReg(scratch).addReg(destIn)
> >>>>>>> -          .addReg(0).addImm(1));
> >>>>>>> -      }
> >>>>>>> -      srcIn = srcOut;
> >>>>>>> -      destIn = destOut;
> >>>>>>> +      std::pair<unsigned, unsigned> res =
> >>>>>>> +          ByvalEmitter.emitByteLoad(BB, MI, dl, srcIn);
> >>>>>>> +      unsigned scratch = res.first;
> >>>>>>> +      srcIn = res.second;
> >>>>>>> +      destIn = ByvalEmitter.emitByteStore(BB, MI, dl, destIn, scratch);
> >>>>>>>  }
> >>>>>>>  MI->eraseFromParent();   // The instruction is gone now.
> >>>>>>>  return BB;
> >>>>>>> @@ -7414,34 +7773,7 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>>>> exitMBB->transferSuccessorsAndUpdatePHIs(BB);
> >>>>>>>
> >>>>>>> // Load an immediate to varEnd.
> >>>>>>> -  unsigned varEnd = MRI.createVirtualRegister(TRC);
> >>>>>>> -  if (isThumb2) {
> >>>>>>> -    unsigned VReg1 = varEnd;
> >>>>>>> -    if ((LoopSize & 0xFFFF0000) != 0)
> >>>>>>> -      VReg1 = MRI.createVirtualRegister(TRC);
> >>>>>>> -    AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVi16), VReg1)
> >>>>>>> -                   .addImm(LoopSize & 0xFFFF));
> >>>>>>> -
> >>>>>>> -    if ((LoopSize & 0xFFFF0000) != 0)
> >>>>>>> -      AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVTi16), varEnd)
> >>>>>>> -                     .addReg(VReg1)
> >>>>>>> -                     .addImm(LoopSize >> 16));
> >>>>>>> -  } else {
> >>>>>>> -    MachineConstantPool *ConstantPool = MF->getConstantPool();
> >>>>>>> -    Type *Int32Ty = Type::getInt32Ty(MF->getFunction()->getContext());
> >>>>>>> -    const Constant *C = ConstantInt::get(Int32Ty, LoopSize);
> >>>>>>> -
> >>>>>>> -    // MachineConstantPool wants an explicit alignment.
> >>>>>>> -    unsigned Align = getDataLayout()->getPrefTypeAlignment(Int32Ty);
> >>>>>>> -    if (Align == 0)
> >>>>>>> -      Align = getDataLayout()->getTypeAllocSize(C->getType());
> >>>>>>> -    unsigned Idx = ConstantPool->getConstantPoolIndex(C, Align);
> >>>>>>> -
> >>>>>>> -    AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::LDRcp))
> >>>>>>> -                   .addReg(varEnd, RegState::Define)
> >>>>>>> -                   .addConstantPoolIndex(Idx)
> >>>>>>> -                   .addImm(0));
> >>>>>>> -  }
> >>>>>>> +  unsigned varEnd = ByvalEmitter.emitConstantLoad(BB, MI, dl, LoopSize);
> >>>>>>> BB->addSuccessor(loopMBB);
> >>>>>>>
> >>>>>>> // Generate the loop body:
> >>>>>>> @@ -7450,12 +7782,12 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>>>> //   destPhi = PHI(destLoop, dst)
> >>>>>>> MachineBasicBlock *entryBB = BB; BB = loopMBB;
> >>>>>>> -  unsigned varLoop = MRI.createVirtualRegister(TRC);
> >>>>>>> -  unsigned varPhi = MRI.createVirtualRegister(TRC);
> >>>>>>> -  unsigned srcLoop = MRI.createVirtualRegister(TRC);
> >>>>>>> -  unsigned srcPhi = MRI.createVirtualRegister(TRC);
> >>>>>>> -  unsigned destLoop = MRI.createVirtualRegister(TRC);
> >>>>>>> -  unsigned destPhi = MRI.createVirtualRegister(TRC);
> >>>>>>> +  unsigned varLoop = MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>>>> +  unsigned varPhi = MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>>>> +  unsigned srcLoop = MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>>>> +  unsigned srcPhi = MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>>>> +  unsigned destLoop = MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>>>> +  unsigned destPhi = MRI.createVirtualRegister(ByvalEmitter.getTRC());
> >>>>>>>
> >>>>>>> BuildMI(*BB, BB->begin(), dl, TII->get(ARM::PHI), varPhi)
> >>>>>>>  .addReg(varLoop).addMBB(loopMBB)
> >>>>>>> @@ -7469,39 +7801,16 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>>>>
> >>>>>>> //   [scratch, srcLoop] = LDR_POST(srcPhi, UnitSize)
> >>>>>>> //   [destLoop] = STR_POST(scratch, destPhi, UnitSiz)
> >>>>>>> -  unsigned scratch = MRI.createVirtualRegister(UnitSize >= 8 ? TRC_Vec:TRC);
> >>>>>>> -  if (UnitSize >= 8) {
> >>>>>>> -    AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> >>>>>>> -      .addReg(srcLoop, RegState::Define).addReg(srcPhi).addImm(0));
> >>>>>>> -
> >>>>>>> -    AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> >>>>>>> -      .addReg(destPhi).addImm(0).addReg(scratch));
> >>>>>>> -  } else if (isThumb2) {
> >>>>>>> -    AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> >>>>>>> -      .addReg(srcLoop, RegState::Define).addReg(srcPhi).addImm(UnitSize));
> >>>>>>> -
> >>>>>>> -    AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> >>>>>>> -      .addReg(scratch).addReg(destPhi)
> >>>>>>> -      .addImm(UnitSize));
> >>>>>>> -  } else {
> >>>>>>> -    AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
> >>>>>>> -      .addReg(srcLoop, RegState::Define).addReg(srcPhi).addReg(0)
> >>>>>>> -      .addImm(UnitSize));
> >>>>>>> -
> >>>>>>> -    AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
> >>>>>>> -      .addReg(scratch).addReg(destPhi)
> >>>>>>> -      .addReg(0).addImm(UnitSize));
> >>>>>>> +  {
> >>>>>>> +    std::pair<unsigned, unsigned> res =
> >>>>>>> +        ByvalEmitter.emitUnitLoad(BB, BB->end(), dl, srcPhi, srcLoop);
> >>>>>>> +    unsigned scratch = res.first;
> >>>>>>> +    ByvalEmitter.emitUnitStore(BB, BB->end(), dl, destPhi, scratch, destLoop);
> >>>>>>> }
> >>>>>>>
> >>>>>>> // Decrement loop variable by UnitSize.
> >>>>>>> -  MachineInstrBuilder MIB = BuildMI(BB, dl,
> >>>>>>> -    TII->get(isThumb2 ? ARM::t2SUBri : ARM::SUBri), varLoop);
> >>>>>>> -  AddDefaultCC(AddDefaultPred(MIB.addReg(varPhi).addImm(UnitSize)));
> >>>>>>> -  MIB->getOperand(5).setReg(ARM::CPSR);
> >>>>>>> -  MIB->getOperand(5).setIsDef(true);
> >>>>>>> -
> >>>>>>> -  BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc))
> >>>>>>> -    .addMBB(loopMBB).addImm(ARMCC::NE).addReg(ARM::CPSR);
> >>>>>>> +  ByvalEmitter.emitSubImm(BB, BB->end(), dl, varPhi, varLoop);
> >>>>>>> +  ByvalEmitter.emitBranchNE(BB, BB->end(), dl, loopMBB);
> >>>>>>>
> >>>>>>> // loopMBB can loop back to loopMBB or fall through to exitMBB.
> >>>>>>> BB->addSuccessor(loopMBB);
> >>>>>>> @@ -7510,36 +7819,17 @@ EmitStructByval(MachineInstr *MI, Machin
> >>>>>>> // Add epilogue to handle BytesLeft.
> >>>>>>> BB = exitMBB;
> >>>>>>> MachineInstr *StartOfExit = exitMBB->begin();
> >>>>>>> -  ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
> >>>>>>> -  strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
> >>>>>>>
> >>>>>>> //   [scratch, srcOut] = LDRB_POST(srcLoop, 1)
> >>>>>>> //   [destOut] = STRB_POST(scratch, destLoop, 1)
> >>>>>>> unsigned srcIn = srcLoop;
> >>>>>>> unsigned destIn = destLoop;
> >>>>>>> for (unsigned i = 0; i < BytesLeft; i++) {
> >>>>>>> -    unsigned scratch = MRI.createVirtualRegister(TRC);
> >>>>>>> -    unsigned srcOut = MRI.createVirtualRegister(TRC);
> >>>>>>> -    unsigned destOut = MRI.createVirtualRegister(TRC);
> >>>>>>> -    if (isThumb2) {
> >>>>>>> -      AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> >>>>>>> -        TII->get(ldrOpc),scratch)
> >>>>>>> -        .addReg(srcOut, RegState::Define).addReg(srcIn).addImm(1));
> >>>>>>> -
> >>>>>>> -      AddDefaultPred(BuildMI(*BB, StartOfExit, dl, TII->get(strOpc), destOut)
> >>>>>>> -        .addReg(scratch).addReg(destIn)
> >>>>>>> -        .addImm(1));
> >>>>>>> -    } else {
> >>>>>>> -      AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> >>>>>>> -        TII->get(ldrOpc),scratch)
> >>>>>>> -        .addReg(srcOut, RegState::Define).addReg(srcIn).addReg(0).addImm(1));
> >>>>>>> -
> >>>>>>> -      AddDefaultPred(BuildMI(*BB, StartOfExit, dl, TII->get(strOpc), destOut)
> >>>>>>> -        .addReg(scratch).addReg(destIn)
> >>>>>>> -        .addReg(0).addImm(1));
> >>>>>>> -    }
> >>>>>>> -    srcIn = srcOut;
> >>>>>>> -    destIn = destOut;
> >>>>>>> +    std::pair<unsigned, unsigned> res =
> >>>>>>> +        ByvalEmitter.emitByteLoad(BB, StartOfExit, dl, srcIn);
> >>>>>>> +    unsigned scratch = res.first;
> >>>>>>> +    srcIn = res.second;
> >>>>>>> +    destIn = ByvalEmitter.emitByteStore(BB, StartOfExit, dl, destIn, scratch);
> >>>>>>> }
> >>>>>>>
> >>>>>>> MI->eraseFromParent();   // The instruction is gone now.
> >>>>>>>
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> llvm-commits mailing list
> >>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>
> >>>>>
> >>>
> >>>
> >
> >
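For anyone following the quoted patch, the unit-size selection and the split into LoopSize and BytesLeft can be sketched as standalone C++. This is only an illustration of the arithmetic visible in the diff, not code from the patch: chooseUnitSize and splitCopy are hypothetical names, and the CanUseNEON flag is an assumption standing in for the NoImplicitFloat / Subtarget->hasNEON() check.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not in the patch): pick the copy unit in bytes from
// the struct alignment, mirroring the if/else chain in EmitStructByval.
// CanUseNEON is an assumed stand-in for the NoImplicitFloat/hasNEON() test.
unsigned chooseUnitSize(unsigned Align, unsigned SizeVal, bool CanUseNEON) {
  if (Align & 1)
    return 1;                 // odd alignment: byte copies
  if (Align & 2)
    return 2;                 // 2-byte aligned: halfword copies
  if (CanUseNEON) {
    if (Align % 16 == 0 && SizeVal >= 16)
      return 16;              // 128-bit vector copies
    if (Align % 8 == 0 && SizeVal >= 8)
      return 8;               // 64-bit vector copies
  }
  return 4;                   // otherwise: word copies
}

// Hypothetical helper (not in the patch): returns {LoopSize, BytesLeft}.
// LoopSize bytes are copied in whole units; BytesLeft go one byte at a time.
std::pair<unsigned, unsigned> splitCopy(unsigned SizeVal, unsigned UnitSize) {
  assert(UnitSize != 0 && "unit size must be non-zero");
  unsigned BytesLeft = SizeVal % UnitSize;
  return std::make_pair(SizeVal - BytesLeft, BytesLeft);
}
```

With a 4-byte aligned, 10-byte struct and no NEON, this gives a unit of 4, so 8 bytes go through the unit load/store path and the remaining 2 through the byte load/store path.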
