[llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32
Bob Wilson
bob.wilson at apple.com
Tue Oct 22 09:14:48 PDT 2013
Thanks. At least try it out…. If it turns out to be unreadable with all the conditionals, we can reconsider.
On Oct 22, 2013, at 9:13 AM, David Peixotto <dpeixott at codeaurora.org> wrote:
> Ok, I will make a patch to switch to using inline conditionals instead.
>
> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
>
>
>
>> -----Original Message-----
>> From: Bob Wilson [mailto:bob.wilson at apple.com]
>> Sent: Monday, October 21, 2013 9:18 PM
>> To: David Peixotto
>> Cc: llvm-commits at cs.uiuc.edu LLVM
>> Subject: Re: [llvm] r192915 - Refactor lowering for COPY_STRUCT_BYVAL_I32
>>
>>
>> On Oct 21, 2013, at 7:20 PM, David Peixotto <dpeixott at codeaurora.org>
>> wrote:
>>
>>> Hi Bob,
>>>
>>> I agree that a generic emitter would be useful, but I'm not sure I
>>> would get the time to work on such a project at this point.
>>
>> In that case, I think it would be better to simplify this code to use a
> more
>> direct approach with conditionals.
>>
>>>
>>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> hosted by The Linux Foundation
>>>
>>>
>>>> -----Original Message-----
>>>> From: Bob Wilson [mailto:bob.wilson at apple.com]
>>>> Sent: Monday, October 21, 2013 12:12 PM
>>>> To: David Peixotto
>>>> Cc: llvm-commits at cs.uiuc.edu
>>>> Subject: Re: [llvm] r192915 - Refactor lowering for
>>>> COPY_STRUCT_BYVAL_I32
>>>>
>>>> It seems like it make more sense to make the StructByvalEmitter a
>>>> generic "emitter" for ARM instructions. I suspect there are a number
>>>> of other
>>> places
>>>> in ARMISelLowering that could make good use of it. Would you be
>>>> willing
>>> to
>>>> investigate that?
>>>>
>>>> As it stands, this really does seem like overkill for the struct
>>>> byval
>>> issue, but if
>>>> you could make it more generally useful, then the extra code would be
>>>> worthwhile.
>>>>
>>>> On Oct 21, 2013, at 10:02 AM, David Peixotto
>>>> <dpeixott at codeaurora.org>
>>>> wrote:
>>>>
>>>>> Hi Bob,
>>>>>
>>>>> I think your criticism is valid here. I wasn't too happy with how
>>>>> much code I ended up writing for this change. When I started I
>>>>> thought the code size would be about equal after implementing the
>>>>> thumb1 lowering because I was getting rid of some code duplication,
>>>>> but the code size for abstracting the common parts was larger than I
> had
>> anticipated.
>>>>> There is no fundamental problem with implementing it with
>>>>> conditionals, I did it this way because I thought it would be
>>>>> clearer
>>> and
>>>> would be easier to write correctly.
>>>>>
>>>>> I think the way it is now has the advantage that the lowering
>>>>> algorithm is clearly separated from the details of generating
>>>>> machine instructions for each sub-target. I think it would be easier
>>>>> to improve the algorithm with the way it is now. For example, we are
>>>>> always using byte stores to copy any leftover that does not fit into
>>>>> the "unit" size. So if we have a 31-byte struct and a target that
>>>>> supports neon we will generate a 16-byte store and
>>>>> 15 1-byte stores. We could improve this by generating fewer stores
>>>>> for the leftover (3x4-byte + 1x2byte + 1x1byte). I don't know if we
>>>>> actually care about this kind of change, but I believe it would be
>>> easier to
>>>> make now.
>>>>>
>>>>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora
>>>>> Forum, hosted by The Linux Foundation
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bob Wilson [mailto:bob.wilson at apple.com]
>>>>>> Sent: Saturday, October 19, 2013 7:00 PM
>>>>>> To: David Peixotto
>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>> Subject: Re: [llvm] r192915 - Refactor lowering for
>>>>>> COPY_STRUCT_BYVAL_I32
>>>>>>
>>>>>> This is very nice and elegant, but it's an awful lot of code for
>>>>>> something
>>>>> that
>>>>>> isn't really that complicated. It seems like overkill to me. Did
>>>>>> you
>>>>> consider
>>>>>> implementing the Thumb1 support by just adding more conditionals?
>>>>>> Is there a fundamental problem with that?
>>>>>>
>>>>>> On Oct 17, 2013, at 12:49 PM, David Peixotto
>>>>>> <dpeixott at codeaurora.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Author: dpeixott
>>>>>>> Date: Thu Oct 17 14:49:22 2013
>>>>>>> New Revision: 192915
>>>>>>>
>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=192915&view=rev
>>>>>>> Log:
>>>>>>> Refactor lowering for COPY_STRUCT_BYVAL_I32
>>>>>>>
>>>>>>> This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32
>>>>>>> pseudo-instruction in the ARM backend. We introduce a new helper
>>>>>>> class that encapsulates all of the operations needed during the
>>> lowering.
>>>>>>> The operations are implemented for each subtarget in different
>>>>>>> subclasses. Currently only arm and thumb2 subtargets are supported.
>>>>>>>
>>>>>>> This refactoring was done to easily implement support for thumb1
>>>>>>> subtargets. This initial patch does not add support for thumb1,
>>>>>>> but is only a refactoring. A follow on patch will implement the
>>>>>>> support for
>>>>>>> thumb1 subtargets.
>>>>>>>
>>>>>>> No intended functionality change.
>>>>>>>
>>>>>>> Modified:
>>>>>>> llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
>>>>>>>
>>>>>>> Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
>>>>>>> URL:
>>>>>>> http://llvm.org/viewvc/llvm-
>>>> project/llvm/trunk/lib/Target/ARM/ARMISe
>>>>>>> lL owering.cpp?rev=192915&r1=192914&r2=192915&view=diff
>>>>>>>
>>>>>>
>>>>
>> ==========================================================
>>>>>> ============
>>>>>>> ========
>>>>>>> --- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
>>>>>>> +++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Thu Oct 17
>>>>>>> +++ 14:49:22
>>>>>>> +++ 2013
>>>>>>> @@ -48,6 +48,7 @@
>>>>>>> #include "llvm/Support/MathExtras.h"
>>>>>>> #include "llvm/Support/raw_ostream.h"
>>>>>>> #include "llvm/Target/TargetOptions.h"
>>>>>>> +#include <utility>
>>>>>>> using namespace llvm;
>>>>>>>
>>>>>>> STATISTIC(NumTailCalls, "Number of tail calls"); @@ -7245,8
>>>>>>> +7246,430 @@ MachineBasicBlock *OtherSucc(MachineBasi
>>>>>>> llvm_unreachable("Expecting a BB with two successors!"); }
>>>>>>>
>>>>>>> -MachineBasicBlock *ARMTargetLowering::
>>>>>>> -EmitStructByval(MachineInstr *MI, MachineBasicBlock *BB) const {
>>>>>>> +namespace {
>>>>>>> +// This class is a helper for lowering the COPY_STRUCT_BYVAL_I32
>>>>>> instruction.
>>>>>>> +// It defines the operations needed to lower the byval copy. We
>>>>>>> +use a helper // class because the opcodes and machine
>>>>>>> +instructions are different for each // subtarget, but the overall
>>>>>>> +algorithm for the lowering is the same. The // implementation of
>>>>>>> +each operation will be defined separately for arm, thumb1, // and
>>>>>>> +thumb2 targets by subclassing this base class. See //
>>>>> ARMTargetLowering::EmitStructByval()
>>>>>> for how these operations are used.
>>>>>>> +class TargetStructByvalEmitter {
>>>>>>> +public:
>>>>>>> + TargetStructByvalEmitter(const TargetInstrInfo *TII_,
>>>>>>> + MachineRegisterInfo &MRI_,
>>>>>>> + const TargetRegisterClass *TRC_)
>>>>>>> + : TII(TII_), MRI(MRI_), TRC(TRC_) {}
>>>>>>> +
>>>>>>> + // Emit a post-increment load of "unit" size. The unit size is
>>>>>>> + based on the // alignment of the struct being copied (4, 2, or
>>>>>>> + 1 bytes). Alignments higher // than 4 are handled separately by
>>>>>>> + using
>>>>>> NEON instructions.
>>>>>>> + //
>>>>>>> + // \param baseReg the register holding the address to load.
>>>>>>> + // \param baseOut the register to recieve the incremented
>> address.
>>>>>>> + // \returns the register holding the loaded value.
>>>>>>> + virtual unsigned emitUnitLoad(MachineBasicBlock *BB,
>>>>>>> + MachineInstr
>>>>>> *MI,
>>>>>>> + DebugLoc &dl, unsigned baseReg,
>>>>>>> + unsigned baseOut) = 0;
>>>>>>> +
>>>>>>> + // Emit a post-increment store of "unit" size. The unit size is
>>>>>>> + based on the // alignment of the struct being copied (4, 2, or
>>>>>>> + 1 bytes). Alignments higher // than 4 are handled separately by
>>>>>>> + using
>>>>>> NEON instructions.
>>>>>>> + //
>>>>>>> + // \param baseReg the register holding the address to store.
>>>>>>> + // \param storeReg the register holding the value to store.
>>>>>>> + // \param baseOut the register to recieve the incremented
>> address.
>>>>>>> + virtual void emitUnitStore(MachineBasicBlock *BB, MachineInstr
>> *MI,
>>>>>>> + DebugLoc &dl, unsigned baseReg,
>>>>>>> + unsigned
>>>>> storeReg,
>>>>>>> + unsigned baseOut) = 0;
>>>>>>> +
>>>>>>> + // Emit a post-increment load of one byte.
>>>>>>> + //
>>>>>>> + // \param baseReg the register holding the address to load.
>>>>>>> + // \param baseOut the register to recieve the incremented
>> address.
>>>>>>> + // \returns the register holding the loaded value.
>>>>>>> + virtual unsigned emitByteLoad(MachineBasicBlock *BB,
>>>>>>> + MachineInstr
>>>>>> *MI,
>>>>>>> + DebugLoc &dl, unsigned baseReg,
>>>>>>> + unsigned baseOut) = 0;
>>>>>>> +
>>>>>>> + // Emit a post-increment store of one byte.
>>>>>>> + //
>>>>>>> + // \param baseReg the register holding the address to store.
>>>>>>> + // \param storeReg the register holding the value to store.
>>>>>>> + // \param baseOut the register to recieve the incremented
>> address.
>>>>>>> + virtual void emitByteStore(MachineBasicBlock *BB, MachineInstr
>> *MI,
>>>>>>> + DebugLoc &dl, unsigned baseReg,
>>>>>>> + unsigned
>>>>> storeReg,
>>>>>>> + unsigned baseOut) = 0;
>>>>>>> +
>>>>>>> + // Emit a load of a constant value.
>>>>>>> + //
>>>>>>> + // \param Constant the register holding the address to store.
>>>>>>> + // \returns the register holding the loaded value.
>>>>>>> + virtual unsigned emitConstantLoad(MachineBasicBlock *BB,
>>>>>> MachineInstr *MI,
>>>>>>> + DebugLoc &dl, unsigned
> Constant,
>>>>>>> + const DataLayout *DL) = 0;
>>>>>>> +
>>>>>>> + // Emit a subtract of a register minus immediate, with the
>>>>>>> + immediate equal to // the "unit" size. The unit size is based
>>>>>>> + on the alignment of the struct // being copied (16, 8, 4, 2, or
>>>>>>> + 1
>>>>> bytes).
>>>>>>> + //
>>>>>>> + // \param InReg the register holding the initial value.
>>>>>>> + // \param OutReg the register to recieve the subtracted value.
>>>>>>> + virtual void emitSubImm(MachineBasicBlock *BB, MachineInstr
>>>>>>> + *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned InReg, unsigned OutReg) = 0;
>>>>>>> +
>>>>>>> + // Emit a branch based on a condition code of not equal.
>>>>>>> + //
>>>>>>> + // \param TargetBB the destination of the branch.
>>>>>>> + virtual void emitBranchNE(MachineBasicBlock *BB, MachineInstr
>> *MI,
>>>>>>> + DebugLoc &dl, MachineBasicBlock
>>>>>>> + *TargetBB) = 0;
>>>>>>> +
>>>>>>> + // Find the constant pool index for the given constant. This
>>>>>>> + method is // implemented in the base class because it is the
>>>>>>> + same for all
>>>>>> subtargets.
>>>>>>> + //
>>>>>>> + // \param LoopSize the constant value for which the index
>>>>>>> + should be
>>>>>> returned.
>>>>>>> + // \returns the constant pool index for the constant.
>>>>>>> + unsigned getConstantPoolIndex(MachineFunction *MF, const
>>>>>> DataLayout *DL,
>>>>>>> + unsigned LoopSize) {
>>>>>>> + MachineConstantPool *ConstantPool = MF->getConstantPool();
>>>>>>> + Type *Int32Ty = Type::getInt32Ty(MF->getFunction()-
>>>>> getContext());
>>>>>>> + const Constant *C = ConstantInt::get(Int32Ty, LoopSize);
>>>>>>> +
>>>>>>> + // MachineConstantPool wants an explicit alignment.
>>>>>>> + unsigned Align = DL->getPrefTypeAlignment(Int32Ty);
>>>>>>> + if (Align == 0)
>>>>>>> + Align = DL->getTypeAllocSize(C->getType());
>>>>>>> + return ConstantPool->getConstantPoolIndex(C, Align); }
>>>>>>> +
>>>>>>> + // Return the register class used by the subtarget.
>>>>>>> + //
>>>>>>> + // \returns the target register class.
>>>>>>> + const TargetRegisterClass *getTRC() const { return TRC; }
>>>>>>> +
>>>>>>> + virtual ~TargetStructByvalEmitter() {};
>>>>>>> +
>>>>>>> +protected:
>>>>>>> + const TargetInstrInfo *TII;
>>>>>>> + MachineRegisterInfo &MRI;
>>>>>>> + const TargetRegisterClass *TRC; };
>>>>>>> +
>>>>>>> +class ARMStructByvalEmitter : public TargetStructByvalEmitter {
>>>>>>> +public:
>>>>>>> + ARMStructByvalEmitter(const TargetInstrInfo *TII,
>>>>>>> +MachineRegisterInfo
>>>>>> &MRI,
>>>>>>> + unsigned LoadStoreSize)
>>>>>>> + : TargetStructByvalEmitter(
>>>>>>> + TII, MRI, (const TargetRegisterClass
>>> *)&ARM::GPRRegClass),
>>>>>>> + UnitSize(LoadStoreSize),
>>>>>>> + UnitLdOpc(LoadStoreSize == 4
>>>>>>> + ? ARM::LDR_POST_IMM
>>>>>>> + : LoadStoreSize == 2
>>>>>>> + ? ARM::LDRH_POST
>>>>>>> + : LoadStoreSize == 1 ?
>>>>>>> + ARM::LDRB_POST_IMM
>>> :
>>>>> 0),
>>>>>>> + UnitStOpc(LoadStoreSize == 4
>>>>>>> + ? ARM::STR_POST_IMM
>>>>>>> + : LoadStoreSize == 2
>>>>>>> + ? ARM::STRH_POST
>>>>>>> + : LoadStoreSize == 1 ?
>>>>>>> +ARM::STRB_POST_IMM
>>>>>>> +: 0) {}
>>>>>>> +
>>>>>>> + unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned baseOut) {
>>>>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitLdOpc),
>>>>>> scratch).addReg(
>>>>>>> + baseOut,
>>>>>> RegState::Define).addReg(baseReg).addReg(0).addImm(UnitSize));
>>>>>>> + return scratch;
>>>>>>> + }
>>>>>>> +
>>>>>>> + void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned storeReg,
>>>>>>> + unsigned
>>>>> baseOut) {
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitStOpc),
>>>>>> baseOut).addReg(
>>>>>>> + storeReg).addReg(baseReg).addReg(0).addImm(UnitSize));
>>>>>>> + }
>>>>>>> +
>>>>>>> + unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned baseOut) {
>>>>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> + TII->get(ARM::LDRB_POST_IMM),
>>>>>> scratch)
>>>>>>> + .addReg(baseOut,
>>>>> RegState::Define).addReg(baseReg)
>>>>>>> + .addReg(0).addImm(1));
>>>>>>> + return scratch;
>>>>>>> + }
>>>>>>> +
>>>>>>> + void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned storeReg,
>>>>>>> + unsigned
>>>>> baseOut) {
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> + TII->get(ARM::STRB_POST_IMM),
>>>>>> baseOut)
>>>>>>> +
>>>>>>> + .addReg(storeReg).addReg(baseReg).addReg(0).addImm(1));
>>>>>>> + }
>>>>>>> +
>>>>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr
>>>> *MI,
>>>>>>> + DebugLoc &dl, unsigned Constant,
>>>>>>> + const DataLayout *DL) {
>>>>>>> + unsigned constReg = MRI.createVirtualRegister(TRC);
>>>>>>> + unsigned Idx = getConstantPoolIndex(BB->getParent(), DL,
>>>> Constant);
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII-
>>> get(ARM::LDRcp)).addReg(
>>>>>>> + constReg,
>>>>> RegState::Define).addConstantPoolIndex(Idx).addImm(0));
>>>>>>> + return constReg;
>>>>>>> + }
>>>>>>> +
>>>>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI,
>>>> DebugLoc
>>>>>> &dl,
>>>>>>> + unsigned InReg, unsigned OutReg) {
>>>>>>> + MachineInstrBuilder MIB =
>>>>>>> + BuildMI(*BB, MI, dl, TII->get(ARM::SUBri), OutReg);
>>>>>>> +
>>>>>>
>> AddDefaultCC(AddDefaultPred(MIB.addReg(InReg).addImm(UnitSize)));
>>>>>>> + MIB->getOperand(5).setReg(ARM::CPSR);
>>>>>>> + MIB->getOperand(5).setIsDef(true);
>>>>>>> + }
>>>>>>> +
>>>>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + MachineBasicBlock *TargetBB) {
>>>>>>> + BuildMI(*BB, MI, dl, TII-
>>>>>>> get(ARM::Bcc)).addMBB(TargetBB).addImm(ARMCC::NE)
>>>>>>> + .addReg(ARM::CPSR);
>>>>>>> + }
>>>>>>> +
>>>>>>> +private:
>>>>>>> + const unsigned UnitSize;
>>>>>>> + const unsigned UnitLdOpc;
>>>>>>> + const unsigned UnitStOpc;
>>>>>>> +};
>>>>>>> +
>>>>>>> +class Thumb2StructByvalEmitter : public TargetStructByvalEmitter
>>>>>>> +{
>>>>>>> +public:
>>>>>>> + Thumb2StructByvalEmitter(const TargetInstrInfo *TII,
>>>>>> MachineRegisterInfo &MRI,
>>>>>>> + unsigned LoadStoreSize)
>>>>>>> + : TargetStructByvalEmitter(
>>>>>>> + TII, MRI, (const TargetRegisterClass
>>> *)&ARM::tGPRRegClass),
>>>>>>> + UnitSize(LoadStoreSize),
>>>>>>> + UnitLdOpc(LoadStoreSize == 4
>>>>>>> + ? ARM::t2LDR_POST
>>>>>>> + : LoadStoreSize == 2
>>>>>>> + ? ARM::t2LDRH_POST
>>>>>>> + : LoadStoreSize == 1 ? ARM::t2LDRB_POST
> :
>>>>> 0),
>>>>>>> + UnitStOpc(LoadStoreSize == 4
>>>>>>> + ? ARM::t2STR_POST
>>>>>>> + : LoadStoreSize == 2
>>>>>>> + ? ARM::t2STRH_POST
>>>>>>> + : LoadStoreSize == 1 ? ARM::t2STRB_POST
> :
>>>>>>> +0) {}
>>>>>>> +
>>>>>>> + unsigned emitUnitLoad(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned baseOut) {
>>>>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitLdOpc),
>>>>>> scratch).addReg(
>>>>>>> + baseOut,
>> RegState::Define).addReg(baseReg).addImm(UnitSize));
>>>>>>> + return scratch;
>>>>>>> + }
>>>>>>> +
>>>>>>> + void emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned storeReg,
>>>>>>> + unsigned
>>>>> baseOut) {
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(UnitStOpc),
>>>>>>> + baseOut)
>>>>>>> +
>>>>>>> + .addReg(storeReg).addReg(baseReg).addImm(UnitSize));
>>>>>>> + }
>>>>>>> +
>>>>>>> + unsigned emitByteLoad(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned baseOut) {
>>>>>>> + unsigned scratch = MRI.createVirtualRegister(TRC);
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> + TII->get(ARM::t2LDRB_POST),
>>>>>> scratch)
>>>>>>> + .addReg(baseOut,
>>>>> RegState::Define).addReg(baseReg)
>>>>>>> + .addImm(1));
>>>>>>> + return scratch;
>>>>>>> + }
>>>>>>> +
>>>>>>> + void emitByteStore(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned storeReg,
>>>>>>> + unsigned
>>>>> baseOut) {
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> + TII->get(ARM::t2STRB_POST),
>>>>>> baseOut)
>>>>>>> +
>>>>>>> + .addReg(storeReg).addReg(baseReg).addImm(1));
>>>>>>> + }
>>>>>>> +
>>>>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr
>>>> *MI,
>>>>>>> + DebugLoc &dl, unsigned Constant,
>>>>>>> + const DataLayout *DL) {
>>>>>>> + unsigned VConst = MRI.createVirtualRegister(TRC);
>>>>>>> + unsigned Vtmp = VConst;
>>>>>>> + if ((Constant & 0xFFFF0000) != 0)
>>>>>>> + Vtmp = MRI.createVirtualRegister(TRC);
>>>>>>> + AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVi16), Vtmp)
>>>>>>> + .addImm(Constant & 0xFFFF));
>>>>>>> +
>>>>>>> + if ((Constant & 0xFFFF0000) != 0)
>>>>>>> + AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVTi16),
>>> VConst)
>>>>>>> + .addReg(Vtmp).addImm(Constant >> 16));
>>>>>>> + return VConst;
>>>>>>> + }
>>>>>>> +
>>>>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI,
>>>> DebugLoc
>>>>>> &dl,
>>>>>>> + unsigned InReg, unsigned OutReg) {
>>>>>>> + MachineInstrBuilder MIB =
>>>>>>> + BuildMI(*BB, MI, dl, TII->get(ARM::t2SUBri), OutReg);
>>>>>>> +
>>>>>>
>> AddDefaultCC(AddDefaultPred(MIB.addReg(InReg).addImm(UnitSize)));
>>>>>>> + MIB->getOperand(5).setReg(ARM::CPSR);
>>>>>>> + MIB->getOperand(5).setIsDef(true);
>>>>>>> + }
>>>>>>> +
>>>>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + MachineBasicBlock *TargetBB) {
>>>>>>> + BuildMI(BB, dl, TII-
>>>>>>> get(ARM::t2Bcc)).addMBB(TargetBB).addImm(ARMCC::NE)
>>>>>>> + .addReg(ARM::CPSR);
>>>>>>> + }
>>>>>>> +
>>>>>>> +private:
>>>>>>> + const unsigned UnitSize;
>>>>>>> + const unsigned UnitLdOpc;
>>>>>>> + const unsigned UnitStOpc;
>>>>>>> +};
>>>>>>> +
>>>>>>> +// This class is a thin wrapper that delegates most of the work
>>>>>>> +to the correct // TargetStructByvalEmitter implementation. It
>>>>>>> +also handles the lowering for // targets that support neon
>>>>>>> +because the neon implementation is the same for all // targets that
>> support it.
>>>>>>> +class StructByvalEmitter {
>>>>>>> +public:
>>>>>>> + StructByvalEmitter(unsigned LoadStoreSize, const ARMSubtarget
>>>>>> *Subtarget,
>>>>>>> + const TargetInstrInfo *TII_,
>>>>>>> + MachineRegisterInfo
>>>>> &MRI_,
>>>>>>> + const DataLayout *DL_)
>>>>>>> + : UnitSize(LoadStoreSize),
>>>>>>> + TargetEmitter(
>>>>>>> + Subtarget->isThumb2()
>>>>>>> + ? static_cast<TargetStructByvalEmitter *>(
>>>>>>> + new Thumb2StructByvalEmitter(TII_, MRI_,
>>>>>>> + LoadStoreSize))
>>>>>>> + : static_cast<TargetStructByvalEmitter *>(
>>>>>>> + new ARMStructByvalEmitter(TII_, MRI_,
>>>>>>> + LoadStoreSize))),
>>>>>>> + TII(TII_), MRI(MRI_), DL(DL_),
>>>>>>> + VecTRC(UnitSize == 16
>>>>>>> + ? (const TargetRegisterClass
> *)&ARM::DPairRegClass
>>>>>>> + : UnitSize == 8
>>>>>>> + ? (const TargetRegisterClass
>>>>> *)&ARM::DPRRegClass
>>>>>>> + : 0),
>>>>>>> + VecLdOpc(UnitSize == 16 ? ARM::VLD1q32wb_fixed
>>>>>>> + : UnitSize == 8 ?
>>>>>>> + ARM::VLD1d32wb_fixed
>>>>> : 0),
>>>>>>> + VecStOpc(UnitSize == 16 ? ARM::VST1q32wb_fixed
>>>>>>> + : UnitSize == 8 ?
>>>>>>> +ARM::VST1d32wb_fixed : 0) {}
>>>>>>> +
>>>>>>> + // Emit a post-increment load of "unit" size. The unit size is
>>>>>>> + based on the // alignment of the struct being copied (16, 8, 4,
>>>>>>> + 2, or 1 bytes). Loads of 16 // or 8 bytes use NEON instructions
>>>>>>> + to load
>>>>> the
>>>>>> value.
>>>>>>> + //
>>>>>>> + // \param baseReg the register holding the address to load.
>>>>>>> + // \param baseOut the register to recieve the incremented
>> address.
>>>>>>> + If baseOut // is 0 then a new register is created to hold the
>>>>> incremented
>>>>>> address.
>>>>>>> + // \returns a pair of registers holding the loaded value and
>>>>>>> + the updated // address.
>>>>>>> + std::pair<unsigned, unsigned> emitUnitLoad(MachineBasicBlock
>> *BB,
>>>>>>> + MachineInstr *MI,
>>>>>>> + DebugLoc
>>>>> &dl,
>>>>>>> + unsigned baseReg,
>>>>>>> + unsigned baseOut = 0)
> {
>>>>>>> + unsigned scratch = 0;
>>>>>>> + if (baseOut == 0)
>>>>>>> + baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
>>>>>>> + if (UnitSize >= 8) { // neon
>>>>>>> + scratch = MRI.createVirtualRegister(VecTRC);
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(VecLdOpc),
>>>>>> scratch).addReg(
>>>>>>> + baseOut, RegState::Define).addReg(baseReg).addImm(0));
>>>>>>> + } else {
>>>>>>> + scratch = TargetEmitter->emitUnitLoad(BB, MI, dl, baseReg,
>>>>> baseOut);
>>>>>>> + }
>>>>>>> + return std::make_pair(scratch, baseOut); }
>>>>>>> +
>>>>>>> + // Emit a post-increment store of "unit" size. The unit size is
>>>>>>> + based on the // alignment of the struct being copied (16, 8, 4,
>>>>>>> + 2, or 1 bytes). Stores of // 16 or 8 bytes use NEON
>>>>>>> + instructions to
>>>>> store the
>>>>>> value.
>>>>>>> + //
>>>>>>> + // \param baseReg the register holding the address to store.
>>>>>>> + // \param storeReg the register holding the value to store.
>>>>>>> + // \param baseOut the register to recieve the incremented
>> address.
>>>>>>> + If baseOut // is 0 then a new register is created to hold the
>>>>> incremented
>>>>>> address.
>>>>>>> + // \returns the register holding the updated address.
>>>>>>> + unsigned emitUnitStore(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned storeReg,
>>>>>>> + unsigned baseOut = 0) {
>>>>>>> + if (baseOut == 0)
>>>>>>> + baseOut = MRI.createVirtualRegister(TargetEmitter->getTRC());
>>>>>>> + if (UnitSize >= 8) { // neon
>>>>>>> + AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(VecStOpc),
>>> baseOut)
>>>>>>> +
>>> .addReg(baseReg).addImm(0).addReg(storeReg));
>>>>>>> + } else {
>>>>>>> + TargetEmitter->emitUnitStore(BB, MI, dl, baseReg, storeReg,
>>>>>> baseOut);
>>>>>>> + }
>>>>>>> + return baseOut;
>>>>>>> + }
>>>>>>> +
>>>>>>> + // Emit a post-increment load of one byte.
>>>>>>> + //
>>>>>>> + // \param baseReg the register holding the address to load.
>>>>>>> + // \returns a pair of registers holding the loaded value and
>>>>>>> + the updated // address.
>>>>>>> + std::pair<unsigned, unsigned> emitByteLoad(MachineBasicBlock
>> *BB,
>>>>>>> + MachineInstr *MI,
>>>>>>> + DebugLoc
>>>>> &dl,
>>>>>>> + unsigned baseReg) {
>>>>>>> + unsigned baseOut = MRI.createVirtualRegister(TargetEmitter-
>>>>>>> getTRC());
>>>>>>> + unsigned scratch =
>>>>>>> + TargetEmitter->emitByteLoad(BB, MI, dl, baseReg, baseOut);
>>>>>>> + return std::make_pair(scratch, baseOut); }
>>>>>>> +
>>>>>>> + // Emit a post-increment store of one byte.
>>>>>>> + //
>>>>>>> + // \param baseReg the register holding the address to store.
>>>>>>> + // \param storeReg the register holding the value to store.
>>>>>>> + // \returns the register holding the updated address.
>>>>>>> + unsigned emitByteStore(MachineBasicBlock *BB, MachineInstr
>> *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + unsigned baseReg, unsigned storeReg) {
>>>>>>> + unsigned baseOut = MRI.createVirtualRegister(TargetEmitter-
>>>>>>> getTRC());
>>>>>>> + TargetEmitter->emitByteStore(BB, MI, dl, baseReg, storeReg,
>>>>> baseOut);
>>>>>>> + return baseOut;
>>>>>>> + }
>>>>>>> +
>>>>>>> + // Emit a load of the constant LoopSize.
>>>>>>> + //
>>>>>>> + // \param LoopSize the constant to load.
>>>>>>> + // \returns the register holding the loaded constant.
>>>>>>> + unsigned emitConstantLoad(MachineBasicBlock *BB, MachineInstr
>>>> *MI,
>>>>>>> + DebugLoc &dl, unsigned LoopSize) {
>>>>>>> + return TargetEmitter->emitConstantLoad(BB, MI, dl, LoopSize,
>>>>>>> + DL); }
>>>>>>> +
>>>>>>> + // Emit a subtract of a register minus immediate, with the
>>>>>>> + immediate equal to // the "unit" size. The unit size is based
>>>>>>> + on the alignment of the struct // being copied (16, 8, 4, 2, or
>>>>>>> + 1
>>>>> bytes).
>>>>>>> + //
>>>>>>> + // \param InReg the register holding the initial value.
>>>>>>> + // \param OutReg the register to recieve the subtracted value.
>>>>>>> + void emitSubImm(MachineBasicBlock *BB, MachineInstr *MI,
>>>> DebugLoc
>>>>>> &dl,
>>>>>>> + unsigned InReg, unsigned OutReg) {
>>>>>>> + TargetEmitter->emitSubImm(BB, MI, dl, InReg, OutReg); }
>>>>>>> +
>>>>>>> + // Emit a branch based on a condition code of not equal.
>>>>>>> + //
>>>>>>> + // \param TargetBB the destination of the branch.
>>>>>>> + void emitBranchNE(MachineBasicBlock *BB, MachineInstr *MI,
>>>>>> DebugLoc &dl,
>>>>>>> + MachineBasicBlock *TargetBB) {
>>>>>>> + TargetEmitter->emitBranchNE(BB, MI, dl, TargetBB); }
>>>>>>> +
>>>>>>> + // Return the register class used by the subtarget.
>>>>>>> + //
>>>>>>> + // \returns the target register class.
>>>>>>> + const TargetRegisterClass *getTRC() const { return
>>>>>>> + TargetEmitter->getTRC(); }
>>>>>>> +
>>>>>>> +private:
>>>>>>> + const unsigned UnitSize;
>>>>>>> + OwningPtr<TargetStructByvalEmitter> TargetEmitter;
>>>>>>> + const TargetInstrInfo *TII;
>>>>>>> + MachineRegisterInfo &MRI;
>>>>>>> + const DataLayout *DL;
>>>>>>> +
>>>>>>> + const TargetRegisterClass *VecTRC;
>>>>>>> + const unsigned VecLdOpc;
>>>>>>> + const unsigned VecStOpc;
>>>>>>> +};
>>>>>>> +}
>>>>>>> +
>>>>>>> +MachineBasicBlock *
>>>>>>> +ARMTargetLowering::EmitStructByval(MachineInstr *MI,
>>>>>>> + MachineBasicBlock *BB) const {
>>>>>>> // This pseudo instruction has 3 operands: dst, src, size // We
>>>>>>> expand it to a loop if size > Subtarget-
>>>>>>> getMaxInlineSizeThreshold().
>>>>>>> // Otherwise, we will generate unrolled scalar copies.
>>>>>>> @@ -7261,23 +7684,13 @@ EmitStructByval(MachineInstr *MI,
>> Machin
>>>>>>> unsigned Align = MI->getOperand(3).getImm(); DebugLoc dl =
>>>>>>> MI->getDebugLoc();
>>>>>>>
>>>>>>> - bool isThumb2 = Subtarget->isThumb2(); MachineFunction *MF =
>>>>>>> BB->getParent(); MachineRegisterInfo &MRI = MF->getRegInfo();
>>>>>>> - unsigned ldrOpc, strOpc, UnitSize = 0;
>>>>>>> -
>>>>>>> - const TargetRegisterClass *TRC = isThumb2 ?
>>>>>>> - (const TargetRegisterClass*)&ARM::tGPRRegClass :
>>>>>>> - (const TargetRegisterClass*)&ARM::GPRRegClass;
>>>>>>> - const TargetRegisterClass *TRC_Vec = 0;
>>>>>>> + unsigned UnitSize = 0;
>>>>>>>
>>>>>>> if (Align & 1) {
>>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
>>>>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
>>>>>>> UnitSize = 1;
>>>>>>> } else if (Align & 2) {
>>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDRH_POST : ARM::LDRH_POST;
>>>>>>> - strOpc = isThumb2 ? ARM::t2STRH_POST : ARM::STRH_POST;
>>>>>>> UnitSize = 2;
>>>>>>> } else {
>>>>>>> // Check whether we can use NEON instructions.
>>>>>>> @@ -7285,27 +7698,18 @@ EmitStructByval(MachineInstr *MI,
>> Machin
>>>>>>> hasAttribute(AttributeSet::FunctionIndex,
>>>>>>> Attribute::NoImplicitFloat) &&
>>>>>>> Subtarget->hasNEON()) {
>>>>>>> - if ((Align % 16 == 0) && SizeVal >= 16) {
>>>>>>> - ldrOpc = ARM::VLD1q32wb_fixed;
>>>>>>> - strOpc = ARM::VST1q32wb_fixed;
>>>>>>> + if ((Align % 16 == 0) && SizeVal >= 16)
>>>>>>> UnitSize = 16;
>>>>>>> - TRC_Vec = (const TargetRegisterClass*)&ARM::DPairRegClass;
>>>>>>> - }
>>>>>>> - else if ((Align % 8 == 0) && SizeVal >= 8) {
>>>>>>> - ldrOpc = ARM::VLD1d32wb_fixed;
>>>>>>> - strOpc = ARM::VST1d32wb_fixed;
>>>>>>> + else if ((Align % 8 == 0) && SizeVal >= 8)
>>>>>>> UnitSize = 8;
>>>>>>> - TRC_Vec = (const TargetRegisterClass*)&ARM::DPRRegClass;
>>>>>>> - }
>>>>>>> }
>>>>>>> // Can't use NEON instructions.
>>>>>>> - if (UnitSize == 0) {
>>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDR_POST : ARM::LDR_POST_IMM;
>>>>>>> - strOpc = isThumb2 ? ARM::t2STR_POST : ARM::STR_POST_IMM;
>>>>>>> + if (UnitSize == 0)
>>>>>>> UnitSize = 4;
>>>>>>> - }
>>>>>>> }
>>>>>>>
>>>>>>> + StructByvalEmitter ByvalEmitter(UnitSize, Subtarget, TII, MRI,
>>>>>>> + getDataLayout());
>>>>>>> unsigned BytesLeft = SizeVal % UnitSize; unsigned LoopSize =
>>>>>>> SizeVal - BytesLeft;
>>>>>>>
>>>>>>> @@ -7316,67 +7720,22 @@ EmitStructByval(MachineInstr *MI,
>> Machin
>>>>>>> unsigned srcIn = src;
>>>>>>> unsigned destIn = dest;
>>>>>>> for (unsigned i = 0; i < LoopSize; i+=UnitSize) {
>>>>>>> - unsigned scratch = MRI.createVirtualRegister(UnitSize >= 8 ?
>>>>>> TRC_Vec:TRC);
>>>>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
>>>>>>> - if (UnitSize >= 8) {
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> - TII->get(ldrOpc), scratch)
>>>>>>> - .addReg(srcOut,
> RegState::Define).addReg(srcIn).addImm(0));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
>>> destOut)
>>>>>>> - .addReg(destIn).addImm(0).addReg(scratch));
>>>>>>> - } else if (isThumb2) {
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> - TII->get(ldrOpc), scratch)
>>>>>>> - .addReg(srcOut,
>>>>> RegState::Define).addReg(srcIn).addImm(UnitSize));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
>>> destOut)
>>>>>>> - .addReg(scratch).addReg(destIn)
>>>>>>> - .addImm(UnitSize));
>>>>>>> - } else {
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> - TII->get(ldrOpc), scratch)
>>>>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn).addReg(0)
>>>>>>> - .addImm(UnitSize));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
>>> destOut)
>>>>>>> - .addReg(scratch).addReg(destIn)
>>>>>>> - .addReg(0).addImm(UnitSize));
>>>>>>> - }
>>>>>>> - srcIn = srcOut;
>>>>>>> - destIn = destOut;
>>>>>>> + std::pair<unsigned, unsigned> res =
>>>>>>> + ByvalEmitter.emitUnitLoad(BB, MI, dl, srcIn);
>>>>>>> + unsigned scratch = res.first;
>>>>>>> + srcIn = res.second;
>>>>>>> + destIn = ByvalEmitter.emitUnitStore(BB, MI, dl, destIn,
>>>>>>> + scratch);
>>>>>>> }
>>>>>>>
>>>>>>> // Handle the leftover bytes with LDRB and STRB.
>>>>>>> // [scratch, srcOut] = LDRB_POST(srcIn, 1)
>>>>>>> // [destOut] = STRB_POST(scratch, destIn, 1)
>>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
>>>>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
>>>>>>> for (unsigned i = 0; i < BytesLeft; i++) {
>>>>>>> - unsigned scratch = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
>>>>>>> - if (isThumb2) {
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> - TII->get(ldrOpc),scratch)
>>>>>>> - .addReg(srcOut,
> RegState::Define).addReg(srcIn).addImm(1));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
>>> destOut)
>>>>>>> - .addReg(scratch).addReg(destIn)
>>>>>>> - .addImm(1));
>>>>>>> - } else {
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl,
>>>>>>> - TII->get(ldrOpc),scratch)
>>>>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn)
>>>>>>> - .addReg(0).addImm(1));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(*BB, MI, dl, TII->get(strOpc),
>>> destOut)
>>>>>>> - .addReg(scratch).addReg(destIn)
>>>>>>> - .addReg(0).addImm(1));
>>>>>>> - }
>>>>>>> - srcIn = srcOut;
>>>>>>> - destIn = destOut;
>>>>>>> + std::pair<unsigned, unsigned> res =
>>>>>>> + ByvalEmitter.emitByteLoad(BB, MI, dl, srcIn);
>>>>>>> + unsigned scratch = res.first;
>>>>>>> + srcIn = res.second;
>>>>>>> + destIn = ByvalEmitter.emitByteStore(BB, MI, dl, destIn,
>>>>>>> + scratch);
>>>>>>> }
>>>>>>> MI->eraseFromParent(); // The instruction is gone now.
>>>>>>> return BB;
>>>>>>> @@ -7414,34 +7773,7 @@ EmitStructByval(MachineInstr *MI, Machin
>>>>>>> exitMBB->transferSuccessorsAndUpdatePHIs(BB);
>>>>>>>
>>>>>>> // Load an immediate to varEnd.
>>>>>>> - unsigned varEnd = MRI.createVirtualRegister(TRC);
>>>>>>> - if (isThumb2) {
>>>>>>> - unsigned VReg1 = varEnd;
>>>>>>> - if ((LoopSize & 0xFFFF0000) != 0)
>>>>>>> - VReg1 = MRI.createVirtualRegister(TRC);
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVi16), VReg1)
>>>>>>> - .addImm(LoopSize & 0xFFFF));
>>>>>>> -
>>>>>>> - if ((LoopSize & 0xFFFF0000) != 0)
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2MOVTi16),
>>> varEnd)
>>>>>>> - .addReg(VReg1)
>>>>>>> - .addImm(LoopSize >> 16));
>>>>>>> - } else {
>>>>>>> - MachineConstantPool *ConstantPool = MF->getConstantPool();
>>>>>>> - Type *Int32Ty =
>>> Type::getInt32Ty(MF->getFunction()->getContext());
>>>>>>> - const Constant *C = ConstantInt::get(Int32Ty, LoopSize);
>>>>>>> -
>>>>>>> - // MachineConstantPool wants an explicit alignment.
>>>>>>> - unsigned Align = getDataLayout()-
>>> getPrefTypeAlignment(Int32Ty);
>>>>>>> - if (Align == 0)
>>>>>>> - Align = getDataLayout()->getTypeAllocSize(C->getType());
>>>>>>> - unsigned Idx = ConstantPool->getConstantPoolIndex(C, Align);
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::LDRcp))
>>>>>>> - .addReg(varEnd, RegState::Define)
>>>>>>> - .addConstantPoolIndex(Idx)
>>>>>>> - .addImm(0));
>>>>>>> - }
>>>>>>> + unsigned varEnd = ByvalEmitter.emitConstantLoad(BB, MI, dl,
>>>>>>> + LoopSize);
>>>>>>> BB->addSuccessor(loopMBB);
>>>>>>>
>>>>>>> // Generate the loop body:
>>>>>>> @@ -7450,12 +7782,12 @@ EmitStructByval(MachineInstr *MI,
>> Machin
>>>>>>> // destPhi = PHI(destLoop, dst)
>>>>>>> MachineBasicBlock *entryBB = BB;
>>>>>>> BB = loopMBB;
>>>>>>> - unsigned varLoop = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned varPhi = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned srcLoop = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned srcPhi = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned destLoop = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned destPhi = MRI.createVirtualRegister(TRC);
>>>>>>> + unsigned varLoop =
>>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
>>>>>>> + unsigned varPhi =
>>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
>>>>>>> + unsigned srcLoop =
>>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
>>>>>>> + unsigned srcPhi =
>>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
>>>>>>> + unsigned destLoop =
>>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
>>>>>>> + unsigned destPhi =
>>>>>>> + MRI.createVirtualRegister(ByvalEmitter.getTRC());
>>>>>>>
>>>>>>> BuildMI(*BB, BB->begin(), dl, TII->get(ARM::PHI), varPhi)
>>>>>>> .addReg(varLoop).addMBB(loopMBB) @@ -7469,39 +7801,16 @@
>>>>>>> EmitStructByval(MachineInstr *MI, Machin
>>>>>>>
>>>>>>> // [scratch, srcLoop] = LDR_POST(srcPhi, UnitSize)
>>>>>>> // [destLoop] = STR_POST(scratch, destPhi, UnitSiz)
>>>>>>> - unsigned scratch = MRI.createVirtualRegister(UnitSize >= 8 ?
>>>>>>> TRC_Vec:TRC);
>>>>>>> - if (UnitSize >= 8) {
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
>>>>>>> - .addReg(srcLoop, RegState::Define).addReg(srcPhi).addImm(0));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
>>>>>>> - .addReg(destPhi).addImm(0).addReg(scratch));
>>>>>>> - } else if (isThumb2) {
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
>>>>>>> - .addReg(srcLoop,
>>>>>> RegState::Define).addReg(srcPhi).addImm(UnitSize));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
>>>>>>> - .addReg(scratch).addReg(destPhi)
>>>>>>> - .addImm(UnitSize));
>>>>>>> - } else {
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc), scratch)
>>>>>>> - .addReg(srcLoop, RegState::Define).addReg(srcPhi).addReg(0)
>>>>>>> - .addImm(UnitSize));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), destLoop)
>>>>>>> - .addReg(scratch).addReg(destPhi)
>>>>>>> - .addReg(0).addImm(UnitSize));
>>>>>>> + {
>>>>>>> + std::pair<unsigned, unsigned> res =
>>>>>>> + ByvalEmitter.emitUnitLoad(BB, BB->end(), dl, srcPhi,
>>> srcLoop);
>>>>>>> + unsigned scratch = res.first;
>>>>>>> + ByvalEmitter.emitUnitStore(BB, BB->end(), dl, destPhi,
>>>>>>> + scratch, destLoop);
>>>>>>> }
>>>>>>>
>>>>>>> // Decrement loop variable by UnitSize.
>>>>>>> - MachineInstrBuilder MIB = BuildMI(BB, dl,
>>>>>>> - TII->get(isThumb2 ? ARM::t2SUBri : ARM::SUBri), varLoop);
>>>>>>> -
>>>>>>>
>>>> AddDefaultCC(AddDefaultPred(MIB.addReg(varPhi).addImm(UnitSize)));
>>>>>>> - MIB->getOperand(5).setReg(ARM::CPSR);
>>>>>>> - MIB->getOperand(5).setIsDef(true);
>>>>>>> -
>>>>>>> - BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc))
>>>>>>> - .addMBB(loopMBB).addImm(ARMCC::NE).addReg(ARM::CPSR);
>>>>>>> + ByvalEmitter.emitSubImm(BB, BB->end(), dl, varPhi, varLoop);
>>>>>>> + ByvalEmitter.emitBranchNE(BB, BB->end(), dl, loopMBB);
>>>>>>>
>>>>>>> // loopMBB can loop back to loopMBB or fall through to exitMBB.
>>>>>>> BB->addSuccessor(loopMBB);
>>>>>>> @@ -7510,36 +7819,17 @@ EmitStructByval(MachineInstr *MI,
>> Machin
>>>> //
>>>>>>> Add epilogue to handle BytesLeft.
>>>>>>> BB = exitMBB;
>>>>>>> MachineInstr *StartOfExit = exitMBB->begin();
>>>>>>> - ldrOpc = isThumb2 ? ARM::t2LDRB_POST : ARM::LDRB_POST_IMM;
>>>>>>> - strOpc = isThumb2 ? ARM::t2STRB_POST : ARM::STRB_POST_IMM;
>>>>>>>
>>>>>>> // [scratch, srcOut] = LDRB_POST(srcLoop, 1)
>>>>>>> // [destOut] = STRB_POST(scratch, destLoop, 1)
>>>>>>> unsigned srcIn = srcLoop;
>>>>>>> unsigned destIn = destLoop;
>>>>>>> for (unsigned i = 0; i < BytesLeft; i++) {
>>>>>>> - unsigned scratch = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned srcOut = MRI.createVirtualRegister(TRC);
>>>>>>> - unsigned destOut = MRI.createVirtualRegister(TRC);
>>>>>>> - if (isThumb2) {
>>>>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
>>>>>>> - TII->get(ldrOpc),scratch)
>>>>>>> - .addReg(srcOut, RegState::Define).addReg(srcIn).addImm(1));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> TII->get(strOpc),
>>>>>> destOut)
>>>>>>> - .addReg(scratch).addReg(destIn)
>>>>>>> - .addImm(1));
>>>>>>> - } else {
>>>>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
>>>>>>> - TII->get(ldrOpc),scratch)
>>>>>>> - .addReg(srcOut,
>>>>>> RegState::Define).addReg(srcIn).addReg(0).addImm(1));
>>>>>>> -
>>>>>>> - AddDefaultPred(BuildMI(*BB, StartOfExit, dl,
> TII->get(strOpc),
>>>>>> destOut)
>>>>>>> - .addReg(scratch).addReg(destIn)
>>>>>>> - .addReg(0).addImm(1));
>>>>>>> - }
>>>>>>> - srcIn = srcOut;
>>>>>>> - destIn = destOut;
>>>>>>> + std::pair<unsigned, unsigned> res =
>>>>>>> + ByvalEmitter.emitByteLoad(BB, StartOfExit, dl, srcIn);
>>>>>>> + unsigned scratch = res.first;
>>>>>>> + srcIn = res.second;
>>>>>>> + destIn = ByvalEmitter.emitByteStore(BB, StartOfExit, dl,
>>>>>>> + destIn, scratch);
>>>>>>> }
>>>>>>>
>>>>>>> MI->eraseFromParent(); // The instruction is gone now.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>>
>>>
>>>
>
>
More information about the llvm-commits
mailing list