[llvm-commits] [PATCH 2/2] Convert VLDR/VSTR on demand for base update optimization
Jim Grosbach
grosbach at apple.com
Mon Sep 17 10:42:59 PDT 2012
Hi David,
Thanks for the update! Unfortunately, there are still some significant problems. Comments below.
> From 7a15dd0b727ddc7806624bf5d77f7178aa3dbb8d Mon Sep 17 00:00:00 2001
> From: David Peixotto <dpeixott at codeaurora.org>
> Date: Fri, 24 Aug 2012 18:10:36 -0700
> Subject: [PATCH 2/2] Convert VLDR/VSTR on demand for base update optimization
>
> This commit allows the base-address update optimization to work for VLDR and
> VSTR. There is no writeback version of VLDR/VSTR, so we cannot directly fold
> the update of the base address into the instruction itself. We must first
> transform the VLDR/VSTR to VLDM/VSTM and then use the writeback version of
> those instructions. For example we perform the following transformation,
>
> VLDR D0, [R0]
> ADD R0, R0, #8
> ==>
> VLDM R0!, {D0}
>
> The VLDR/VSTR instructions will only be changed to VLDM/VSTM if an update of
> the base address register can be folded into the instruction. Other VLDR/VSTR
> instructions are left undisturbed.
> ---
> lib/Target/ARM/ARMLoadStoreOptimizer.cpp | 163 +++++++++++++++++++++++++++++-
> test/CodeGen/ARM/vlsm_upd.ll | 59 +++++++++++
> 2 files changed, 219 insertions(+), 3 deletions(-)
>
> diff --git a/lib/Target/ARM/ARMLoadStoreOptimizer.cpp b/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
> index d364b57..439e2c1 100644
> --- a/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
> +++ b/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
> @@ -74,6 +74,10 @@ EnableSPIncrementFolding("enable-lsm-sp-inc", cl::Hidden,
> "Enable sp increment folding in the lsm-with-auto-increment optimization"),
> cl::init(false));
>
> +static cl::opt<bool>
> +EnableLSMWriteBackConvert("enable-lsm-writeback-convert", cl::Hidden,
> + cl::desc("Enable conversion of vldr/vstr to vldm/vstm for writeback opt"),
Can you rephrase this a bit? I'm not sure I have a good suggestion, but as is, it reads somewhat awkwardly.
> + cl::init(true));
>
> /// ARMAllocLoadStoreOpt - Post- register allocation pass the combine
> /// load / store instructions to form ldm / stm instructions.
> @@ -1601,13 +1605,146 @@ static bool memOpHasWriteBackOption(const MachineInstr& MI) {
> return false;
> }
>
> +// Check to see if we can convert the instruction to a version that
> +// supports writeback to the base register.
> +// We can convert a VLDR/VSTR to VLDM/VSTM as long as the VLDR/VSTR do
> +// not have any pre-increment in the instruction (i.e. the imm in the
> +// instruction should be zero).
> +static bool canConvertToWriteBackInstr(const MachineInstr& MI) {
> + unsigned Opcode = MI.getOpcode();
> + switch (Opcode) {
> + default: return false;
> + case ARM::VLDRS:
> + case ARM::VSTRS:
> + case ARM::VLDRD:
> + case ARM::VSTRD:{
> + // Check to make sure we have an instruction of the form
> + // VLDR <Dd>, [<Rn> {, #+-<imm>]
> + if (MI.getNumOperands() < 3)
> + return false;
> + if (!MI.getOperand(1).isReg()) // Base address
> + return false;
What conditions are these statements trying to check for? If they ever fire, something has gone seriously wrong, as that would indicate a malformed instruction.
> + // Make sure the imm is zero
Period on the end.
> + const MachineOperand& PreIncOp = MI.getOperand(2);
'&' goes on the identifier, not the type.
> + return PreIncOp.isImm() && (PreIncOp.getImm() == 0);
Similar to the above, why are you checking isImm() here? If that operand isn't an immediate, something is very, very wrong.
> + }
> + }
A comment here that the switch statement won't fall through may help readability a bit.
> +}
> +
> +// Convert a VLDR/VSTR to VLDMIA/VSTMIA
Period at the end of the sentence.
> +// useful for converting an memory instruction to a variant that can
> +// update the base address as a side-effect of the instruction.
Please rephrase this so it's a complete sentence.
> +static MachineInstr* convertVLSRtoVLSMIA(MachineBasicBlock &MBB,
> + MachineInstr *MI,
> + const TargetInstrInfo *TII) {
> + unsigned Opcode = MI->getOpcode();
> + unsigned NewOpc;
> + bool isLoad = false;
> + switch (Opcode) {
> + default: llvm_unreachable("Unhandled opcode");
> + case ARM::VLDRS:
> + isLoad = true;
> + NewOpc = ARM::VLDMSIA;
> + break;
> + case ARM::VSTRS:
> + NewOpc = ARM::VSTMSIA;
> + break;
> + case ARM::VLDRD:
> + isLoad = true;
> + NewOpc = ARM::VLDMDIA;
> + break;
> + case ARM::VSTRD:
> + NewOpc = ARM::VSTMDIA;
> + break;
> + }
You want the updating forms of these instructions, right? These are the non-updating forms. You want the _UPD suffixed instructions, almost certainly.
For example,

enkidu: ~/tmp $ cat x.s
_foo:
        vldmia  r1!, {d2}
enkidu: ~/tmp $ llvm-mc -triple=armv7-apple-darwin < x.s -show-inst
        .section        __TEXT,__text,regular,pure_instructions
_foo:
        vldmia  r1!, {d2}            @ <MCInst #1201 VLDMDIA_UPD
                                     @  <MCOperand Reg:61>
                                     @  <MCOperand Reg:61>
                                     @  <MCOperand Imm:14>
                                     @  <MCOperand Reg:0>
                                     @  <MCOperand Reg:5>>
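For what it's worth, I'd expect the switch to end up mapping to the _UPD opcodes, along these lines. (This is a standalone sketch with stand-in enum values, not the actual ARM:: opcodes from the generated ARMGenInstrInfo tables; it's just to show the shape.)

```cpp
#include <cassert>

// Stand-in opcode values purely for illustration; the real code would use
// ARM::VLDRS, ARM::VLDMSIA_UPD, etc. from the generated instruction tables.
enum Opcode {
  VLDRS, VSTRS, VLDRD, VSTRD,
  VLDMSIA_UPD, VSTMSIA_UPD, VLDMDIA_UPD, VSTMDIA_UPD,
  INVALID
};

// Map a VLDR/VSTR opcode to its updating (writeback) load/store-multiple
// form: the same switch as in convertVLSRtoVLSMIA, but targeting the
// _UPD-suffixed opcodes instead of the non-updating ones.
Opcode getWriteBackOpcode(Opcode Opc, bool &isLoad) {
  isLoad = false;
  switch (Opc) {
  case VLDRS: isLoad = true; return VLDMSIA_UPD;
  case VSTRS:                return VSTMSIA_UPD;
  case VLDRD: isLoad = true; return VLDMDIA_UPD;
  case VSTRD:                return VSTMDIA_UPD;
  default:                   return INVALID;
  }
}
```

The key point is that the _UPD forms carry an extra def of the base register (the writeback result), which the non-updating forms don't.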
> + MachineBasicBlock::iterator MBBI(MI);
> + MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI->getDebugLoc(),
> + TII->get(NewOpc));
> +
> + assert(MI->getOperand(2).isImm() && MI->getOperand(2).getImm() == 0 &&
> + "VLDR has non-zero pre-increment value");
As above, no need to check isImm() here. Also, "VLDR/VSTR".
> + MachineOperand& VPRReg = MI->getOperand(0);
> + if (isLoad)
> + VPRReg.setIsDef(true);
> + MIB.addOperand(MI->getOperand(1)); // Base address
> + MIB.addOperand(MI->getOperand(3)); // Predicate Imm
> + MIB.addOperand(MI->getOperand(4)); // Predicate Register
> + MIB.addOperand(VPRReg); // VPR register
This isn't right. See above.
Have a very close look in the .td files at how these instructions and the VLDR/VSTR operands are defined. You can also look at the generated code in the .inc files (InstrInfo and AsmMatcher are useful, in my experience). I also find it handy to verify my understanding by writing the instructions in a .s file and using llvm-mc to print out the MCOperands for them (which mirror the MachineOperands in type and order).
> +
> + // Transfer any extra operands (e.g. implicit defs/kills)
> + for (unsigned int i = 5; i < MI->getNumOperands(); ++i)
> + MIB.addOperand(MI->getOperand(i));
> +
What cases have you encountered where a VLDR/VSTR instruction has any additional operands that need to be copied?
> + // Transfer memoperands.
> + MIB->setMemRefs(MI->memoperands_begin(), MI->memoperands_end());
> +
> + // Remove the VLDR instruction
> + MBB.erase(MBBI);
> + return MIB;
> +}
> +
> +// Convert a VLDMIA/VSTMIA to VLDR/VSTR
> +// useful for undoing the conversion we performed in the opposite direction so
> +// that we could potentially merge the update of the base address into the
> +// instruction.
Ditto all of the above for this whole function.
> +static MachineInstr* convertVLSMIAtoVLSR(MachineBasicBlock &MBB,
> + MachineInstr *MI,
> + const TargetInstrInfo *TII) {
> + unsigned Opcode = MI->getOpcode();
> + unsigned NewOpc;
> + bool isLoad = false;
> + switch (Opcode) {
> + default: llvm_unreachable("Unhandled opcode");
> + case ARM::VLDMSIA:
> + isLoad = true;
> + NewOpc = ARM::VLDRS;
> + break;
> + case ARM::VSTMSIA:
> + NewOpc = ARM::VSTRS;
> + break;
> + case ARM::VLDMDIA:
> + isLoad = true;
> + NewOpc = ARM::VLDRD;
> + break;
> + case ARM::VSTMDIA:
> + NewOpc = ARM::VSTRD;
> + break;
> + }
> + MachineBasicBlock::iterator MBBI(MI);
> + MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI->getDebugLoc(),
> + TII->get(NewOpc));
> +
> + // There should be only one VPR register since this instruction was
> + // originally converted from a VLDR/VSTR
> + MachineOperand& VPRReg = MI->getOperand(3);
> + if (isLoad)
> + VPRReg.setIsDef(true);
> + MIB.addOperand(VPRReg); // VPR register
> + MIB.addOperand(MI->getOperand(0)); // Base address
> + MIB.addImm(0); // Offset
> + MIB.addOperand(MI->getOperand(1)); // Predicate Imm
> + MIB.addOperand(MI->getOperand(2)); // Predicate Register
> +
> + // Transfer any extra operands (e.g. implicit defs/kills)
> + for (unsigned int i = 4; i < MI->getNumOperands(); ++i)
> + MIB.addOperand(MI->getOperand(i));
> +
> + // Transfer memoperands.
> + MIB->setMemRefs(MI->memoperands_begin(), MI->memoperands_end());
> +
> + // Remove the VLDM instruction
> + MBB.erase(MBBI);
> + return MIB;
> +}
> +
> /// LSMWriteBackOptimzation- An optimization pass to turn LDM / STM
> /// ops followed/preceeded by an increment/decrement of the base address to the
> /// writeback version of the opcode.
This isn't a pass. It's part of the bigger load-store-optimization pass. The comment should be reworded to be clearer and to not be a sentence fragment.
> bool ARMLoadStoreOpt::LSMWriteBackOptimization(MachineBasicBlock &MBB,
> const MISet &NewLSMInstrs) {
> - typedef SmallVector<MachineInstr*,8> Worklist;
> + typedef SmallVector<std::pair<MachineInstr*,bool>,8> Worklist;
> int NumMerges = 0;
> + const bool NeedsConversion = true; // flag for when we need to convert
What's the purpose of this variable?
>
> // Find all the load/store multiple instructions where we can potentially
> // merge an update of the base address into the instruction
Here and everywhere following, please update to use complete sentences including punctuation.
> @@ -1627,17 +1764,37 @@ bool ARMLoadStoreOpt::LSMWriteBackOptimization(MachineBasicBlock &MBB,
> continue;
>
> if(memOpHasWriteBackOption(*MBBI)) {
> - MemOps.push_back(MBBI);
> + MemOps.push_back(std::make_pair(MBBI, !NeedsConversion));
> + }
> + else if (EnableLSMWriteBackConvert && canConvertToWriteBackInstr(*MBBI)) {
> + MemOps.push_back(std::make_pair(MBBI, NeedsConversion));
There is no memOpHasWriteBackOption() function in top of tree, nor has there ever been, as far as I can find. Is this patch complete? Using the memory operands for that check is not the right way to go. Reference the instructions themselves.
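That is, key the check off the opcode itself, roughly like the sketch below. (Again, stand-in opcode values rather than the real ARM:: opcodes; this is only meant to show the shape of the check.)

```cpp
#include <cassert>

// Stand-in opcodes for illustration; the real code would switch over
// ARM::LDMIA, ARM::STMIA, ARM::VLDMDIA, etc. from ARMGenInstrInfo.
enum Opcode { LDMIA, STMIA, VLDMDIA, VSTMDIA, VLDRD, VSTRD, ADDri };

// An instruction can take the writeback (_UPD) form directly only if it is
// already a load/store-multiple; VLDR/VSTR would need to be converted to
// VLDM/VSTM first, and anything else doesn't qualify at all.
bool hasWriteBackOption(Opcode Opc) {
  switch (Opc) {
  case LDMIA:
  case STMIA:
  case VLDMDIA:
  case VSTMDIA:
    return true;
  default:
    return false;
  }
}
```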
> }
> }
>
> // Work the list to try and merge base update into the load/store operation
> for (Worklist::iterator WI = MemOps.begin(), WE = MemOps.end();
> WI != WE; ++WI) {
> - MachineInstr *MI = *WI;
> + MachineInstr *MI = WI->first;
> + bool NeedsConversion = WI->second;
> DEBUG(dbgs() << "Trying to merge increment for " << *MI);
> +
> + // Change vldr/vstr to vldmia/vstmia to try and merge increment
> + if (NeedsConversion) {
> + DEBUG(dbgs() << " first converting to op with writeback mode\n");
> + MI = convertVLSRtoVLSMIA(MBB, MI, TII);
> + DEBUG(dbgs() << " converted to: " << *MI);
> + }
> +
> + // Try and merge base update into the load/store multiple instruction
> if(MergeBaseUpdateLSMultiple(MBB, MI))
> ++NumMerges;
> +
> + // Undo change of vldr/vstr to vldmia/vstmia if merge failed
> + else if (NeedsConversion) {
> + DEBUG(dbgs() << " merge failed, unconverting to vldr/vstr\n");
> + MI = convertVLSMIAtoVLSR(MBB, MI, TII);
> + DEBUG(dbgs() << " unconverted to: " << *MI);
> + }
> }
>
> NumIncMerges += NumMerges; // update the statistic
> diff --git a/test/CodeGen/ARM/vlsm_upd.ll b/test/CodeGen/ARM/vlsm_upd.ll
> index 4e28dce..c24d439 100644
> --- a/test/CodeGen/ARM/vlsm_upd.ll
> +++ b/test/CodeGen/ARM/vlsm_upd.ll
> @@ -1,5 +1,7 @@
> ; RUN: llc < %s -march=arm -mattr=+neon\
> ; RUN: -enable-lsm-with-auto-increment=true\
> +; RUN: -enable-lsm-writeback-convert=true\
> +; RUN: -verify-machineinstrs\
> ; RUN: | FileCheck %s
> ; Test cases for the enable-lsm-with-auto-increment optimization that folds the
> ; increment of the base address into the load/store multiple instruction
> @@ -24,5 +26,62 @@ define void @addVec_4x4() nounwind {
> ret void
> }
>
> + at B2x4 = common global [100 x <2 x float>] zeroinitializer, align 16
> + at A2x4 = common global [100 x <2 x float>] zeroinitializer, align 16
> +declare void @v2x4(<2 x float>*, <2 x float>*)
> +
> +;CHECK: addVec_2x4:
> +define void @addVec_2x4() nounwind {
> + %bi = getelementptr inbounds [100 x <2 x float>]* @B2x4, i32 0, i32 0
> +;CHECK: vldmia r1!
> + %v1 = load <2 x float>* %bi, align 16, !tbaa !0
> + %ai = getelementptr inbounds [100 x <2 x float>]* @A2x4, i32 0, i32 0
> + %v2 = load <2 x float>* %ai, align 16, !tbaa !0
> + %v3 = fadd <2 x float> %v1, %v2
> +;CHECK: vstmia r0!
> + store <2 x float> %v3, <2 x float>* %ai, align 16, !tbaa !0
> + %bi_1 = getelementptr [100 x <2 x float>]* @B2x4, i32 0, i32 1
> + %ai_1 = getelementptr [100 x <2 x float>]* @A2x4, i32 0, i32 1
> + call void @v2x4(<2 x float>* %ai_1, <2 x float>* %bi_1);
> + ret void
> +}
> +
> + at B1x4 = common global [100 x <1 x float>] zeroinitializer, align 16
> + at A1x4 = common global [100 x <1 x float>] zeroinitializer, align 16
> +declare void @v1x4(<1 x float>*, <1 x float>*)
> +
> +;CHECK: addVec_1x4:
> +define void @addVec_1x4() nounwind {
> + %bi = getelementptr inbounds [100 x <1 x float>]* @B1x4, i32 0, i32 0
> +;CHECK: vldmia r1!
> + %v1 = load <1 x float>* %bi, align 16, !tbaa !0
> + %ai = getelementptr inbounds [100 x <1 x float>]* @A1x4, i32 0, i32 0
> + %v2 = load <1 x float>* %ai, align 16, !tbaa !0
> + %v3 = fadd <1 x float> %v1, %v2
> +;CHECK: vstmia r0!
> + store <1 x float> %v3, <1 x float>* %ai, align 16, !tbaa !0
> + %bi_1 = getelementptr [100 x <1 x float>]* @B1x4, i32 0, i32 1
> + %ai_1 = getelementptr [100 x <1 x float>]* @A1x4, i32 0, i32 1
> + call void @v1x4(<1 x float>* %ai_1, <1 x float>* %bi_1);
> + ret void
> +}
> +
> +; Make sure we correctly output the vldr/vstr instructions if we
> +; do not convert them to vldm/vstm for autoincrement optimization
> +;CHECK: addVec_1x4_roundtrip:
> +define void @addVec_1x4_roundtrip() nounwind {
> + %bi = getelementptr inbounds [100 x <1 x float>]* @B1x4, i32 0, i32 0
> +;CHECK: vldr
> +;CHECK-NOT: vldmia r1!
> + %v1 = load <1 x float>* %bi, align 16, !tbaa !0
> + %ai = getelementptr inbounds [100 x <1 x float>]* @A1x4, i32 0, i32 0
> + %v2 = load <1 x float>* %ai, align 16, !tbaa !0
> + %v3 = fadd <1 x float> %v1, %v2
> +;CHECK: vstr
> +;CHECK-NOT: vstmia r0!
> + store <1 x float> %v3, <1 x float>* %ai, align 16, !tbaa !0
> + ret void
> +}
> +
> !0 = metadata !{metadata !"omnipotent char", metadata !1}
> !1 = metadata !{metadata !"Simple C/C++ TBAA"}
> --
> 1.7.8.3
>
On Sep 8, 2012, at 2:19 PM, David Peixotto <dpeixott at codeaurora.org> wrote:
> Ok, I've addressed the comments from the review. I've attached a new patch
> to this email.
>
> I also fixed all the issues found by -verify-machineinstrs. Those fixes
> belonged in the other patch in this patch set (see [PATCH 1/2] Improve
> optimization to fold base update into LDM/STM), and the patch in this email
> already depends on that other patch.
>
> I've also uploaded a new version of the other patch. Once it is reviewed and
> approved both patches are ready to be applied.
>
> Thanks,
> Dave
>
> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
>> bounces at cs.uiuc.edu] On Behalf Of David Peixotto
>> Sent: Thursday, September 06, 2012 6:44 PM
>> To: 'Jim Grosbach'
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [PATCH 2/2] Convert VLDR/VSTR on demand for
>> base update optimization
>>
>> Thanks for the feedback Jim!
>>
>> Thanks for the tip about -verify-machineinstrs. It found a few things to
>> complain about and I cleaned those up. I'll also apply your suggestions
>> below and submit a new patch. Thanks!
>>
>> -Dave
>>
>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> hosted by The Linux Foundation
>>
>> -----Original Message-----
>> From: Jim Grosbach [mailto:grosbach at apple.com]
>> Sent: Wednesday, September 05, 2012 5:21 PM
>> To: David Peixotto
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [PATCH 2/2] Convert VLDR/VSTR on demand for
>> base update optimization
>>
>> Nice!
>>
>> Style note:
>> +static MachineInstr* convertVLSRtoVLSMIA(MachineBasicBlock &MBB,
>> + MachineInstr *MI,
>> + const TargetInstrInfo *TII) {
>>
>> The '{' should go on the preceding line, not on a new line. Similarly for
>> the other functions.
>>
>>
>> In convertVLSRtoVLSMIA:
>>
>> + MachineBasicBlock::iterator MBBI(MI);
>> + MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI->getDebugLoc(),
>> +                                   TII->get(NewOpc));
>> +
>> + assert(MI->getOperand(2).isImm() && MI->getOperand(2).getImm() == 0 &&
>> +        "VLDR has non-zero pre-increment value");
>> + MIB.addOperand(MI->getOperand(1)); // Base address
>> + MIB.addOperand(MI->getOperand(3)); // Predicate Imm
>> + MIB.addOperand(MI->getOperand(4)); // Predicate Register
>> + MIB.addOperand(MI->getOperand(0)); // VPR register
>>
>> The VPR register should probably be marked as a def for loads.
>>
>> In convertVLSMIAtoVLSR:
>>
>> + MachineBasicBlock::iterator MBBI(MI);
>> + MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI->getDebugLoc(),
>> +                                   TII->get(NewOpc));
>> +
>> + // There should be only one VPR register since this instruction was
>> + // originally converted from a VLDR/VSTR
>> + MIB.addOperand(MI->getOperand(3)); // VPR register
>> + MIB.addOperand(MI->getOperand(0)); // Base address
>> + MIB.addImm(0); // Offset
>> + MIB.addOperand(MI->getOperand(1)); // Predicate Imm
>> + MIB.addOperand(MI->getOperand(2)); // Predicate Register
>> +
>>
>> The VPR register should definitely be marked as a def for loads.
>>
>> If you haven't already, make sure to run tests with the machine verifier
>> enabled after the pass runs. It'll help catch lots of things.
>>
>>
>> LGTM with those updates.
>>
>> -Jim
>>
>>
>> On Aug 31, 2012, at 11:02 AM, David Peixotto <dpeixott at codeaurora.org>
>> wrote:
>>
>>> Please review the attached patch. Thanks!
>>>
>>> This commit allows the base-address update optimization to work for
>>> VLDR and VSTR. There is no writeback version of VLDR/VSTR, so we
>>> cannot directly fold the update of the base address into the
>>> instruction itself. We must first transform the VLDR/VSTR to VLDM/VSTM
>>> and then use the writeback version of those instructions. For example
>>> we perform the following transformation,
>>>
>>> VLDR D0, [R0]
>>> ADD R0, R0, #8
>>> ==>
>>> VLDM R0!, {D0}
>>>
>>> The VLDR/VSTR instructions will only be changed to VLDM/VSTM if an
>>> update of the base address register can be folded into the
>>> instruction. Other VLDR/VSTR instructions are left undisturbed.
>>>
>>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>> hosted by The Linux Foundation
>>>
>>>
>>>
>>> <0002-Convert-VLDR-VSTR-on-demand-for-base-update-optimiza.patch>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> <0002-Convert-VLDR-VSTR-on-demand-for-base-update-optimiza.patch>