[llvm-commits] [PATCH 2/2] Convert VLDR/VSTR on demand for base update optimization

David Peixotto dpeixott at codeaurora.org
Tue Sep 18 10:30:02 PDT 2012


Hi Jim,

Thanks for the further review. I think I can address a number of your
concerns with a top-level comment.

This patch is the second of two patches I submitted to improve the
transformation that folds an increment of the base pointer into an ldm/stm
instruction. The first patch was sent to the llvm-commits mailing list with
the subject "[PATCH 1/2] Improve optimization to fold base update into
LDM/STM". It still needs to be reviewed.

This patch depends on the first patch, so it cannot be applied independently
of it (as you noticed, it uses some of the functions introduced there, such
as memOpHasWriteBackOption()).

I separated the changes into two patches to try to make them more readable
and self-contained. I also wasn't sure that people would like the
transformation in this patch that changes VLDR --> VLDMIA when a
post-increment is possible, so I kept them separate so that the first patch
could be accepted even if people did not like the second one. I wasn't quite
sure how to submit two related patches and ended up sending them in separate
emails; I probably should have kept them together. Sorry for the confusion.

The first patch extracts the base-pointer-increment folding transformation
for LDM/STM into a self-contained function
(ARMLoadStoreOpt::MergeBaseUpdateLSMultiple), and this patch makes use of
that function. Previously that folding was run interleaved with the other
load/store transformations.
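
As a reminder, the folding performed by MergeBaseUpdateLSMultiple() for
LDM/STM looks like this (registers and offsets here are just illustrative):

VLDMIA R0, {D0, D1}
ADD    R0, R0, #16
==>
VLDMIA R0!, {D0, D1}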

The main idea of this patch is to reuse the MergeBaseUpdateLSMultiple()
function to fold the increment for VLDR and VSTR instructions. The
MergeBaseUpdateLSMultiple() function takes a load/store instruction that can
be converted to a version that uses a post-increment and does the conversion
if an opportunity exists. Because VLDR/VSTR do not have a post-increment
version, I had two ways to handle these instructions:

    1. Unconditionally convert VLDR/VSTR to VLDMIA/VSTMIA and try the base
increment transformation. If the transformation opportunity does not exist,
revert the instruction to the original VLDR/VSTR.
    2. Modify MergeBaseUpdateLSMultiple() to handle VLDR/VSTR as a special
case.

I chose option 1 because it looked like the cleaner implementation. Also,
MergeBaseUpdateLSMultiple() uses existing helper functions that do not expect
VLDR/VSTR, so with option 1 I can keep using those helpers without issue.
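
Concretely, the per-instruction logic added by this patch boils down to the
following (simplified from the LSMWriteBackOptimization() hunk further down;
NeedsConversion is true only for VLDR/VSTR):

    if (NeedsConversion)
      MI = convertVLSRtoVLSMIA(MBB, MI, TII);   // VLDR/VSTR -> VLDMIA/VSTMIA
    if (MergeBaseUpdateLSMultiple(MBB, MI))
      ++NumMerges;                              // base update folded in
    else if (NeedsConversion)
      MI = convertVLSMIAtoVLSR(MBB, MI, TII);   // no opportunity, revert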

Hopefully that clears up some confusion. I'll respond to your specific
comments below. I can submit a new patch with some of your suggestions, but
please read through my comments above and see if that helps the patch make
more sense as it stands.

Thanks,
Dave


-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
by The Linux Foundation



> -----Original Message-----
> From: Jim Grosbach [mailto:grosbach at apple.com]
> Sent: Monday, September 17, 2012 10:43 AM
> To: David Peixotto
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH 2/2] Convert VLDR/VSTR on demand for
> base update optimization
> 
> Hi David,
> 
> Thanks for the update! Unfortunately, there are still some significant
> problems. Comments below.
> 
> > From 7a15dd0b727ddc7806624bf5d77f7178aa3dbb8d Mon Sep 17 00:00:00 2001
> > From: David Peixotto <dpeixott at codeaurora.org>
> > Date: Fri, 24 Aug 2012 18:10:36 -0700
> > Subject: [PATCH 2/2] Convert VLDR/VSTR on demand for base update
> > optimization
> >
> > This commit allows the base-address update optimization to work for
> > VLDR and VSTR. There is no writeback version of VLDR/VSTR, so we
> > cannot directly fold the update of the base address into the
> > instruction itself. We must first transform the VLDR/VSTR to VLDM/VSTM
> > and then use the writeback version of those instructions. For example
> > we perform the following transformation,
> >
> > VLDR D0, [R0]
> > ADD  R0, R0, #8
> > ==>
> > VLDM R0!, {D0}
> >
> > The VLDR/VSTR instructions will only be changed to VLDM/VSTM if an
> > update of the base address register can be folded into the
> > instruction. Other VLDR/VSTR instructions are left undisturbed.
> > ---
> >  lib/Target/ARM/ARMLoadStoreOptimizer.cpp |  163 +++++++++++++++++++++++++++++-
> >  test/CodeGen/ARM/vlsm_upd.ll             |   59 +++++++++++
> >  2 files changed, 219 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
> > b/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
> > index d364b57..439e2c1 100644
> > --- a/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
> > +++ b/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
> > @@ -74,6 +74,10 @@ EnableSPIncrementFolding("enable-lsm-sp-inc", cl::Hidden,
> >     "Enable sp increment folding in the lsm-with-auto-increment optimization"),
> >   cl::init(false));
> >
> > +static cl::opt<bool>
> > +EnableLSMWriteBackConvert("enable-lsm-writeback-convert", cl::Hidden,
> > +  cl::desc("Enable conversion of vldr/vstr to vldm/vstm for writeback opt"),
> 
> Can you re-phrase this a bit? Not sure I have a good suggestion, but as is,
> it reads kinda awkwardly.

Sure, I'll give it another try. The main problem is that it doesn't lend
itself to a nice compact description, but I'll have another go :)

> > +  cl::init(true));
> >
> >  /// ARMAllocLoadStoreOpt - Post- register allocation pass the combine
> > /// load / store instructions to form ldm / stm instructions.
> > @@ -1601,13 +1605,146 @@ static bool memOpHasWriteBackOption(const MachineInstr& MI) {
> >    return false;
> >  }
> >
> > +// Check to see if we can convert the instruction to a version that
> > +// supports writeback to the base register.
> > +// We can convert a VLDR/VSTR to VLDM/VSTM as long as the VLDR/VSTR do
> > +// not have any pre-increment in the instruction (i.e. the imm in the
> > +// instruction should be zero).
> > +static bool canConvertToWriteBackInstr(const MachineInstr& MI) {
> > +  unsigned Opcode = MI.getOpcode();
> > +  switch (Opcode) {
> > +  default: return false;
> > +  case ARM::VLDRS:
> > +  case ARM::VSTRS:
> > +  case ARM::VLDRD:
> > +  case ARM::VSTRD:{
> > +    // Check to make sure we have an instruction of the form
> > +    // VLDR <Dd>, [<Rn> {, #+-<imm>]
> > +    if (MI.getNumOperands() < 3)
> > +      return false;
> > +    if (!MI.getOperand(1).isReg()) // Base address
> > +      return false;
> 
> What conditions are these statements trying to check for? If they ever fire,
> something has gone seriously wrong, as that would indicate a malformed
> instruction.

This was just an attempt at defensive programming. I agree that they would
indicate a malformed instruction. I'll remove the checks.

> 
> > +    // Make sure the imm is zero
> Period on the end.
> > +    const MachineOperand& PreIncOp = MI.getOperand(2);
> 
> '&' goes on the identifier, not the type.
> 
> > +    return PreIncOp.isImm() && (PreIncOp.getImm() == 0);
> 
> Similar to the above, why are you checking isImm() here? If that operand
> isn't an immediate, something is very, very wrong.

Agreed. Will remove the check.

> 
> > +    }
> > +  }
> 
> A comment here that the switch statement won't fall through may help
> readability a bit.

Will do.

> 
> > +}
> > +
> > +// Convert a VLDR/VSTR to VLDMIA/VSTMIA
> Period at the end of the sentence.

Ok.

> > +// useful for converting an memory instruction to a variant that can
> > +// update the base address as a side-effect of the instruction.
> Please rephrase this so it's a complete sentence.

Ok.

> > +static MachineInstr* convertVLSRtoVLSMIA(MachineBasicBlock &MBB,
> > +                                         MachineInstr *MI,
> > +                                         const TargetInstrInfo *TII)
> > +{
> > +  unsigned Opcode = MI->getOpcode();
> > +  unsigned NewOpc;
> > +  bool isLoad = false;
> > +  switch (Opcode) {
> > +  default: llvm_unreachable("Unhandled opcode");
> > +  case ARM::VLDRS:
> > +    isLoad = true;
> > +    NewOpc = ARM::VLDMSIA;
> > +    break;
> > +  case ARM::VSTRS:
> > +    NewOpc = ARM::VSTMSIA;
> > +    break;
> > +  case ARM::VLDRD:
> > +    isLoad = true;
> > +    NewOpc = ARM::VLDMDIA;
> > +    break;
> > +  case ARM::VSTRD:
> > +    NewOpc = ARM::VSTMDIA;
> > +    break;
> > +  }
> 
> You want the updating forms of these instructions, right? These are the non-
> updating forms. You want the _UPD suffixed instructions, almost certainly.

I'm intentionally selecting the non-_UPD version here. See the discussion at
the top of the email. Here I'm converting the VLDR/VSTR -> VLDMIA/VSTMIA and
then passing the VLDMIA/VSTMIA to the base-increment transformation. If that
transformation is not possible, I un-convert the VLDMIA/VSTMIA back to the
original VLDR/VSTR.
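
In other words, the intended round trip for a single VLDR is (registers are
only illustrative):

VLDR   D0, [R0]
==>  VLDMIA R0, {D0}        (temporary conversion)
==>  VLDMIA R0!, {D0}       (if the trailing base update can be folded)
  or
==>  VLDR   D0, [R0]        (reverted when no folding opportunity is found)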

> For example,
> enkidu: ~/tmp $ cat x.s
> _foo:
> 
>   vldmia r1!, {d2}
> enkidu: ~/tmp $ llvm-mc -triple=armv7-apple-darwin < x.s -show-inst
> 	.section	__TEXT,__text,regular,pure_instructions
> _foo:
> 
> 	vldmia	r1!, {d2}               @ <MCInst #1201 VLDMDIA_UPD
>                                         @  <MCOperand Reg:61>
>                                         @  <MCOperand Reg:61>
>                                         @  <MCOperand Imm:14>
>                                         @  <MCOperand Reg:0>
>                                         @  <MCOperand Reg:5>>
> 
> 
> 
> > +  MachineBasicBlock::iterator MBBI(MI);
> > +  MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI->getDebugLoc(),
> > +                                    TII->get(NewOpc));
> > +
> > +  assert(MI->getOperand(2).isImm() && MI->getOperand(2).getImm() == 0 &&
> > +         "VLDR has non-zero pre-increment value");
> 
> As above, no need to check isImm() here. Also, "VLDR/VSTR".

Will remove that check and update the assertion string.

> 
> > +  MachineOperand& VPRReg = MI->getOperand(0);
> > +  if (isLoad)
> > +    VPRReg.setIsDef(true);
> > +  MIB.addOperand(MI->getOperand(1)); // Base address
> > +  MIB.addOperand(MI->getOperand(3)); // Predicate Imm
> > +  MIB.addOperand(MI->getOperand(4)); // Predicate Register
> > +  MIB.addOperand(VPRReg);            // VPR register
> 
> This isn't right. See above.

Ok, I'll take a look to make sure I'm correctly forming the VLDMIA/VSTMIA
and VLDR/VSTR instructions.

> Have a very close look in the .td files for how these instructions, and
the
> VLDR/VSTR instruction operands are defined. You can also look at the
> generated code in the .inc files (InstrInfo and AsmMatcher are useful, in
my
> experience). I also find it handy to verify my understanding by writing
the
> instructions in a .s file and using llvm-mc to print out the MCOperands
for it
> (which mirror the MachineOperands in type and order).

Oh, that's a nice tip!

> > +
> > +  // Transfer any extra operands (e.g. implicit defs/kills)
> > +  for (unsigned int i = 5; i < MI->getNumOperands(); ++i)
> > +    MIB.addOperand(MI->getOperand(i));
> > +
> 
> What cases have you encountered where a VLDR/VSTR instruction has any
> additional operands that need to be copied?

I had taken this code from elsewhere in the file. Perhaps it is not
necessary. The one case I can think of where there would be additional
operands is an implicit def of a D register when loading an S register. I'm
not sure if those are modeled like that, so I'm happy to remove it if it
looks unnecessary to you.

> > +  // Transfer memoperands.
> > +  MIB->setMemRefs(MI->memoperands_begin(), MI->memoperands_end());
> > +
> > +  // Remove the VLDR instruction
> > +  MBB.erase(MBBI);
> > +  return MIB;
> > +}
> > +
> > +// Convert a VLDMIA/VSTMIA to VLDR/VSTR
> > +// useful for undoing the conversion we performed in the opposite direction so
> > +// that we could potentially merge the update of the base address into the
> > +// instruction.
> 
> Ditto of above for this whole function.

I'll look through and make sure I hit the points you mentioned, but the
intention here is indeed to convert VLDMIA/VSTMIA -> VLDR/VSTR. This happens
when the MergeBaseUpdateLSMultiple() function is unable to transform the
VLDMIA/VSTMIA into VLDMIA_UPD/VSTMIA_UPD.

> 
> > +static MachineInstr* convertVLSMIAtoVLSR(MachineBasicBlock &MBB,
> > +                                         MachineInstr *MI,
> > +                                         const TargetInstrInfo *TII)
> > +{
> > +  unsigned Opcode = MI->getOpcode();
> > +  unsigned NewOpc;
> > +  bool isLoad = false;
> > +  switch (Opcode) {
> > +  default: llvm_unreachable("Unhandled opcode");
> > +  case ARM::VLDMSIA:
> > +    isLoad = true;
> > +    NewOpc = ARM::VLDRS;
> > +    break;
> > +  case ARM::VSTMSIA:
> > +    NewOpc = ARM::VSTRS;
> > +    break;
> > +  case ARM::VLDMDIA:
> > +    isLoad = true;
> > +    NewOpc = ARM::VLDRD;
> > +    break;
> > +  case ARM::VSTMDIA:
> > +    NewOpc = ARM::VSTRD;
> > +    break;
> > +  }
> > +  MachineBasicBlock::iterator MBBI(MI);
> > +  MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI->getDebugLoc(),
> > +                                    TII->get(NewOpc));
> > +
> > +  // There should be only one VPR register since this instruction was
> > +  // originally converted from a VLDR/VSTR
> > +  MachineOperand& VPRReg = MI->getOperand(3);
> > +  if (isLoad)
> > +    VPRReg.setIsDef(true);
> > +  MIB.addOperand(VPRReg);            // VPR register
> > +  MIB.addOperand(MI->getOperand(0)); // Base address
> > +  MIB.addImm(0);                     // Offset
> > +  MIB.addOperand(MI->getOperand(1)); // Predicate Imm
> > + MIB.addOperand(MI->getOperand(2)); // Predicate Register
> > +
> > +  // Transfer any extra operands (e.g. implicit defs/kills)
> > +  for (unsigned int i = 4; i < MI->getNumOperands(); ++i)
> > +    MIB.addOperand(MI->getOperand(i));
> > +
> > +  // Transfer memoperands.
> > +  MIB->setMemRefs(MI->memoperands_begin(), MI->memoperands_end());
> > +
> > +  // Remove the VLDM instruction
> > +  MBB.erase(MBBI);
> > +  return MIB;
> > +}
> > +
> >  /// LSMWriteBackOptimzation- An optimization pass to turn LDM / STM
> >  /// ops followed/preceeded by an increment/decrement of the base address to the
> >  /// writeback version of the opcode.
> 
> This isn't a pass. It's part of the bigger load-store-optimization pass. The
> comment should be reworded to be more clear and to not be a sentence
> fragment.

Ok.
 
> >  bool ARMLoadStoreOpt::LSMWriteBackOptimization(MachineBasicBlock &MBB,
> >                                                 const MISet &NewLSMInstrs) {
> > -  typedef SmallVector<MachineInstr*,8> Worklist;
> > +  typedef SmallVector<std::pair<MachineInstr*,bool>,8> Worklist;
> >    int NumMerges = 0;
> > +  const bool NeedsConversion = true; // flag for when we need to convert
> What's the purpose of this variable?
> 
> >
> >    // Find all the load/store multiple instructions where we can potentially
> >    // merge an update of the base address into the instruction
> 
> Here and everywhere following, please update to use complete sentences
> including punctuation.

Ok.

> > @@ -1627,17 +1764,37 @@ bool ARMLoadStoreOpt::LSMWriteBackOptimization(MachineBasicBlock &MBB,
> >        continue;
> >
> >      if(memOpHasWriteBackOption(*MBBI)) {
> > -      MemOps.push_back(MBBI);
> > +      MemOps.push_back(std::make_pair(MBBI, !NeedsConversion));
> > +    }
> > +    else if (EnableLSMWriteBackConvert &&
> > +             canConvertToWriteBackInstr(*MBBI)) {
> > +      MemOps.push_back(std::make_pair(MBBI, NeedsConversion));
> 
> There is no memOpHasWriteBackOption() function in top of tree, nor has
> there been that I can find. Is this patch complete? Using the memory
> operands for that check is not the right way to go. Reference the instructions
> themselves.

The memOpHasWriteBackOption() function is in the first patch of this series.
It just looks at the opcode and returns true if there is an _UPD version of
the instruction.
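
Roughly, it is just an opcode switch along these lines (a simplified sketch;
the real function and its full opcode list are in patch 1):

    static bool memOpHasWriteBackOption(const MachineInstr &MI) {
      switch (MI.getOpcode()) {
      default:
        return false;
      case ARM::LDMIA:    // has LDMIA_UPD
      case ARM::STMIA:    // has STMIA_UPD
      case ARM::VLDMDIA:  // has VLDMDIA_UPD
      case ARM::VSTMDIA:  // has VSTMDIA_UPD
        // ... plus the other load/store-multiple opcodes with _UPD forms.
        return true;
      }
    }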

> >      }
> >    }
> >
> >    // Work the list to try and merge base update into the load/store operation
> >    for (Worklist::iterator WI = MemOps.begin(), WE = MemOps.end();
> >          WI != WE; ++WI) {
> > -    MachineInstr *MI = *WI;
> > +    MachineInstr *MI     = WI->first;
> > +    bool NeedsConversion = WI->second;
> >      DEBUG(dbgs() << "Trying to merge increment for " << *MI);
> > +
> > +    // Change vldr/vstr to vldmia/vstmia to try and merge increment
> > +    if (NeedsConversion) {
> > +      DEBUG(dbgs() << "  first converting to op with writeback mode\n");
> > +      MI = convertVLSRtoVLSMIA(MBB, MI, TII);
> > +      DEBUG(dbgs() << "  converted to: " << *MI);
> > +    }
> > +
> > +    // Try and merge base update into the load/store multiple instruction
> >      if(MergeBaseUpdateLSMultiple(MBB, MI))
> >        ++NumMerges;
> > +
> > +    // Undo change of vldr/vstr to vldmia/vstmia if merge failed
> > +    else if (NeedsConversion) {
> > +      DEBUG(dbgs() << "  merge failed, unconverting to vldr/vstr\n");
> > +      MI = convertVLSMIAtoVLSR(MBB, MI, TII);
> > +      DEBUG(dbgs() << "  unconverted to: " << *MI);
> > +    }
> >    }
> >
> >    NumIncMerges += NumMerges; // update the statistic
> > diff --git a/test/CodeGen/ARM/vlsm_upd.ll b/test/CodeGen/ARM/vlsm_upd.ll
> > index 4e28dce..c24d439 100644
> > --- a/test/CodeGen/ARM/vlsm_upd.ll
> > +++ b/test/CodeGen/ARM/vlsm_upd.ll
> > @@ -1,5 +1,7 @@
> >  ; RUN: llc < %s -march=arm -mattr=+neon\
> >  ; RUN:          -enable-lsm-with-auto-increment=true\
> > +; RUN:          -enable-lsm-writeback-convert=true\
> > +; RUN:          -verify-machineinstrs\
> >  ; RUN:     | FileCheck %s
> >  ; Test cases for the enable-lsm-with-auto-increment optimization that folds the
> >  ; increment of the base address into the load/store multiple instruction
> > @@ -24,5 +26,62 @@ define void @addVec_4x4() nounwind {
> >    ret void
> >  }
> >
> > +@B2x4 = common global [100 x <2 x float>] zeroinitializer, align 16
> > +@A2x4 = common global [100 x <2 x float>] zeroinitializer, align 16
> > +declare void @v2x4(<2 x float>*, <2 x float>*)
> > +
> > +;CHECK: addVec_2x4:
> > +define void @addVec_2x4() nounwind {
> > +  %bi = getelementptr inbounds [100 x <2 x float>]* @B2x4, i32 0, i32 0
> > +;CHECK: vldmia r1!
> > +  %v1 = load <2 x float>* %bi, align 16, !tbaa !0
> > +  %ai = getelementptr inbounds [100 x <2 x float>]* @A2x4, i32 0, i32 0
> > +  %v2 = load <2 x float>* %ai, align 16, !tbaa !0
> > +  %v3 = fadd <2 x float> %v1, %v2
> > +;CHECK: vstmia r0!
> > +  store <2 x float> %v3, <2 x float>* %ai, align 16, !tbaa !0
> > +  %bi_1 = getelementptr [100 x <2 x float>]* @B2x4, i32 0, i32 1
> > +  %ai_1 = getelementptr [100 x <2 x float>]* @A2x4, i32 0, i32 1
> > +  call void @v2x4(<2 x float>* %ai_1, <2 x float>* %bi_1);
> > +  ret void
> > +}
> > +
> > +@B1x4 = common global [100 x <1 x float>] zeroinitializer, align 16
> > +@A1x4 = common global [100 x <1 x float>] zeroinitializer, align 16
> > +declare void @v1x4(<1 x float>*, <1 x float>*)
> > +
> > +;CHECK: addVec_1x4:
> > +define void @addVec_1x4() nounwind {
> > +  %bi = getelementptr inbounds [100 x <1 x float>]* @B1x4, i32 0, i32 0
> > +;CHECK: vldmia r1!
> > +  %v1 = load <1 x float>* %bi, align 16, !tbaa !0
> > +  %ai = getelementptr inbounds [100 x <1 x float>]* @A1x4, i32 0, i32 0
> > +  %v2 = load <1 x float>* %ai, align 16, !tbaa !0
> > +  %v3 = fadd <1 x float> %v1, %v2
> > +;CHECK: vstmia r0!
> > +  store <1 x float> %v3, <1 x float>* %ai, align 16, !tbaa !0
> > +  %bi_1 = getelementptr [100 x <1 x float>]* @B1x4, i32 0, i32 1
> > +  %ai_1 = getelementptr [100 x <1 x float>]* @A1x4, i32 0, i32 1
> > +  call void @v1x4(<1 x float>* %ai_1, <1 x float>* %bi_1);
> > +  ret void
> > +}
> > +
> > +; Make sure we correctly output the vldr/vstr instructions if we
> > +; do not convert them to vldm/vstm for autoincrement optimization
> > +;CHECK: addVec_1x4_roundtrip:
> > +define void @addVec_1x4_roundtrip() nounwind {
> > +  %bi = getelementptr inbounds [100 x <1 x float>]* @B1x4, i32 0, i32 0
> > +;CHECK: vldr
> > +;CHECK-NOT: vldmia r1!
> > +  %v1 = load <1 x float>* %bi, align 16, !tbaa !0
> > +  %ai = getelementptr inbounds [100 x <1 x float>]* @A1x4, i32 0, i32 0
> > +  %v2 = load <1 x float>* %ai, align 16, !tbaa !0
> > +  %v3 = fadd <1 x float> %v1, %v2
> > +;CHECK: vstr
> > +;CHECK-NOT: vstmia r0!
> > +  store <1 x float> %v3, <1 x float>* %ai, align 16, !tbaa !0
> > +  ret void
> > +}
> > +
> >  !0 = metadata !{metadata !"omnipotent char", metadata !1}
> >  !1 = metadata !{metadata !"Simple C/C++ TBAA"}
> > --
> > 1.7.8.3
> >
> 
> 
> On Sep 8, 2012, at 2:19 PM, David Peixotto <dpeixott at codeaurora.org>
> wrote:
> 
> > Ok, I've addressed the comments from the review. I've attached a new patch
> > to this email.
> >
> > I also fixed all the issues found by -verify-machineinstrs. Those fixes
> > belonged in the other patch in this patch set (see [PATCH 1/2] Improve
> > optimization to fold base update into LDM/STM), and the patch in this email
> > already depends on that other patch.
> >
> > I've also uploaded a new version of the other patch. Once it is reviewed and
> > approved, both patches are ready to be applied.
> >
> > Thanks,
> > Dave
> >
> > -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> > by The Linux Foundation
> >
> >> -----Original Message-----
> >> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> >> bounces at cs.uiuc.edu] On Behalf Of David Peixotto
> >> Sent: Thursday, September 06, 2012 6:44 PM
> >> To: 'Jim Grosbach'
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Subject: Re: [llvm-commits] [PATCH 2/2] Convert VLDR/VSTR on demand for
> >> base update optimization
> >>
> >> Thanks for the feedback Jim!
> >>
> >> Thanks for the tip about -verify-machineinstrs. It found a few things to
> >> complain about and I cleaned those up. I'll also apply your suggestions below
> >> and submit a new patch. Thanks!
> >>
> >> -Dave
> >>
> >> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >> hosted by The Linux Foundation
> >>
> >> -----Original Message-----
> >> From: Jim Grosbach [mailto:grosbach at apple.com]
> >> Sent: Wednesday, September 05, 2012 5:21 PM
> >> To: David Peixotto
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Subject: Re: [llvm-commits] [PATCH 2/2] Convert VLDR/VSTR on demand for
> >> base update optimization
> >>
> >> Nice!
> >>
> >> Style note:
> >> +static MachineInstr* convertVLSRtoVLSMIA(MachineBasicBlock &MBB,
> >> +                                         MachineInstr *MI,
> >> +                                         const TargetInstrInfo *TII) {
> >>
> >> The '{' should go on the preceding line, not on a new line. Similarly for
> >> the other functions.
> >>
> >>
> >> In convertVLSRtoVLSMIA:
> >>
> >> +  MachineBasicBlock::iterator MBBI(MI);
> >> +  MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI->getDebugLoc(),
> >> +                                    TII->get(NewOpc));
> >> +
> >> +  assert(MI->getOperand(2).isImm() && MI->getOperand(2).getImm() == 0 &&
> >> +         "VLDR has non-zero pre-increment value");
> >> + MIB.addOperand(MI->getOperand(1)); // Base address
> >> + MIB.addOperand(MI->getOperand(3)); // Predicate Imm
> >> + MIB.addOperand(MI->getOperand(4)); // Predicate Register
> >> + MIB.addOperand(MI->getOperand(0)); // VPR register
> >>
> >> The VPR register should probably be marked as a def for loads.
> >>
> >> In convertVLSMIAtoVLSR:
> >>
> >> +  MachineBasicBlock::iterator MBBI(MI);
> >> +  MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI->getDebugLoc(),
> >> +                                    TII->get(NewOpc));
> >> +
> >> +  // There should be only one VPR register since this instruction was
> >> + // originally converted from a VLDR/VSTR
> >> + MIB.addOperand(MI->getOperand(3)); // VPR register
> >> + MIB.addOperand(MI->getOperand(0)); // Base address
> >> +  MIB.addImm(0);                     // Offset
> >> +  MIB.addOperand(MI->getOperand(1)); // Predicate Imm
> >> + MIB.addOperand(MI->getOperand(2)); // Predicate Register
> >> +
> >>
> >> The VPR register should definitely be marked as a def for loads.
> >>
> >> If you haven't already, make sure to run tests with the machine verifier
> >> enabled after the pass runs. It'll help catch lots of things.
> >>
> >>
> >> LGTM with those updates.
> >>
> >> -Jim
> >>
> >>
> >> On Aug 31, 2012, at 11:02 AM, David Peixotto <dpeixott at codeaurora.org>
> >> wrote:
> >>
> >>> Please review the attached patch. Thanks!
> >>>
> >>> This commit allows the base-address update optimization to work for
> >>> VLDR and VSTR. There is no writeback version of VLDR/VSTR, so we
> >>> cannot directly fold the update of the base address into the
> >>> instruction itself. We must first transform the VLDR/VSTR to
> VLDM/VSTM
> >>> and then use the writeback version of those instructions. For example
> >>> we perform the following transformation,
> >>>
> >>> VLDR D0, [R0]
> >>> ADD  R0, R0, #8
> >>> ==>
> >>> VLDM R0!, {D0}
> >>>
> >>> The VLDR/VSTR instructions will only be changed to VLDM/VSTM if an
> >>> update of the base address register can be folded into the
> >>> instruction. Other VLDR/VSTR instructions are left undisturbed.
> >>>
> >>> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> >>> hosted by The Linux Foundation
> >>>
> >>>
> >>>
> >>> <0002-Convert-VLDR-VSTR-on-demand-for-base-update-optimiza.patch>
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > <0002-Convert-VLDR-VSTR-on-demand-for-base-update-optimiza.patch>




