[llvm-commits] [PATCH]IDIV->DIVB Atom Optimization

Fri Jul 27 08:35:21 PDT 2012

This patch has not yet been reviewed. Could someone take a look at it?

Tyler

> -----Original Message-----
> From: Nowicki, Tyler
> Sent: Tuesday, July 24, 2012 12:11 PM
> To: Sean Silva
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> 
> Thanks for the great code review!
> 
> About using the map, I didn't see an LLVM specific type which would work
> out well here. Rather the std::map looked to be the best fit. Please correct
> me if I'm mistaken.
> 
> Otherwise all suggested changes have been applied. Here is my updated
> patch.
> 
> Tyler
> 
> > -----Original Message-----
> > From: Sean Silva [mailto:silvas at purdue.edu]
> > Sent: Friday, July 20, 2012 4:09 PM
> > To: Nowicki, Tyler
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >
> > +  // Search for existing instance of Dividend and Divisor in DivBList
> > + // and replace with previously generated phi nodes
> > +
> > +  for(std::list<DivBNode>::iterator iselI = DivBList.begin();
> > +      iselI != DivBList.end(); iselI++) {
> > +    if (iselI->Dividend == Dividend && iselI->Divisor == Divisor) {
> > +      // Replace operation value with previously generated phi node
> > +      if (UseDivOp) {
> > +        J->replaceAllUsesWith(iselI->DivPhi);
> > +      } else {
> > +        J->replaceAllUsesWith(iselI->RemPhi);
> > +      }
> > +
> > +      // Advance to next operation
> > +      J++;
> > +
> > +      // Remove redundant operation
> > +      Instr->eraseFromParent();
> > +
> > +      return;
> > +    }
> > +  }
> >
> > Instead of doing a linear search here, use an appropriate set data
> > structure <http://llvm.org/docs/ProgrammersManual.html#ds_set>.
> > Otherwise, you run the risk of taking quadratic time, which will
> > usually result in someone filing a "this compilation never seems to
> > end" bug somewhere down the line.
> Done, except I didn't see an LLVM specific type which would work out well
> here. Rather the std::map looked to be the best fit.
> >
> > Now that I see that this is a cache that is just trying to reuse
> > previous results, you should probably call `replaceOrInsertFastDiv`
> > something like `reuseOrInsertFastDiv`.
> Done
> >
> > Now that I understand the structure of what this is doing a bit
> > better, you could probably make the behavior a bit clearer by
> > factoring out what is done on each basic block into a single static
> > function handleBasicBlock(F, I, DivBCache). You can then instantiate
> > DivBCache at the beginning of runOnFunction, and call clear after the call to
> handleBasicBlock, like this:
> >
> > bool X86LowerDiv::runOnFunction(Function &F) {
> >   DivBCacheTy PerBBDivBCache;
> >   for (Function::iterator I = F.begin(); I != F.end(); I++) {
> >     handleBasicBlock(F, I, PerBBDivBCache);
> >     PerBBDivBCache.clear();
> >   }
> > }
> Done
> >
> > The benefit of this is that it increases locality in the source;
> > instead of having the cache declared all the way up in the class
> > declaration, you can see it being declared right where it is used.
> > Ideally, PerBBDivBCache would be instantiated as a local variable in
> > `handleBasicBlock`, but as I'm sure you're aware, there is a
> > nontrivial performance penalty to performing the allocation for the
> > map container every time through a loop.
> Thanks!
> >
> > Also mentioned in the LLVM coding standards
> > <http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continu
> > e-
> > to-simplify-code>,
> > use a continue in the loop over BB's, so instead of
> >
> > if (Ty->isIntegerTy(32)) {
> >   /* indented stuff */
> > }
> >
> > instead prefer
> >
> > if (!J->getType()->isIntegerTy(32))
> >   continue;
> > /* stuff that is now not indented as much */
> Done
> >
> > I also moved in the J->getType() instead of the local variable `Ty`
> > because the local variable was adding two lines of code, which
> > decreases locality when reading and is otherwise distracting (in this
> > case, at least; naming things can help a lot in a lot of cases, but here the
> benefits seem marginal).
> Thanks!
> >
> > (btw, remember to choose "reply all" when replying, so that your
> > replies go to the list as well.)
> Yes sorry about that.
> >
> > --Sean Silva
> >
> > On Fri, Jul 20, 2012 at 2:02 PM, Nowicki, Tyler
> > <tyler.nowicki at intel.com>
> > wrote:
> > > Thank you for your reply and for catching my deviations from the
> > > coding
> > standard.
> > >
> > >  The suggested code you gave does not check for a div/rem, rather it
> > > only
> > identifies the Boolean details. With you changes and the extra check
> > it looks like this:
> > >
> > >       // Get instruction details
> > >       unsigned Opcode = J->getOpcode();
> > >       bool UseDivOp = Opcode == Instruction::SDiv || Opcode ==
> > Instruction::UDiv;
> > >       bool UseRemOp = Opcode == Instruction::SRem || Opcode ==
> > Instruction::URem;
> > >       bool UseSignedOp = Opcode == Instruction::SDiv || Opcode ==
> > > Instruction::SRem;
> > >
> > >       if (UseDivOp || UseRemOp) {
> > >         replaceOrInsertFastDiv(F, I, J, UseDivOp, UseSignedOp);
> > >       }
> > >
> > > Sinking the test would require the '//Get instruction details' code
> > > to be
> > added to both insertFastDiv and replaceOrInsertFastDiv, with some
> > omissions for unused variables. If that sounds like the right idea to
> > you then I'll make that change, otherwise I will pass the Boolean values.
> > >
> > > Would you suggest I duplicate the '//Get instruction details' code
> > > in each
> > function that needs the variables?
> > >
> > > Tyler
> > >
> > > -----Original Message-----
> > > From: Sean Silva [mailto:silvas at purdue.edu]
> > > Sent: Thursday, July 19, 2012 8:15 PM
> > > To: Nowicki, Tyler
> > > Cc: llvm-commits at cs.uiuc.edu
> > > Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> > >
> > > +                                Function::iterator& I,
> > > +                                BasicBlock::iterator& J,
> > > +                                bool UseDivOp, bool UseSignedOp);
> > > +  private:
> > > +    struct DivBNode {
> > > +      PHINode* DivPhi;
> > > +      PHINode* RemPhi;
> > >
> > > LLVM coding style <http://llvm.org/docs/CodingStandards.html> puts
> > > the
> > '*' or '&' on the right.
> > >
> > > +        if (J->getOpcode() == Instruction::SDiv) {
> > > +          replaceOrInsertFastDiv(F, I, J, true, true);
> > > +        } else if (J->getOpcode() == Instruction::UDiv) {
> > > +          replaceOrInsertFastDiv(F, I, J, true, false);
> > > +        } else if (J->getOpcode() == Instruction::SRem) {
> > > +          replaceOrInsertFastDiv(F, I, J, false, true);
> > > +        } else if (J->getOpcode() == Instruction::URem) {
> > > +          replaceOrInsertFastDiv(F, I, J, false, false);
> > > +        }
> > >
> > > This could probably be factored better to make it clearer. Maybe
> > > something like
> > >
> > > unsigned Opcode = J->getOpcode();
> > > bool UseDivOp = Opcode == Instruction::SDiv || Opcode ==
> > > Instruction::UDiv; bool UseSignedOp = Opcode == Instruction::SDiv ||
> > > Opcode == Instruction::SRem; replaceOrInsertFastDiv(F, I, J,
> > > UseDivOp, UseSignedOp);
> > >
> > > Or maybe since you're passing in J anyway, sink these tests into
> > replaceOrInsertFastDiv.
> > >
> > > +void X86LowerDiv::insertFastDiv(Function &F,
> > > +                                  Function::iterator& I,
> > >
> > > This could be aligned better.
> > >
> > > --Sean Silva
> > >
> > >
> > > On Thu, Jul 19, 2012 at 4:05 PM, Nowicki, Tyler
> > > <tyler.nowicki at intel.com>
> > wrote:
> > >> Hi,
> > >>
> > >> Here is an optimization for Intel Atom processors which uses a DIVB
> > instruction rather than an IDIV when both the dividend and divisor are
> > positive values less than 256. We've tested this with a number of
> > benchmark suites and it yields a positive performance improvement due
> > to the slowness of a 32-bit divide on Atom architectures.
> > >>
> > >> Commit message:
> > >> IDIV->DIVB optimization
> > >>   - Enabled only for Intel Atom with O2
> > >>   - Use DIVB instruction rather than IDIV when dividend and divisor
> > >> are
> > positive less than 256.
> > >>   - In the case when the quotient and remainder of a divide are
> > >> used a DIV
> > and a REM instruction will be present in the IR. In the non-Atom case
> > they are both lowered to IDIVs and CSE removes the redundant IDIV
> > instruction, using the quotient and remainder from the first IDIV.
> > However, due to this optimization CSE is not able to eliminate
> > redundant IDIV instructions because they are located in different
> > basic blocks. This is overcome by calculating both the quotient (DIV)
> > and remainder (REM) in each basic block that is inserted by the
> > optimization and reusing the result values when a subsequent DIV or REM
> instruction uses the same operands.
> > >>   - Test cases check for the optimization when calculating a
> > >> quotient,
> > remainder, or both.
> > >>
> > >> Tyler Nowicki
> > >> Intel
> > >>
> > >> _______________________________________________
> > >> llvm-commits mailing list
> > >> llvm-commits at cs.uiuc.edu
> > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >>