[llvm-commits] [PATCH]IDIV->DIVB Atom Optimization

Tue Jul 24 09:10:49 PDT 2012

Thanks for the great code review!

About using the map, I didn't see an LLVM specific type which would work out well here. Rather the std::map looked to be the best fit. Please correct me if I'm mistaken.

Otherwise all suggested changes have been applied. Here is my updated patch.

Tyler

> -----Original Message-----
> From: Sean Silva [mailto:silvas at purdue.edu]
> Sent: Friday, July 20, 2012 4:09 PM
> To: Nowicki, Tyler
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> 
> +  // Search for existing instance of Dividend and Divisor in DivBList
> + // and replace with previously generated phi nodes
> +
> +  for(std::list<DivBNode>::iterator iselI = DivBList.begin();
> +      iselI != DivBList.end(); iselI++) {
> +    if (iselI->Dividend == Dividend && iselI->Divisor == Divisor) {
> +      // Replace operation value with previously generated phi node
> +      if (UseDivOp) {
> +        J->replaceAllUsesWith(iselI->DivPhi);
> +      } else {
> +        J->replaceAllUsesWith(iselI->RemPhi);
> +      }
> +
> +      // Advance to next operation
> +      J++;
> +
> +      // Remove redundant operation
> +      Instr->eraseFromParent();
> +
> +      return;
> +    }
> +  }
> 
> Instead of doing a linear search here, use an appropriate set data structure
> <http://llvm.org/docs/ProgrammersManual.html#ds_set>.
> Otherwise, you run the risk of taking quadratic time, which will usually result
> in someone filing a "this compilation never seems to end" bug somewhere
> down the line.
Done, except I didn't see an LLVM specific type which would work out well here. Rather the std::map looked to be the best fit.
> 
> Now that I see that this is a cache that is just trying to reuse previous results,
> you should probably call `replaceOrInsertFastDiv` something like
> `reuseOrInsertFastDiv`.
Done
> 
> Now that I understand the structure of what this is doing a bit better, you
> could probably make the behavior a bit clearer by factoring out what is done
> on each basic block into a single static function handleBasicBlock(F, I,
> DivBCache). You can then instantiate DivBCache at the beginning of
> runOnFunction, and call clear after the call to handleBasicBlock, like this:
> 
> bool X86LowerDiv::runOnFunction(Function &F) {
>   DivBCacheTy PerBBDivBCache;
>   for (Function::iterator I = F.begin(); I != F.end(); I++) {
>     handleBasicBlock(F, I, PerBBDivBCache);
>     PerBBDivBCache.clear();
>   }
> }
Done
> 
> The benefit of this is that it increases locality in the source; instead of having
> the cache declared all the way up in the class declaration, you can see it being
> declared right where it is used.
> Ideally, PerBBDivBCache would be instantiated as a local variable in
> `handleBasicBlock`, but as I'm sure you're aware, there is a nontrivial
> performance penalty to performing the allocation for the map container
> every time through a loop.
Thanks!
> 
> Also mentioned in the LLVM coding standards
> <http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-
> to-simplify-code>,
> use a continue in the loop over BB's, so instead of
> 
> if (Ty->isIntegerTy(32)) {
>   /* indented stuff */
> }
> 
> instead prefer
> 
> if (!J->getType()->isIntegerTy(32))
>   continue;
> /* stuff that is now not indented as much */
Done 
> 
> I also moved in the J->getType() instead of the local variable `Ty` because the
> local variable was adding two lines of code, which decreases locality when
> reading and is otherwise distracting (in this case, at least; naming things can
> help a lot in a lot of cases, but here the benefits seem marginal).
Thanks!
> 
> (btw, remember to choose "reply all" when replying, so that your replies go
> to the list as well.)
Yes sorry about that.
> 
> --Sean Silva
> 
> On Fri, Jul 20, 2012 at 2:02 PM, Nowicki, Tyler <tyler.nowicki at intel.com>
> wrote:
> > Thank you for your reply and for catching my deviations from the coding
> standard.
> >
> >  The suggested code you gave does not check for a div/rem, rather it only
> identifies the Boolean details. With you changes and the extra check it looks
> like this:
> >
> >       // Get instruction details
> >       unsigned Opcode = J->getOpcode();
> >       bool UseDivOp = Opcode == Instruction::SDiv || Opcode ==
> Instruction::UDiv;
> >       bool UseRemOp = Opcode == Instruction::SRem || Opcode ==
> Instruction::URem;
> >       bool UseSignedOp = Opcode == Instruction::SDiv || Opcode ==
> > Instruction::SRem;
> >
> >       if (UseDivOp || UseRemOp) {
> >         replaceOrInsertFastDiv(F, I, J, UseDivOp, UseSignedOp);
> >       }
> >
> > Sinking the test would require the '//Get instruction details' code to be
> added to both insertFastDiv and replaceOrInsertFastDiv, with some
> omissions for unused variables. If that sounds like the right idea to you then
> I'll make that change, otherwise I will pass the Boolean values.
> >
> > Would you suggest I duplicate the '//Get instruction details' code in each
> function that needs the variables?
> >
> > Tyler
> >
> > -----Original Message-----
> > From: Sean Silva [mailto:silvas at purdue.edu]
> > Sent: Thursday, July 19, 2012 8:15 PM
> > To: Nowicki, Tyler
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >
> > +                                Function::iterator& I,
> > +                                BasicBlock::iterator& J,
> > +                                bool UseDivOp, bool UseSignedOp);
> > +  private:
> > +    struct DivBNode {
> > +      PHINode* DivPhi;
> > +      PHINode* RemPhi;
> >
> > LLVM coding style <http://llvm.org/docs/CodingStandards.html> puts the
> '*' or '&' on the right.
> >
> > +        if (J->getOpcode() == Instruction::SDiv) {
> > +          replaceOrInsertFastDiv(F, I, J, true, true);
> > +        } else if (J->getOpcode() == Instruction::UDiv) {
> > +          replaceOrInsertFastDiv(F, I, J, true, false);
> > +        } else if (J->getOpcode() == Instruction::SRem) {
> > +          replaceOrInsertFastDiv(F, I, J, false, true);
> > +        } else if (J->getOpcode() == Instruction::URem) {
> > +          replaceOrInsertFastDiv(F, I, J, false, false);
> > +        }
> >
> > This could probably be factored better to make it clearer. Maybe
> > something like
> >
> > unsigned Opcode = J->getOpcode();
> > bool UseDivOp = Opcode == Instruction::SDiv || Opcode ==
> > Instruction::UDiv; bool UseSignedOp = Opcode == Instruction::SDiv ||
> > Opcode == Instruction::SRem; replaceOrInsertFastDiv(F, I, J, UseDivOp,
> > UseSignedOp);
> >
> > Or maybe since you're passing in J anyway, sink these tests into
> replaceOrInsertFastDiv.
> >
> > +void X86LowerDiv::insertFastDiv(Function &F,
> > +                                  Function::iterator& I,
> >
> > This could be aligned better.
> >
> > --Sean Silva
> >
> >
> > On Thu, Jul 19, 2012 at 4:05 PM, Nowicki, Tyler <tyler.nowicki at intel.com>
> wrote:
> >> Hi,
> >>
> >> Here is an optimization for Intel Atom processors which uses a DIVB
> instruction rather than an IDIV when both the dividend and divisor are
> positive values less than 256. We've tested this with a number of benchmark
> suites and it yields a positive performance improvement due to the slowness
> of a 32-bit divide on Atom architectures.
> >>
> >> Commit message:
> >> IDIV->DIVB optimization
> >>   - Enabled only for Intel Atom with O2
> >>   - Use DIVB instruction rather than IDIV when dividend and divisor are
> positive less than 256.
> >>   - In the case when the quotient and remainder of a divide are used a DIV
> and a REM instruction will be present in the IR. In the non-Atom case they
> are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
> using the quotient and remainder from the first IDIV. However, due to this
> optimization CSE is not able to eliminate redundant IDIV instructions because
> they are located in different basic blocks. This is overcome by calculating both
> the quotient (DIV) and remainder (REM) in each basic block that is inserted
> by the optimization and reusing the result values when a subsequent DIV or
> REM instruction uses the same operands.
> >>   - Test cases check for the optimization when calculating a quotient,
> remainder, or both.
> >>
> >> Tyler Nowicki
> >> Intel
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svnfastdiv.patch
Type: application/octet-stream
Size: 16935 bytes
Desc: svnfastdiv.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120724/1844cb14/attachment.obj>