[llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
Nowicki, Tyler
tyler.nowicki at intel.com
Tue Jul 24 09:10:49 PDT 2012
Thanks for the great code review!
About using the map, I didn't see an LLVM specific type which would work out well here. Rather the std::map looked to be the best fit. Please correct me if I'm mistaken.
Otherwise all suggested changes have been applied. Here is my updated patch.
Tyler
> -----Original Message-----
> From: Sean Silva [mailto:silvas at purdue.edu]
> Sent: Friday, July 20, 2012 4:09 PM
> To: Nowicki, Tyler
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
>
> + // Search for existing instance of Dividend and Divisor in DivBList
> + // and replace with previously generated phi nodes
> +
> + for(std::list<DivBNode>::iterator iselI = DivBList.begin();
> + iselI != DivBList.end(); iselI++) {
> + if (iselI->Dividend == Dividend && iselI->Divisor == Divisor) {
> + // Replace operation value with previously generated phi node
> + if (UseDivOp) {
> + J->replaceAllUsesWith(iselI->DivPhi);
> + } else {
> + J->replaceAllUsesWith(iselI->RemPhi);
> + }
> +
> + // Advance to next operation
> + J++;
> +
> + // Remove redundant operation
> + Instr->eraseFromParent();
> +
> + return;
> + }
> + }
>
> Instead of doing a linear search here, use an appropriate set data structure
> <http://llvm.org/docs/ProgrammersManual.html#ds_set>.
> Otherwise, you run the risk of taking quadratic time, which will usually result
> in someone filing a "this compilation never seems to end" bug somewhere
> down the line.
Done, except I didn't see an LLVM specific type which would work out well here. Rather the std::map looked to be the best fit.
>
> Now that I see that this is a cache that is just trying to reuse previous results,
> you should probably call `replaceOrInsertFastDiv` something like
> `reuseOrInsertFastDiv`.
Done
>
> Now that I understand the structure of what this is doing a bit better, you
> could probably make the behavior a bit clearer by factoring out what is done
> on each basic block into a single static function handleBasicBlock(F, I,
> DivBCache). You can then instantiate DivBCache at the beginning of
> runOnFunction, and call clear after the call to handleBasicBlock, like this:
>
> bool X86LowerDiv::runOnFunction(Function &F) {
> DivBCacheTy PerBBDivBCache;
> for (Function::iterator I = F.begin(); I != F.end(); I++) {
> handleBasicBlock(F, I, PerBBDivBCache);
> PerBBDivBCache.clear();
> }
> }
Done
>
> The benefit of this is that it increases locality in the source; instead of having
> the cache declared all the way up in the class declaration, you can see it being
> declared right where it is used.
> Ideally, PerBBDivBCache would be instantiated as a local variable in
> `handleBasicBlock`, but as I'm sure you're aware, there is a nontrivial
> performance penalty to performing the allocation for the map container
> every time through a loop.
Thanks!
>
> Also mentioned in the LLVM coding standards
> <http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-
> to-simplify-code>,
> use a continue in the loop over BB's, so instead of
>
> if (Ty->isIntegerTy(32)) {
> /* indented stuff */
> }
>
> instead prefer
>
> if (!J->getType()->isIntegerTy(32))
> continue;
> /* stuff that is now not indented as much */
Done
>
> I also moved in the J->getType() instead of the local variable `Ty` because the
> local variable was adding two lines of code, which decreases locality when
> reading and is otherwise distracting (in this case, at least; naming things can
> help a lot in a lot of cases, but here the benefits seem marginal).
Thanks!
>
> (btw, remember to choose "reply all" when replying, so that your replies go
> to the list as well.)
Yes sorry about that.
>
> --Sean Silva
>
> On Fri, Jul 20, 2012 at 2:02 PM, Nowicki, Tyler <tyler.nowicki at intel.com>
> wrote:
> > Thank you for your reply and for catching my deviations from the coding
> standard.
> >
> > The suggested code you gave does not check for a div/rem, rather it only
> identifies the Boolean details. With you changes and the extra check it looks
> like this:
> >
> > // Get instruction details
> > unsigned Opcode = J->getOpcode();
> > bool UseDivOp = Opcode == Instruction::SDiv || Opcode ==
> Instruction::UDiv;
> > bool UseRemOp = Opcode == Instruction::SRem || Opcode ==
> Instruction::URem;
> > bool UseSignedOp = Opcode == Instruction::SDiv || Opcode ==
> > Instruction::SRem;
> >
> > if (UseDivOp || UseRemOp) {
> > replaceOrInsertFastDiv(F, I, J, UseDivOp, UseSignedOp);
> > }
> >
> > Sinking the test would require the '//Get instruction details' code to be
> added to both insertFastDiv and replaceOrInsertFastDiv, with some
> omissions for unused variables. If that sounds like the right idea to you then
> I'll make that change, otherwise I will pass the Boolean values.
> >
> > Would you suggest I duplicate the '//Get instruction details' code in each
> function that needs the variables?
> >
> > Tyler
> >
> > -----Original Message-----
> > From: Sean Silva [mailto:silvas at purdue.edu]
> > Sent: Thursday, July 19, 2012 8:15 PM
> > To: Nowicki, Tyler
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >
> > + Function::iterator& I,
> > + BasicBlock::iterator& J,
> > + bool UseDivOp, bool UseSignedOp);
> > + private:
> > + struct DivBNode {
> > + PHINode* DivPhi;
> > + PHINode* RemPhi;
> >
> > LLVM coding style <http://llvm.org/docs/CodingStandards.html> puts the
> '*' or '&' on the right.
> >
> > + if (J->getOpcode() == Instruction::SDiv) {
> > + replaceOrInsertFastDiv(F, I, J, true, true);
> > + } else if (J->getOpcode() == Instruction::UDiv) {
> > + replaceOrInsertFastDiv(F, I, J, true, false);
> > + } else if (J->getOpcode() == Instruction::SRem) {
> > + replaceOrInsertFastDiv(F, I, J, false, true);
> > + } else if (J->getOpcode() == Instruction::URem) {
> > + replaceOrInsertFastDiv(F, I, J, false, false);
> > + }
> >
> > This could probably be factored better to make it clearer. Maybe
> > something like
> >
> > unsigned Opcode = J->getOpcode();
> > bool UseDivOp = Opcode == Instruction::SDiv || Opcode ==
> > Instruction::UDiv; bool UseSignedOp = Opcode == Instruction::SDiv ||
> > Opcode == Instruction::SRem; replaceOrInsertFastDiv(F, I, J, UseDivOp,
> > UseSignedOp);
> >
> > Or maybe since you're passing in J anyway, sink these tests into
> replaceOrInsertFastDiv.
> >
> > +void X86LowerDiv::insertFastDiv(Function &F,
> > + Function::iterator& I,
> >
> > This could be aligned better.
> >
> > --Sean Silva
> >
> >
> > On Thu, Jul 19, 2012 at 4:05 PM, Nowicki, Tyler <tyler.nowicki at intel.com>
> wrote:
> >> Hi,
> >>
> >> Here is an optimization for Intel Atom processors which uses a DIVB
> instruction rather than an IDIV when both the dividend and divisor are
> positive values less than 256. We've tested this with a number of benchmark
> suites and it yields a positive performance improvement due to the slowness
> of a 32-bit divide on Atom architectures.
> >>
> >> Commit message:
> >> IDIV->DIVB optimization
> >> - Enabled only for Intel Atom with O2
> >> - Use DIVB instruction rather than IDIV when dividend and divisor are
> positive less than 256.
> >> - In the case when the quotient and remainder of a divide are used a DIV
> and a REM instruction will be present in the IR. In the non-Atom case they
> are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
> using the quotient and remainder from the first IDIV. However, due to this
> optimization CSE is not able to eliminate redundant IDIV instructions because
> they are located in different basic blocks. This is overcome by calculating both
> the quotient (DIV) and remainder (REM) in each basic block that is inserted
> by the optimization and reusing the result values when a subsequent DIV or
> REM instruction uses the same operands.
> >> - Test cases check for the optimization when calculating a quotient,
> remainder, or both.
> >>
> >> Tyler Nowicki
> >> Intel
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: svnfastdiv.patch
Type: application/octet-stream
Size: 16935 bytes
Desc: svnfastdiv.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120724/1844cb14/attachment.obj>
More information about the llvm-commits
mailing list