[llvm-commits] [PATCH]IDIV->DIVB Atom Optimization

Thu Jul 19 17:15:03 PDT 2012

+                                Function::iterator& I,
+                                BasicBlock::iterator& J,
+                                bool UseDivOp, bool UseSignedOp);
+  private:
+    struct DivBNode {
+      PHINode* DivPhi;
+      PHINode* RemPhi;

LLVM coding style <http://llvm.org/docs/CodingStandards.html> puts the
'*' or '&' on the right, next to the name rather than the type, e.g.
'PHINode *DivPhi' and 'Function::iterator &I'.

+        if (J->getOpcode() == Instruction::SDiv) {
+          replaceOrInsertFastDiv(F, I, J, true, true);
+        } else if (J->getOpcode() == Instruction::UDiv) {
+          replaceOrInsertFastDiv(F, I, J, true, false);
+        } else if (J->getOpcode() == Instruction::SRem) {
+          replaceOrInsertFastDiv(F, I, J, false, true);
+        } else if (J->getOpcode() == Instruction::URem) {
+          replaceOrInsertFastDiv(F, I, J, false, false);
+        }

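This could probably be factored to make it clearer. Maybe something like:

unsigned Opcode = J->getOpcode();
bool UseDivOp = Opcode == Instruction::SDiv || Opcode == Instruction::UDiv;
bool UseRemOp = Opcode == Instruction::SRem || Opcode == Instruction::URem;
bool UseSignedOp = Opcode == Instruction::SDiv || Opcode == Instruction::SRem;
// Only divide/remainder instructions are rewritten, as in the original chain.
if (UseDivOp || UseRemOp)
  replaceOrInsertFastDiv(F, I, J, UseDivOp, UseSignedOp);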

Or, since you're passing in J anyway, maybe sink these tests into
replaceOrInsertFastDiv itself.
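
To illustrate the opcode tests being kept in one place, here is a minimal standalone sketch. It does not use the real LLVM headers: the Opcode enum, the DivKind struct, and classifyDivRem are hypothetical names made up for this example, standing in for Instruction::getOpcode() and the two bool parameters in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the LLVM instruction opcodes (assumption, not
// the real llvm::Instruction enum).
enum class Opcode { Add, SDiv, UDiv, SRem, URem };

// The two flags that the patch passes to replaceOrInsertFastDiv.
struct DivKind {
  bool UseDivOp;    // true for SDiv/UDiv, false for SRem/URem
  bool UseSignedOp; // true for SDiv/SRem, false for UDiv/URem
};

// Classifies an opcode. Returns false for anything that is not a divide or
// remainder, so the caller can leave that instruction untouched.
bool classifyDivRem(Opcode Op, DivKind &Kind) {
  switch (Op) {
  case Opcode::SDiv: Kind = {true, true};   return true;
  case Opcode::UDiv: Kind = {true, false};  return true;
  case Opcode::SRem: Kind = {false, true};  return true;
  case Opcode::URem: Kind = {false, false}; return true;
  default:           return false;
  }
}
```

With a helper along these lines, the loop body would only need to check the return value before doing the replacement, and instructions that are neither a divide nor a remainder are skipped just as in the original if/else chain.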

+void X86LowerDiv::insertFastDiv(Function &F,
+                                  Function::iterator& I,

This could be aligned better: the continuation lines should line up with
the first parameter after the opening parenthesis.

--Sean Silva

On Thu, Jul 19, 2012 at 4:05 PM, Nowicki, Tyler <tyler.nowicki at intel.com> wrote:
> Hi,
>
> Here is an optimization for Intel Atom processors that uses a DIVB instruction rather than an IDIV when both the dividend and divisor are positive values less than 256. We've tested this with a number of benchmark suites, and it yields a performance improvement because a 32-bit divide is slow on Atom architectures.
>
> Commit message:
> IDIV->DIVB optimization
>   - Enabled only for Intel Atom with O2
>   - Use DIVB instruction rather than IDIV when dividend and divisor are both positive and less than 256.
>   - When both the quotient and the remainder of a divide are used, a DIV and a REM instruction are present in the IR. In the non-Atom case both are lowered to IDIVs, and CSE removes the redundant IDIV, taking the quotient and remainder from the first one. With this optimization, however, CSE cannot eliminate the redundant IDIV because the two instructions end up in different basic blocks. This is overcome by calculating both the quotient (DIV) and the remainder (REM) in each basic block that the optimization inserts, and reusing those result values when a subsequent DIV or REM instruction uses the same operands.
>   - Test cases check for the optimization when calculating a quotient, remainder, or both.
>
> Tyler Nowicki
> Intel
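
The run-time check described in the commit message can be modelled in plain C++ roughly as follows. This is only a sketch under stated assumptions, not the code from the patch: DivRem and fastSDivRem are hypothetical names, the OR-and-mask test is one common way to check that both operands fit in 8 bits (it also admits a zero dividend), and the 8-bit arithmetic merely stands in for the DIVB instruction.

```cpp
#include <cassert>
#include <cstdint>

// Quotient and remainder are produced together, mirroring the commit
// message's point that each inserted block computes both results.
struct DivRem {
  int32_t Quotient;
  int32_t Remainder;
  bool UsedFastPath; // reported only so the example can be checked
};

// Hypothetical model of the fast-divide dispatch. The divisor must be
// nonzero, as for any integer division.
DivRem fastSDivRem(int32_t Dividend, int32_t Divisor) {
  // OR the operands together: if any bit above the low 8 is set (including
  // the sign bit of a negative value), at least one operand does not fit
  // in an unsigned byte and the 32-bit path is taken.
  if (((Dividend | Divisor) & ~0xFF) == 0) {
    uint8_t A = static_cast<uint8_t>(Dividend);
    uint8_t B = static_cast<uint8_t>(Divisor);
    // Stand-in for DIVB: an 8-bit unsigned divide yields both results.
    return {static_cast<int32_t>(A / B), static_cast<int32_t>(A % B), true};
  }
  // Stand-in for the 32-bit IDIV slow path.
  return {Dividend / Divisor, Dividend % Divisor, false};
}
```

For example, fastSDivRem(200, 7) takes the narrow path, while fastSDivRem(1000, 7) or any call with a negative operand falls back to the 32-bit divide; both paths return the same quotient and remainder that a plain 32-bit division would.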