[llvm-commits] [PATCH]IDIV->DIVB Atom Optimization

Fri Aug 3 14:28:11 PDT 2012

This optimization should not be classified as `instruction selection'. We're not simply replacing one form of divide with another at compilation time. The optimization adds control flow to use an 8-bit DIV when the values of the operands are determined to be small enough during program execution.

Several approaches were looked at for this optimization and the IR pass was found to be the best way to accomplish it. Here are the approaches that we looked at and the reasons they are not suitable for this optimization. 

1. Using a custom lowering function
Specifying a custom lowering function for ISD::UDIV/UREM/etc instructions using setOperationAction(...) in X86ISelLowering. However, the lowering functions are limited to using a SelectionDAG. Because the optimization requires several new BBs to be added it is not possible to add it here. Is this the place you were thinking of?

2. Using a custom inserter
Marking appropriate instructions with usesCustomInserter in tablegen and implementing the optimization by inserting several new MachineBasicBlocks. As said previously ISD::UDIVREM/SDIVREM are expanded into multiple instructions, see X86ISelDAGToDAG:2301. Adding it here would require the inserted instructions to be identified, removed, and a similar set of instructions to be added for the fast and slow DIV cases. However, this cannot be easily done because there are no guarantees how UDIVREM/SDIVREM will be expanded or in what order, and inserting the fast and slow DIVs here would essentially require duplicating the UDIVREM/SDIVREM expansion code.

3. Other approaches
We want the UDIVREM/SDIVREM expansion to operate on the original DIV instruction and the DIV this optimization inserts so it seems like this optimization could be implemented in X86DAGToDAGISel::PreprocessISelDAG. Can we add control flow at this point?

Tyler Nowicki
Intel

> -----Original Message-----
> From: Evan Cheng [mailto:evan.cheng at apple.com]
> Sent: Wednesday, August 01, 2012 4:38 PM
> To: Nowicki, Tyler
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> 
> 
> On Aug 1, 2012, at 11:58 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com>
> wrote:
> 
> > I don't understand what you are suggesting.
> >
> > ISD::SDIV, ISD::SREM and ISD::UDIV, ISD::UREM are combined to form
> ISD::SDIVREM, and ISD::UDIVREM. These are handled by
> X86ISelDAGToDAG:2301 which takes care of all the special cases by inserting
> the appropriate movs, cdq/cwd, and X86::DIV or X86::IDIV instructions. A
> custom inserter is invoked after this, but it's too late because the div/rem
> operations have been replaced by numerous instructions.
> 
> Looks like you don't understand how LLVM instruction works. Please spend
> some time looking at the passes in isel, especially how targets do custom
> lowering. Basically you want to lower these ISD nodes before they reach
> X86ISelDAGToDAG.
> 
> >
> > What is wrong with doing this in a backend IR pass?
> 
> Because this is a target specific instruction selection issue and it doesn't belong
> in LLVM IR level. We don't want arbitrary IR passes to do bits and pieces of
> optimization.
> 
> Evan
> 
> >
> > Tyler
> >
> >> -----Original Message-----
> >> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >> Sent: Wednesday, August 01, 2012 2:18 PM
> >> To: Nowicki, Tyler
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>
> >>
> >> On Aug 1, 2012, at 9:18 AM, "Nowicki, Tyler"
> >> <tyler.nowicki at intel.com>
> >> wrote:
> >>
> >>> Thanks for the reply.
> >>>
> >>> Lower is too late because DIV gets expanded into 2 MOV, a CDQ or
> >>> CWD, the DIV or IDIV, and one or two more MOVs during instruction
> >>> selection on DAG conversion in X86ISelDAGToDAG. Adding
> >>
> >> Not during X86ISelDAGToDAG. You should do this using custom lowering
> >> in X86ISelLowering. At that time, it just looks like a ISD::SDIV / UDIV node.
> >>
> >>> this optimization after would require a custom inserter to search
> >>> and remove
> >> these instructions, and generate new BBs. This is complicated by O2
> >> optimization which may remove MOVs when possible, by the scheduler on
> >> Atom which changes the order of instructions to avoid stalls, and by
> >> which form of the DIV instruction is used. Also, depending on the
> >> spills and neighboring instructions it is difficult to enumerate all
> >> the possible ways this optimization can be carried out.
> >>
> >> The right thing to do is to custom lower into a X86ISD target node
> >> and select to a pseudo instruction. It can be marked with
> >> usesCustomInserter = 1 which allows you to expand it into the machine
> instructions and create new BBs if needed.
> >>
> >>>
> >>> It makes sense to do the optimization as an IR pass because none of
> >>> this has
> >> happened yet. Then CSE and scheduling can operate on the new BBs.
> >>
> >> Scheduling doesn't happen until later and machine CSE can handle
> >> these instructions. Please look at some examples to learn more about
> >> how existing targets handle complex cases.
> >>
> >> Evan
> >>
> >>>
> >>> Thanks,
> >>>
> >>> Tyler
> >>>
> >>>> -----Original Message-----
> >>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >>>> Sent: Tuesday, July 31, 2012 6:41 PM
> >>>> To: Nowicki, Tyler
> >>>> Cc: Eli Friedman; llvm-commits at cs.uiuc.edu
> >>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>>>
> >>>> Hi Tyler,
> >>>>
> >>>> This doesn't look like the right approach. I don't understand why
> >>>> you are adding a LLVM ir pass to do the expansion. Why can't this
> >>>> be done at x86 instruction lowering time?
> >>>>
> >>>> Evan
> >>>>
> >>>> On Jul 31, 2012, at 1:25 PM, "Nowicki, Tyler"
> >>>> <tyler.nowicki at intel.com>
> >>>> wrote:
> >>>>
> >>>>> Thanks for the review. Good catch on the missing check, I've also
> >>>>> made use
> >>>> of DenseMap as suggested. Test cases omitted from the previously
> >>>> patch are in place again and a test case for the missing check you
> >>>> found has been added.
> >>>>>
> >>>>> Please review my corrections.
> >>>>>
> >>>>> Tyler Nowicki
> >>>>> Intel
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Eli Friedman [mailto:eli.friedman at gmail.com]
> >>>>>> Sent: Friday, July 27, 2012 2:17 PM
> >>>>>> To: Nowicki, Tyler
> >>>>>> Cc: Sean Silva; llvm-commits at cs.uiuc.edu
> >>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>>>>>
> >>>>>> On Fri, Jul 27, 2012 at 8:35 AM, Nowicki, Tyler
> >>>>>> <tyler.nowicki at intel.com>
> >>>>>> wrote:
> >>>>>>> This patch has not yet been reviewed. Could someone take a look at
> it?
> >>>>>>
> >>>>>> +    typedef std::pair<Value *, Value *> DivBOperands; //Dividend
> >>>>>> Value, Divisor Value
> >>>>>> +    typedef std::pair<PHINode *, PHINode *> DivBPhiNodes;
> >>>>>> + //Quotient
> >>>>>> PHI, Remainder Phi
> >>>>>> +    typedef std::map<DivBOperands, DivBPhiNodes> DivBCacheTy;
> >>>>>>
> >>>>>> llvm::DenseMap?
> >>>>>>
> >>>>>> +  // Replace operation value with previously generated phi node
> >>>>>> + DivBPhiNodes Value = CacheI->second;  if (UseDivOp) {
> >>>>>> +    // Replace all uses of div instruction with quotient phi node
> >>>>>> +    J->replaceAllUsesWith(Value.first);
> >>>>>> +  } else {
> >>>>>> +    // Replace all uses of rem instruction with remainder phi node
> >>>>>> +    J->replaceAllUsesWith(Value.second);
> >>>>>> +  }
> >>>>>>
> >>>>>> Looks like there's a missing check for UseSignedOp here.
> >>>>>>
> >>>>>> The patch doesn't appear to include any testcases.
> >>>>>>
> >>>>>> -Eli
> >>>>> <idivtodivb-
> >>>> svn.patch>_______________________________________________
> >>>>> llvm-commits mailing list
> >>>>> llvm-commits at cs.uiuc.edu
> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>
> >