[llvm-commits] [PATCH]IDIV->DIVB Atom Optimization

Fri Aug 3 15:30:56 PDT 2012

Let me clarify how this optimization works. Instead of

result = dividend/divisor;

an application programmer looking for optimal performance, targeting Atom CPUs, without this optimization would have to write:

if (((dividend | divisor) & 0xFF) < 0) {
  result = (char)dividend / (char)divisor; //fast divide 
} else {
 result = dividend / divisor; //slow divide
}

I still want the both the fast and slow divide to go through normal isel. But what you are suggesting significantly changes the way DIVs are handled for Atom CPUs, specifically avoiding the UDIVREM/SDIVREM expansion! Are you suggesting that I duplicate this code in a custom inserter function?

Tyler 

> -----Original Message-----
> From: Evan Cheng [mailto:evan.cheng at apple.com]
> Sent: Friday, August 03, 2012 6:07 PM
> To: Nowicki, Tyler
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> 
> 
> On Aug 3, 2012, at 2:28 PM, Nowicki, Tyler wrote:
> 
> > This optimization should not be classified as `instruction selection'. We're not
> simply replacing one form of divide with another at compilation time. The
> optimization adds control flow to use an 8-bit DIV when the values of the
> operands are determined to be small enough during program execution.
> 
> It is instruction selection. This is about selecting the optimal code sequence to
> implement an operation.
> 
> >
> > Several approaches were looked at for this optimization and the IR pass was
> found to be the best way to accomplish it. Here are the approaches that we
> looked at and the reasons they are not suitable for this optimization.
> >
> > 1. Using a custom lowering function
> > Specifying a custom lowering function for ISD::UDIV/UREM/etc instructions
> using setOperationAction(...) in X86ISelLowering. However, the lowering
> functions are limited to using a SelectionDAG. Because the optimization requires
> several new BBs to be added it is not possible to add it here. Is this the place you
> were thinking of?
> >
> > 2. Using a custom inserter
> > Marking appropriate instructions with usesCustomInserter in tablegen and
> implementing the optimization by inserting several new MachineBasicBlocks. As
> said previously ISD::UDIVREM/SDIVREM are expanded into multiple
> instructions, see X86ISelDAGToDAG:2301. Adding it here would require the
> inserted instructions to be identified, removed, and a similar set of instructions
> to be added for the fast and slow DIV cases. However, this cannot be easily done
> because there are no guarantees how UDIVREM/SDIVREM will be expanded or
> in what order, and inserting the fast and slow DIVs here would essentially
> require duplicating the UDIVREM/SDIVREM expansion code.
> 
> You want to use combination of 1 and 2. Custom lower into a target specific
> node which will map to a pseudo instruction marked with usesCustomInserter.
> That allows you to create new BBs. There are lots of examples, e.g. atomic ops.
> 
> Evan
> 
> >
> > 3. Other approaches
> > We want the UDIVREM/SDIVREM expansion to operate on the original DIV
> instruction and the DIV this optimization inserts so it seems like this optimization
> could be implemented in X86DAGToDAGISel::PreprocessISelDAG. Can we add
> control flow at this point?
> >
> > Tyler Nowicki
> > Intel
> >
> >> -----Original Message-----
> >> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >> Sent: Wednesday, August 01, 2012 4:38 PM
> >> To: Nowicki, Tyler
> >> Cc: llvm-commits at cs.uiuc.edu
> >> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>
> >>
> >> On Aug 1, 2012, at 11:58 AM, "Nowicki, Tyler"
> >> <tyler.nowicki at intel.com>
> >> wrote:
> >>
> >>> I don't understand what you are suggesting.
> >>>
> >>> ISD::SDIV, ISD::SREM and ISD::UDIV, ISD::UREM are combined to form
> >> ISD::SDIVREM, and ISD::UDIVREM. These are handled by
> >> X86ISelDAGToDAG:2301 which takes care of all the special cases by
> >> inserting the appropriate movs, cdq/cwd, and X86::DIV or X86::IDIV
> >> instructions. A custom inserter is invoked after this, but it's too
> >> late because the div/rem operations have been replaced by numerous
> instructions.
> >>
> >> Looks like you don't understand how LLVM instruction works. Please
> >> spend some time looking at the passes in isel, especially how targets
> >> do custom lowering. Basically you want to lower these ISD nodes
> >> before they reach X86ISelDAGToDAG.
> >>
> >>>
> >>> What is wrong with doing this in a backend IR pass?
> >>
> >> Because this is a target specific instruction selection issue and it
> >> doesn't belong in LLVM IR level. We don't want arbitrary IR passes to
> >> do bits and pieces of optimization.
> >>
> >> Evan
> >>
> >>>
> >>> Tyler
> >>>
> >>>> -----Original Message-----
> >>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >>>> Sent: Wednesday, August 01, 2012 2:18 PM
> >>>> To: Nowicki, Tyler
> >>>> Cc: llvm-commits at cs.uiuc.edu
> >>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>>>
> >>>>
> >>>> On Aug 1, 2012, at 9:18 AM, "Nowicki, Tyler"
> >>>> <tyler.nowicki at intel.com>
> >>>> wrote:
> >>>>
> >>>>> Thanks for the reply.
> >>>>>
> >>>>> Lower is too late because DIV gets expanded into 2 MOV, a CDQ or
> >>>>> CWD, the DIV or IDIV, and one or two more MOVs during instruction
> >>>>> selection on DAG conversion in X86ISelDAGToDAG. Adding
> >>>>
> >>>> Not during X86ISelDAGToDAG. You should do this using custom
> >>>> lowering in X86ISelLowering. At that time, it just looks like a ISD::SDIV /
> UDIV node.
> >>>>
> >>>>> this optimization after would require a custom inserter to search
> >>>>> and remove
> >>>> these instructions, and generate new BBs. This is complicated by O2
> >>>> optimization which may remove MOVs when possible, by the scheduler
> >>>> on Atom which changes the order of instructions to avoid stalls,
> >>>> and by which form of the DIV instruction is used. Also, depending
> >>>> on the spills and neighboring instructions it is difficult to
> >>>> enumerate all the possible ways this optimization can be carried out.
> >>>>
> >>>> The right thing to do is to custom lower into a X86ISD target node
> >>>> and select to a pseudo instruction. It can be marked with
> >>>> usesCustomInserter = 1 which allows you to expand it into the
> >>>> machine
> >> instructions and create new BBs if needed.
> >>>>
> >>>>>
> >>>>> It makes sense to do the optimization as an IR pass because none
> >>>>> of this has
> >>>> happened yet. Then CSE and scheduling can operate on the new BBs.
> >>>>
> >>>> Scheduling doesn't happen until later and machine CSE can handle
> >>>> these instructions. Please look at some examples to learn more
> >>>> about how existing targets handle complex cases.
> >>>>
> >>>> Evan
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Tyler
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >>>>>> Sent: Tuesday, July 31, 2012 6:41 PM
> >>>>>> To: Nowicki, Tyler
> >>>>>> Cc: Eli Friedman; llvm-commits at cs.uiuc.edu
> >>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>>>>>
> >>>>>> Hi Tyler,
> >>>>>>
> >>>>>> This doesn't look like the right approach. I don't understand why
> >>>>>> you are adding a LLVM ir pass to do the expansion. Why can't this
> >>>>>> be done at x86 instruction lowering time?
> >>>>>>
> >>>>>> Evan
> >>>>>>
> >>>>>> On Jul 31, 2012, at 1:25 PM, "Nowicki, Tyler"
> >>>>>> <tyler.nowicki at intel.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for the review. Good catch on the missing check, I've
> >>>>>>> also made use
> >>>>>> of DenseMap as suggested. Test cases omitted from the previously
> >>>>>> patch are in place again and a test case for the missing check
> >>>>>> you found has been added.
> >>>>>>>
> >>>>>>> Please review my corrections.
> >>>>>>>
> >>>>>>> Tyler Nowicki
> >>>>>>> Intel
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Eli Friedman [mailto:eli.friedman at gmail.com]
> >>>>>>>> Sent: Friday, July 27, 2012 2:17 PM
> >>>>>>>> To: Nowicki, Tyler
> >>>>>>>> Cc: Sean Silva; llvm-commits at cs.uiuc.edu
> >>>>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>>>>>>>
> >>>>>>>> On Fri, Jul 27, 2012 at 8:35 AM, Nowicki, Tyler
> >>>>>>>> <tyler.nowicki at intel.com>
> >>>>>>>> wrote:
> >>>>>>>>> This patch has not yet been reviewed. Could someone take a
> >>>>>>>>> look at
> >> it?
> >>>>>>>>
> >>>>>>>> +    typedef std::pair<Value *, Value *> DivBOperands;
> >>>>>>>> + //Dividend
> >>>>>>>> Value, Divisor Value
> >>>>>>>> +    typedef std::pair<PHINode *, PHINode *> DivBPhiNodes;
> >>>>>>>> + //Quotient
> >>>>>>>> PHI, Remainder Phi
> >>>>>>>> +    typedef std::map<DivBOperands, DivBPhiNodes> DivBCacheTy;
> >>>>>>>>
> >>>>>>>> llvm::DenseMap?
> >>>>>>>>
> >>>>>>>> +  // Replace operation value with previously generated phi
> >>>>>>>> + node DivBPhiNodes Value = CacheI->second;  if (UseDivOp) {
> >>>>>>>> +    // Replace all uses of div instruction with quotient phi node
> >>>>>>>> +    J->replaceAllUsesWith(Value.first);
> >>>>>>>> +  } else {
> >>>>>>>> +    // Replace all uses of rem instruction with remainder phi node
> >>>>>>>> +    J->replaceAllUsesWith(Value.second);
> >>>>>>>> +  }
> >>>>>>>>
> >>>>>>>> Looks like there's a missing check for UseSignedOp here.
> >>>>>>>>
> >>>>>>>> The patch doesn't appear to include any testcases.
> >>>>>>>>
> >>>>>>>> -Eli
> >>>>>>> <idivtodivb-
> >>>>>>
> svn.patch>_______________________________________________
> >>>>>>> llvm-commits mailing list
> >>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>
> >>>
> >