[llvm-commits] [PATCH]IDIV->DIVB Atom Optimization

Nowicki, Tyler tyler.nowicki at intel.com
Mon Aug 13 07:20:36 PDT 2012


Sorry for the delayed response. This sounds like the right place. I've attached a patch for a CodeGenPrep pass that bypasses slow division operations when the target indicates that smaller divisions are cheaper.

Thanks,

Tyler

> -----Original Message-----
> From: Evan Cheng [mailto:evan.cheng at apple.com]
> Sent: Sunday, August 05, 2012 7:01 PM
> To: Nowicki, Tyler
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> 
> Another possibility would be to make this transformation generic. It might
> make sense to put it in CodeGenPrep under control of some target hook.
> CodeGenPrep is the place where IR is being modified before it goes through
> codegen and it is allowed to do have some target specific knowledge.
> 
> Evan
> 
> On Aug 5, 2012, at 11:10 AM, Evan Cheng wrote:
> 
> >
> > On Aug 3, 2012, at 3:30 PM, Nowicki, Tyler wrote:
> >
> >> Let me clarify how this optimization works. Instead of
> >>
> >> result = dividend/divisor;
> >>
> >> an application programmer looking for optimal performance, targeting
> Atom CPUs, without this optimization would have to write:
> >>
> >> if (((dividend | divisor) & 0xFF) < 0) { result = (char)dividend /
> >> (char)divisor; //fast divide } else { result = dividend / divisor;
> >> //slow divide }
> >
> > Yes I understand. You want a simple division to be expanded into code that
> involves multiple BB and control flows.
> >
> >>
> >> I still want the both the fast and slow divide to go through normal isel. But
> what you are suggesting significantly changes the way DIVs are handled for
> Atom CPUs, specifically avoiding the UDIVREM/SDIVREM expansion! Are you
> suggesting that I duplicate this code in a custom inserter function?
> >
> > No, you would custom lower sdiv / udiv / srem / urem nodes so UDIVREM /
> SDIVREM code will never be formed.  The custom inserter function create
> MachineInstr, not SDNodes.
> >
> > Evan
> >
> >>
> >> Tyler
> >>
> >>> -----Original Message-----
> >>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >>> Sent: Friday, August 03, 2012 6:07 PM
> >>> To: Nowicki, Tyler
> >>> Cc: llvm-commits at cs.uiuc.edu
> >>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>>
> >>>
> >>> On Aug 3, 2012, at 2:28 PM, Nowicki, Tyler wrote:
> >>>
> >>>> This optimization should not be classified as `instruction
> >>>> selection'. We're not
> >>> simply replacing one form of divide with another at compilation
> >>> time. The optimization adds control flow to use an 8-bit DIV when
> >>> the values of the operands are determined to be small enough during
> program execution.
> >>>
> >>> It is instruction selection. This is about selecting the optimal
> >>> code sequence to implement an operation.
> >>>
> >>>>
> >>>> Several approaches were looked at for this optimization and the IR
> >>>> pass was
> >>> found to be the best way to accomplish it. Here are the approaches
> >>> that we looked at and the reasons they are not suitable for this
> optimization.
> >>>>
> >>>> 1. Using a custom lowering function Specifying a custom lowering
> >>>> function for ISD::UDIV/UREM/etc instructions
> >>> using setOperationAction(...) in X86ISelLowering. However, the
> >>> lowering functions are limited to using a SelectionDAG. Because the
> >>> optimization requires several new BBs to be added it is not possible
> >>> to add it here. Is this the place you were thinking of?
> >>>>
> >>>> 2. Using a custom inserter
> >>>> Marking appropriate instructions with usesCustomInserter in
> >>>> tablegen and
> >>> implementing the optimization by inserting several new
> >>> MachineBasicBlocks. As said previously ISD::UDIVREM/SDIVREM are
> >>> expanded into multiple instructions, see X86ISelDAGToDAG:2301.
> >>> Adding it here would require the inserted instructions to be
> >>> identified, removed, and a similar set of instructions to be added
> >>> for the fast and slow DIV cases. However, this cannot be easily done
> >>> because there are no guarantees how UDIVREM/SDIVREM will be
> expanded
> >>> or in what order, and inserting the fast and slow DIVs here would
> essentially require duplicating the UDIVREM/SDIVREM expansion code.
> >>>
> >>> You want to use combination of 1 and 2. Custom lower into a target
> >>> specific node which will map to a pseudo instruction marked with
> usesCustomInserter.
> >>> That allows you to create new BBs. There are lots of examples, e.g.
> atomic ops.
> >>>
> >>> Evan
> >>>
> >>>>
> >>>> 3. Other approaches
> >>>> We want the UDIVREM/SDIVREM expansion to operate on the original
> >>>> DIV
> >>> instruction and the DIV this optimization inserts so it seems like
> >>> this optimization could be implemented in
> >>> X86DAGToDAGISel::PreprocessISelDAG. Can we add control flow at this
> point?
> >>>>
> >>>> Tyler Nowicki
> >>>> Intel
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >>>>> Sent: Wednesday, August 01, 2012 4:38 PM
> >>>>> To: Nowicki, Tyler
> >>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>>>>
> >>>>>
> >>>>> On Aug 1, 2012, at 11:58 AM, "Nowicki, Tyler"
> >>>>> <tyler.nowicki at intel.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I don't understand what you are suggesting.
> >>>>>>
> >>>>>> ISD::SDIV, ISD::SREM and ISD::UDIV, ISD::UREM are combined to
> >>>>>> form
> >>>>> ISD::SDIVREM, and ISD::UDIVREM. These are handled by
> >>>>> X86ISelDAGToDAG:2301 which takes care of all the special cases by
> >>>>> inserting the appropriate movs, cdq/cwd, and X86::DIV or X86::IDIV
> >>>>> instructions. A custom inserter is invoked after this, but it's
> >>>>> too late because the div/rem operations have been replaced by
> >>>>> numerous
> >>> instructions.
> >>>>>
> >>>>> Looks like you don't understand how LLVM instruction works. Please
> >>>>> spend some time looking at the passes in isel, especially how
> >>>>> targets do custom lowering. Basically you want to lower these ISD
> >>>>> nodes before they reach X86ISelDAGToDAG.
> >>>>>
> >>>>>>
> >>>>>> What is wrong with doing this in a backend IR pass?
> >>>>>
> >>>>> Because this is a target specific instruction selection issue and
> >>>>> it doesn't belong in LLVM IR level. We don't want arbitrary IR
> >>>>> passes to do bits and pieces of optimization.
> >>>>>
> >>>>> Evan
> >>>>>
> >>>>>>
> >>>>>> Tyler
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >>>>>>> Sent: Wednesday, August 01, 2012 2:18 PM
> >>>>>>> To: Nowicki, Tyler
> >>>>>>> Cc: llvm-commits at cs.uiuc.edu
> >>>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >>>>>>>
> >>>>>>>
> >>>>>>> On Aug 1, 2012, at 9:18 AM, "Nowicki, Tyler"
> >>>>>>> <tyler.nowicki at intel.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for the reply.
> >>>>>>>>
> >>>>>>>> Lower is too late because DIV gets expanded into 2 MOV, a CDQ
> >>>>>>>> or CWD, the DIV or IDIV, and one or two more MOVs during
> >>>>>>>> instruction selection on DAG conversion in X86ISelDAGToDAG.
> >>>>>>>> Adding
> >>>>>>>
> >>>>>>> Not during X86ISelDAGToDAG. You should do this using custom
> >>>>>>> lowering in X86ISelLowering. At that time, it just looks like a
> >>>>>>> ISD::SDIV /
> >>> UDIV node.
> >>>>>>>
> >>>>>>>> this optimization after would require a custom inserter to
> >>>>>>>> search and remove
> >>>>>>> these instructions, and generate new BBs. This is complicated by
> >>>>>>> O2 optimization which may remove MOVs when possible, by the
> >>>>>>> scheduler on Atom which changes the order of instructions to
> >>>>>>> avoid stalls, and by which form of the DIV instruction is used.
> >>>>>>> Also, depending on the spills and neighboring instructions it is
> >>>>>>> difficult to enumerate all the possible ways this optimization can be
> carried out.
> >>>>>>>
> >>>>>>> The right thing to do is to custom lower into a X86ISD target
> >>>>>>> node and select to a pseudo instruction. It can be marked with
> >>>>>>> usesCustomInserter = 1 which allows you to expand it into the
> >>>>>>> machine
> >>>>> instructions and create new BBs if needed.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> It makes sense to do the optimization as an IR pass because
> >>>>>>>> none of this has
> >>>>>>> happened yet. Then CSE and scheduling can operate on the new
> BBs.
> >>>>>>>
> >>>>>>> Scheduling doesn't happen until later and machine CSE can handle
> >>>>>>> these instructions. Please look at some examples to learn more
> >>>>>>> about how existing targets handle complex cases.
> >>>>>>>
> >>>>>>> Evan
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Tyler
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> >>>>>>>>> Sent: Tuesday, July 31, 2012 6:41 PM
> >>>>>>>>> To: Nowicki, Tyler
> >>>>>>>>> Cc: Eli Friedman; llvm-commits at cs.uiuc.edu
> >>>>>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom
> >>>>>>>>> Optimization
> >>>>>>>>>
> >>>>>>>>> Hi Tyler,
> >>>>>>>>>
> >>>>>>>>> This doesn't look like the right approach. I don't understand
> >>>>>>>>> why you are adding a LLVM ir pass to do the expansion. Why
> >>>>>>>>> can't this be done at x86 instruction lowering time?
> >>>>>>>>>
> >>>>>>>>> Evan
> >>>>>>>>>
> >>>>>>>>> On Jul 31, 2012, at 1:25 PM, "Nowicki, Tyler"
> >>>>>>>>> <tyler.nowicki at intel.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks for the review. Good catch on the missing check, I've
> >>>>>>>>>> also made use
> >>>>>>>>> of DenseMap as suggested. Test cases omitted from the
> >>>>>>>>> previously patch are in place again and a test case for the
> >>>>>>>>> missing check you found has been added.
> >>>>>>>>>>
> >>>>>>>>>> Please review my corrections.
> >>>>>>>>>>
> >>>>>>>>>> Tyler Nowicki
> >>>>>>>>>> Intel
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Eli Friedman [mailto:eli.friedman at gmail.com]
> >>>>>>>>>>> Sent: Friday, July 27, 2012 2:17 PM
> >>>>>>>>>>> To: Nowicki, Tyler
> >>>>>>>>>>> Cc: Sean Silva; llvm-commits at cs.uiuc.edu
> >>>>>>>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom
> >>>>>>>>>>> Optimization
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jul 27, 2012 at 8:35 AM, Nowicki, Tyler
> >>>>>>>>>>> <tyler.nowicki at intel.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> This patch has not yet been reviewed. Could someone take a
> >>>>>>>>>>>> look at
> >>>>> it?
> >>>>>>>>>>>
> >>>>>>>>>>> +    typedef std::pair<Value *, Value *> DivBOperands;
> >>>>>>>>>>> + //Dividend
> >>>>>>>>>>> Value, Divisor Value
> >>>>>>>>>>> +    typedef std::pair<PHINode *, PHINode *> DivBPhiNodes;
> >>>>>>>>>>> + //Quotient
> >>>>>>>>>>> PHI, Remainder Phi
> >>>>>>>>>>> +    typedef std::map<DivBOperands, DivBPhiNodes>
> >>>>>>>>>>> + DivBCacheTy;
> >>>>>>>>>>>
> >>>>>>>>>>> llvm::DenseMap?
> >>>>>>>>>>>
> >>>>>>>>>>> +  // Replace operation value with previously generated phi
> >>>>>>>>>>> + node DivBPhiNodes Value = CacheI->second;  if (UseDivOp) {
> >>>>>>>>>>> +    // Replace all uses of div instruction with quotient phi node
> >>>>>>>>>>> +    J->replaceAllUsesWith(Value.first);
> >>>>>>>>>>> +  } else {
> >>>>>>>>>>> +    // Replace all uses of rem instruction with remainder phi
> node
> >>>>>>>>>>> +    J->replaceAllUsesWith(Value.second);
> >>>>>>>>>>> +  }
> >>>>>>>>>>>
> >>>>>>>>>>> Looks like there's a missing check for UseSignedOp here.
> >>>>>>>>>>>
> >>>>>>>>>>> The patch doesn't appear to include any testcases.
> >>>>>>>>>>>
> >>>>>>>>>>> -Eli
> >>>>>>>>>> <idivtodivb-
> >>>>>>>>>
> >>> svn.patch>_______________________________________________
> >>>>>>>>>> llvm-commits mailing list
> >>>>>>>>>> llvm-commits at cs.uiuc.edu
> >>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BypassSlowDiv5.patch
Type: application/octet-stream
Size: 21214 bytes
Desc: BypassSlowDiv5.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120813/7d437b40/attachment.obj>


More information about the llvm-commits mailing list