[llvm-commits] [PATCH]IDIV->DIVB Atom Optimization

Mon Aug 13 07:35:51 PDT 2012

My apologies, that was not an SVN patch. Here is the correct patch.

Tyler

> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> bounces at cs.uiuc.edu] On Behalf Of Nowicki, Tyler
> Sent: Monday, August 13, 2012 10:21 AM
> To: Evan Cheng
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> 
> Sorry for the delayed response. This sounds like the right place. I've attached
> a patch for a CodeGenPrep pass that bypasses slow division operations when
> the target indicates that smaller divisions are cheaper.
> 
> Thanks,
> 
> Tyler
> 
> > -----Original Message-----
> > From: Evan Cheng [mailto:evan.cheng at apple.com]
> > Sent: Sunday, August 05, 2012 7:01 PM
> > To: Nowicki, Tyler
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> >
> > Another possibility would be to make this transformation generic. It
> > might make sense to put it in CodeGenPrep under control of some target
> hook.
> > CodeGenPrep is the place where IR is being modified before it goes
> > through codegen and it is allowed to do have some target specific
> knowledge.
> >
> > Evan
> >
> > On Aug 5, 2012, at 11:10 AM, Evan Cheng wrote:
> >
> > >
> > > On Aug 3, 2012, at 3:30 PM, Nowicki, Tyler wrote:
> > >
> > >> Let me clarify how this optimization works. Instead of
> > >>
> > >> result = dividend/divisor;
> > >>
> > >> an application programmer looking for optimal performance,
> > >> targeting
> > Atom CPUs, without this optimization would have to write:
> > >>
> > >> if (((dividend | divisor) & 0xFF) < 0) { result = (char)dividend /
> > >> (char)divisor; //fast divide } else { result = dividend / divisor;
> > >> //slow divide }
> > >
> > > Yes I understand. You want a simple division to be expanded into
> > > code that
> > involves multiple BB and control flows.
> > >
> > >>
> > >> I still want the both the fast and slow divide to go through normal
> > >> isel. But
> > what you are suggesting significantly changes the way DIVs are handled
> > for Atom CPUs, specifically avoiding the UDIVREM/SDIVREM expansion!
> > Are you suggesting that I duplicate this code in a custom inserter function?
> > >
> > > No, you would custom lower sdiv / udiv / srem / urem nodes so
> > > UDIVREM /
> > SDIVREM code will never be formed.  The custom inserter function
> > create MachineInstr, not SDNodes.
> > >
> > > Evan
> > >
> > >>
> > >> Tyler
> > >>
> > >>> -----Original Message-----
> > >>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> > >>> Sent: Friday, August 03, 2012 6:07 PM
> > >>> To: Nowicki, Tyler
> > >>> Cc: llvm-commits at cs.uiuc.edu
> > >>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> > >>>
> > >>>
> > >>> On Aug 3, 2012, at 2:28 PM, Nowicki, Tyler wrote:
> > >>>
> > >>>> This optimization should not be classified as `instruction
> > >>>> selection'. We're not
> > >>> simply replacing one form of divide with another at compilation
> > >>> time. The optimization adds control flow to use an 8-bit DIV when
> > >>> the values of the operands are determined to be small enough
> > >>> during
> > program execution.
> > >>>
> > >>> It is instruction selection. This is about selecting the optimal
> > >>> code sequence to implement an operation.
> > >>>
> > >>>>
> > >>>> Several approaches were looked at for this optimization and the
> > >>>> IR pass was
> > >>> found to be the best way to accomplish it. Here are the approaches
> > >>> that we looked at and the reasons they are not suitable for this
> > optimization.
> > >>>>
> > >>>> 1. Using a custom lowering function Specifying a custom lowering
> > >>>> function for ISD::UDIV/UREM/etc instructions
> > >>> using setOperationAction(...) in X86ISelLowering. However, the
> > >>> lowering functions are limited to using a SelectionDAG. Because
> > >>> the optimization requires several new BBs to be added it is not
> > >>> possible to add it here. Is this the place you were thinking of?
> > >>>>
> > >>>> 2. Using a custom inserter
> > >>>> Marking appropriate instructions with usesCustomInserter in
> > >>>> tablegen and
> > >>> implementing the optimization by inserting several new
> > >>> MachineBasicBlocks. As said previously ISD::UDIVREM/SDIVREM are
> > >>> expanded into multiple instructions, see X86ISelDAGToDAG:2301.
> > >>> Adding it here would require the inserted instructions to be
> > >>> identified, removed, and a similar set of instructions to be added
> > >>> for the fast and slow DIV cases. However, this cannot be easily
> > >>> done because there are no guarantees how UDIVREM/SDIVREM will be
> > expanded
> > >>> or in what order, and inserting the fast and slow DIVs here would
> > essentially require duplicating the UDIVREM/SDIVREM expansion code.
> > >>>
> > >>> You want to use combination of 1 and 2. Custom lower into a target
> > >>> specific node which will map to a pseudo instruction marked with
> > usesCustomInserter.
> > >>> That allows you to create new BBs. There are lots of examples, e.g.
> > atomic ops.
> > >>>
> > >>> Evan
> > >>>
> > >>>>
> > >>>> 3. Other approaches
> > >>>> We want the UDIVREM/SDIVREM expansion to operate on the
> original
> > >>>> DIV
> > >>> instruction and the DIV this optimization inserts so it seems like
> > >>> this optimization could be implemented in
> > >>> X86DAGToDAGISel::PreprocessISelDAG. Can we add control flow at
> > >>> this
> > point?
> > >>>>
> > >>>> Tyler Nowicki
> > >>>> Intel
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> > >>>>> Sent: Wednesday, August 01, 2012 4:38 PM
> > >>>>> To: Nowicki, Tyler
> > >>>>> Cc: llvm-commits at cs.uiuc.edu
> > >>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom Optimization
> > >>>>>
> > >>>>>
> > >>>>> On Aug 1, 2012, at 11:58 AM, "Nowicki, Tyler"
> > >>>>> <tyler.nowicki at intel.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> I don't understand what you are suggesting.
> > >>>>>>
> > >>>>>> ISD::SDIV, ISD::SREM and ISD::UDIV, ISD::UREM are combined to
> > >>>>>> form
> > >>>>> ISD::SDIVREM, and ISD::UDIVREM. These are handled by
> > >>>>> X86ISelDAGToDAG:2301 which takes care of all the special cases
> > >>>>> by inserting the appropriate movs, cdq/cwd, and X86::DIV or
> > >>>>> X86::IDIV instructions. A custom inserter is invoked after this,
> > >>>>> but it's too late because the div/rem operations have been
> > >>>>> replaced by numerous
> > >>> instructions.
> > >>>>>
> > >>>>> Looks like you don't understand how LLVM instruction works.
> > >>>>> Please spend some time looking at the passes in isel, especially
> > >>>>> how targets do custom lowering. Basically you want to lower
> > >>>>> these ISD nodes before they reach X86ISelDAGToDAG.
> > >>>>>
> > >>>>>>
> > >>>>>> What is wrong with doing this in a backend IR pass?
> > >>>>>
> > >>>>> Because this is a target specific instruction selection issue
> > >>>>> and it doesn't belong in LLVM IR level. We don't want arbitrary
> > >>>>> IR passes to do bits and pieces of optimization.
> > >>>>>
> > >>>>> Evan
> > >>>>>
> > >>>>>>
> > >>>>>> Tyler
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> > >>>>>>> Sent: Wednesday, August 01, 2012 2:18 PM
> > >>>>>>> To: Nowicki, Tyler
> > >>>>>>> Cc: llvm-commits at cs.uiuc.edu
> > >>>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom
> > >>>>>>> Optimization
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Aug 1, 2012, at 9:18 AM, "Nowicki, Tyler"
> > >>>>>>> <tyler.nowicki at intel.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Thanks for the reply.
> > >>>>>>>>
> > >>>>>>>> Lower is too late because DIV gets expanded into 2 MOV, a CDQ
> > >>>>>>>> or CWD, the DIV or IDIV, and one or two more MOVs during
> > >>>>>>>> instruction selection on DAG conversion in X86ISelDAGToDAG.
> > >>>>>>>> Adding
> > >>>>>>>
> > >>>>>>> Not during X86ISelDAGToDAG. You should do this using custom
> > >>>>>>> lowering in X86ISelLowering. At that time, it just looks like
> > >>>>>>> a ISD::SDIV /
> > >>> UDIV node.
> > >>>>>>>
> > >>>>>>>> this optimization after would require a custom inserter to
> > >>>>>>>> search and remove
> > >>>>>>> these instructions, and generate new BBs. This is complicated
> > >>>>>>> by
> > >>>>>>> O2 optimization which may remove MOVs when possible, by the
> > >>>>>>> scheduler on Atom which changes the order of instructions to
> > >>>>>>> avoid stalls, and by which form of the DIV instruction is used.
> > >>>>>>> Also, depending on the spills and neighboring instructions it
> > >>>>>>> is difficult to enumerate all the possible ways this
> > >>>>>>> optimization can be
> > carried out.
> > >>>>>>>
> > >>>>>>> The right thing to do is to custom lower into a X86ISD target
> > >>>>>>> node and select to a pseudo instruction. It can be marked with
> > >>>>>>> usesCustomInserter = 1 which allows you to expand it into the
> > >>>>>>> machine
> > >>>>> instructions and create new BBs if needed.
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>> It makes sense to do the optimization as an IR pass because
> > >>>>>>>> none of this has
> > >>>>>>> happened yet. Then CSE and scheduling can operate on the new
> > BBs.
> > >>>>>>>
> > >>>>>>> Scheduling doesn't happen until later and machine CSE can
> > >>>>>>> handle these instructions. Please look at some examples to
> > >>>>>>> learn more about how existing targets handle complex cases.
> > >>>>>>>
> > >>>>>>> Evan
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>>
> > >>>>>>>> Tyler
> > >>>>>>>>
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>> From: Evan Cheng [mailto:evan.cheng at apple.com]
> > >>>>>>>>> Sent: Tuesday, July 31, 2012 6:41 PM
> > >>>>>>>>> To: Nowicki, Tyler
> > >>>>>>>>> Cc: Eli Friedman; llvm-commits at cs.uiuc.edu
> > >>>>>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom
> > >>>>>>>>> Optimization
> > >>>>>>>>>
> > >>>>>>>>> Hi Tyler,
> > >>>>>>>>>
> > >>>>>>>>> This doesn't look like the right approach. I don't
> > >>>>>>>>> understand why you are adding a LLVM ir pass to do the
> > >>>>>>>>> expansion. Why can't this be done at x86 instruction lowering
> time?
> > >>>>>>>>>
> > >>>>>>>>> Evan
> > >>>>>>>>>
> > >>>>>>>>> On Jul 31, 2012, at 1:25 PM, "Nowicki, Tyler"
> > >>>>>>>>> <tyler.nowicki at intel.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Thanks for the review. Good catch on the missing check,
> > >>>>>>>>>> I've also made use
> > >>>>>>>>> of DenseMap as suggested. Test cases omitted from the
> > >>>>>>>>> previously patch are in place again and a test case for the
> > >>>>>>>>> missing check you found has been added.
> > >>>>>>>>>>
> > >>>>>>>>>> Please review my corrections.
> > >>>>>>>>>>
> > >>>>>>>>>> Tyler Nowicki
> > >>>>>>>>>> Intel
> > >>>>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>> From: Eli Friedman [mailto:eli.friedman at gmail.com]
> > >>>>>>>>>>> Sent: Friday, July 27, 2012 2:17 PM
> > >>>>>>>>>>> To: Nowicki, Tyler
> > >>>>>>>>>>> Cc: Sean Silva; llvm-commits at cs.uiuc.edu
> > >>>>>>>>>>> Subject: Re: [llvm-commits] [PATCH]IDIV->DIVB Atom
> > >>>>>>>>>>> Optimization
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Fri, Jul 27, 2012 at 8:35 AM, Nowicki, Tyler
> > >>>>>>>>>>> <tyler.nowicki at intel.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>> This patch has not yet been reviewed. Could someone take
> > >>>>>>>>>>>> a look at
> > >>>>> it?
> > >>>>>>>>>>>
> > >>>>>>>>>>> +    typedef std::pair<Value *, Value *> DivBOperands;
> > >>>>>>>>>>> + //Dividend
> > >>>>>>>>>>> Value, Divisor Value
> > >>>>>>>>>>> +    typedef std::pair<PHINode *, PHINode *> DivBPhiNodes;
> > >>>>>>>>>>> + //Quotient
> > >>>>>>>>>>> PHI, Remainder Phi
> > >>>>>>>>>>> +    typedef std::map<DivBOperands, DivBPhiNodes>
> > >>>>>>>>>>> + DivBCacheTy;
> > >>>>>>>>>>>
> > >>>>>>>>>>> llvm::DenseMap?
> > >>>>>>>>>>>
> > >>>>>>>>>>> +  // Replace operation value with previously generated
> > >>>>>>>>>>> + phi node DivBPhiNodes Value = CacheI->second;  if
> (UseDivOp) {
> > >>>>>>>>>>> +    // Replace all uses of div instruction with quotient phi
> node
> > >>>>>>>>>>> +    J->replaceAllUsesWith(Value.first);
> > >>>>>>>>>>> +  } else {
> > >>>>>>>>>>> +    // Replace all uses of rem instruction with remainder
> > >>>>>>>>>>> + phi
> > node
> > >>>>>>>>>>> +    J->replaceAllUsesWith(Value.second);
> > >>>>>>>>>>> +  }
> > >>>>>>>>>>>
> > >>>>>>>>>>> Looks like there's a missing check for UseSignedOp here.
> > >>>>>>>>>>>
> > >>>>>>>>>>> The patch doesn't appear to include any testcases.
> > >>>>>>>>>>>
> > >>>>>>>>>>> -Eli
> > >>>>>>>>>> <idivtodivb-
> > >>>>>>>>>
> > >>> svn.patch>_______________________________________________
> > >>>>>>>>>> llvm-commits mailing list
> > >>>>>>>>>> llvm-commits at cs.uiuc.edu
> > >>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >>>>>>>>
> > >>>>>>
> > >>>>
> > >>
> > >
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BypassSlowDiv5-svn.patch
Type: application/octet-stream
Size: 21120 bytes
Desc: BypassSlowDiv5-svn.patch
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120813/16ae2620/attachment.obj>