[LLVMdev] Interesting post increment situation in DAG combiner

Sergei Larin slarin at codeaurora.org
Fri Mar 1 15:16:04 PST 2013


Hal,

  Here is my patch for the post inc case. I think it is symmetrically applicable to the pre-inc, but I have not tested it for that.
I think you can clearly see my intent here - I simply select the "latest" candidate when multiple are available.

   Who else might be interested in this?

Sergei

---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation


> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Friday, March 01, 2013 4:53 PM
> To: Sergei Larin
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: Interesting post increment situation in DAG combiner
> 
> ----- Original Message -----
> > From: "Sergei Larin" <slarin at codeaurora.org>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: llvmdev at cs.uiuc.edu
> > Sent: Friday, March 1, 2013 10:24:39 AM
> > Subject: Interesting post increment situation in DAG combiner
> >
> > Hal, (and everyone who might care about post increment generation)...
> 
> Sergei,
> 
> Perhaps this is a problem that I wish that I had ;) -- PPC does not
> have post-increment loads and stores, only pre-increment ones.
> Nevertheless, I think that the situation is similar...
> 
> For one thing, I recently committed an enhancement to pre-increment
> generation that causes later constant offsets to use the new
> incremented address instead of the old address. I thought that this
> would not be an issue for post-increment generation, but it seems that
> I was wrong: you'd have the same problem here: if you post-increment
> the load and not the store, then you might need an extra register to
> hold the original base address for the store. In this case, you'd not
> have a problem if you just chose to post-increment the store instead,
> but the general problem still exists.
> 
> Regarding the selection of which load/store to pre/post increment, I
> think that this is also a general issue. At least for pre-increment
> generation I've seen it make some odd choices.
> 
> In short, I do recognize the issue, and I'm curious to see your patch.
> If it can improve pre-increment selection as well, that would help me
> too.
> 
> Thanks again,
> Hal
> 
> >
> > I have an interesting question/observation. Consider this vector
> loop.
> >
> > void vec_add_const(unsigned N, short __attribute__ ((aligned (16)))
> > *A,
> >                                   short __attribute__ ((aligned
> >                                   (16))) val) {  unsigned i,j;  for
> > (i=0; i<N; i++) {
> >   for (j=0; j<N; j++) {
> >    A[i*N+j] += val;
> >   }
> >  }
> > }
> >
> > The innermost loop looks like this right before the DAG selection
> > begins.
> >
> >   p.loop_body.us65:                             ; preds =
> > %p.loop_body.lr.ph.us78, %p.loop_body.us65
> >   %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
> > %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65
> ]
> >   %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
> > %p.next_loopiv.us67, %p.loop_body.us65 ]
> >   %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
> >   %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4
> >    <<<<<<<<<<<<<<<<<<
> > IV
> >   %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16
> > <<<<<<<<<<<<<<<<<<<Load
> >   %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
> >   store <4 x i16> %add5p_vec.us72, <4 x i16>* %vector_ptr.us70, align
> >   16
> > <<<<<<<<<<<<<<<Store
> >   %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi, i32
> >   4
> > <<<<<<<<<<<<<<< Common Ptr
> >   %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
> >   br i1 %11, label %p.loop_body.us65, label
> >   %p.loop_header38.preheader.us84
> >
> > When it gets to the DAG Combiner, in CombineToPostIndexedLoadStore()
> > two opportunities for post inc are recognized - the load and the
> > store.
> > Now, you can easily see that in this case you would want the store to
> > get the post inc, not the load, but since the DAG combiner simply
> > scans top-down, the opposite happens.
> >
> >   So here is the question - do you recognize this as a deficiency,
> >   and can
> > you see the same in PPC? The fix is code trivial, but it would
> > introduce a general concept of a primitive cost function to the
> > PostInc candidacy selection in DAG combine. If you recognize the
> > issue, I will post a patch with more details, but if I am missing the
> > big picture here, please advise.
> >
> > Sergei
> >
> >
> > ---
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > hosted by The Linux Foundation
> >
> >
> >
> >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DagCombiner_PostInc.patch
Type: application/octet-stream
Size: 3444 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130301/97fc5162/attachment.obj>


More information about the llvm-dev mailing list