[LLVMdev] Interesting post increment situation in DAG combiner

Fri Mar 1 15:40:32 PST 2013

----- Original Message -----
> From: "Sergei Larin" <slarin at codeaurora.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, March 1, 2013 5:16:04 PM
> Subject: RE: Interesting post increment situation in DAG combiner
> 
> Hal,
> 
>   Here is my patch for the post inc case. I think it is symmetrically
>   applicable to the pre-inc, but I have not tested it for that.
> I think you can clearly see my intent here - I simply select the
> "latest" candidate when multiple are available.

I'll look at it, hopefully in the next few days.

> 
>    Who else might be interested in this?

ARM perhaps.

Thanks again,
Hal

> 
> Sergei
> 
> ---
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
> 
> 
> > -----Original Message-----
> > From: Hal Finkel [mailto:hfinkel at anl.gov]
> > Sent: Friday, March 01, 2013 4:53 PM
> > To: Sergei Larin
> > Cc: llvmdev at cs.uiuc.edu
> > Subject: Re: Interesting post increment situation in DAG combiner
> > 
> > ----- Original Message -----
> > > From: "Sergei Larin" <slarin at codeaurora.org>
> > > To: "Hal Finkel" <hfinkel at anl.gov>
> > > Cc: llvmdev at cs.uiuc.edu
> > > Sent: Friday, March 1, 2013 10:24:39 AM
> > > Subject: Interesting post increment situation in DAG combiner
> > >
> > > Hal, (and everyone who might care about post increment
> > > generation)...
> > 
> > Sergei,
> > 
> > Perhaps this is a problem that I wish that I had ;) -- PPC does not
> > have post-increment loads and stores, only pre-increment ones.
> > Nevertheless, I think that the situation is similar...
> > 
> > For one thing, I recently committed an enhancement to pre-increment
> > generation that causes later constant offsets to use the new
> > incremented address instead of the old address. I thought that this
> > would not be an issue for post-increment generation, but it seems
> > that
> > I was wrong: you'd have the same problem here: if you
> > post-increment
> > the load and not the store, then you might need an extra register
> > to
> > hold the original base address for the store. In this case, you'd
> > not
> > have a problem if you just chose to post-increment the store
> > instead,
> > but the general problem still exists.
> > 
> > Regarding the selection of which load/store to pre/post increment,
> > I
> > think that this is also a general issue. At least for pre-increment
> > generation I've seen it make some odd choices.
> > 
> > In short, I do recognize the issue, and I'm curious to see your
> > patch.
> > If it can improve pre-increment selection as well, that would help
> > me
> > too.
> > 
> > Thanks again,
> > Hal
> > 
> > >
> > > I have an interesting question/observation. Consider this vector
> > loop.
> > >
> > > void vec_add_const(unsigned N, short __attribute__ ((aligned
> > > (16)))
> > > *A,
> > >                                   short __attribute__ ((aligned
> > >                                   (16))) val) {  unsigned i,j;
> > >                                    for
> > > (i=0; i<N; i++) {
> > >   for (j=0; j<N; j++) {
> > >    A[i*N+j] += val;
> > >   }
> > >  }
> > > }
> > >
> > > The innermost loop looks like this right before the DAG selection
> > > begins.
> > >
> > >   p.loop_body.us65:                             ; preds =
> > > %p.loop_body.lr.ph.us78, %p.loop_body.us65
> > >   %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
> > > %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc,
> > > %p.loop_body.us65
> > ]
> > >   %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
> > > %p.next_loopiv.us67, %p.loop_body.us65 ]
> > >   %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x
> > >   i16>*
> > >   %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4
> > >    <<<<<<<<<<<<<<<<<<
> > > IV
> > >   %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16
> > > <<<<<<<<<<<<<<<<<<<Load
> > >   %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
> > >   store <4 x i16> %add5p_vec.us72, <4 x i16>* %vector_ptr.us70,
> > >   align
> > >   16
> > > <<<<<<<<<<<<<<<Store
> > >   %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi,
> > >   i32
> > >   4
> > > <<<<<<<<<<<<<<< Common Ptr
> > >   %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
> > >   br i1 %11, label %p.loop_body.us65, label
> > >   %p.loop_header38.preheader.us84
> > >
> > > When it gets to the DAG Combiner, in
> > > CombineToPostIndexedLoadStore()
> > > two opportunities for post inc are recognized - the load and the
> > > store.
> > > Now, you can easily see that in this case you would want the
> > > store to
> > > get the post inc, not the load, but since the DAG combiner simply
> > > scans top-down, the opposite happens.
> > >
> > >   So here is the question - do you recognize this as a
> > >   deficiency,
> > >   and can
> > > you see the same in PPC? The fix is code trivial, but it would
> > > introduce a general concept of a primitive cost function to the
> > > PostInc candidacy selection in DAG combine. If you recognize the
> > > issue, I will post a patch with more details, but if I am missing
> > > the
> > > big picture here, please advise.
> > >
> > > Sergei
> > >
> > >
> > > ---
> > > Qualcomm Innovation Center, Inc. is a member of Code Aurora
> > > Forum,
> > > hosted by The Linux Foundation
> > >
> > >
> > >
> > >
>