[LLVMdev] Interesting post increment situation in DAG combiner

Fri Mar 1 14:52:50 PST 2013

----- Original Message -----
> From: "Sergei Larin" <slarin at codeaurora.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, March 1, 2013 10:24:39 AM
> Subject: Interesting post increment situation in DAG combiner
> 
> Hal, (and everyone who might care about post increment generation)...

Sergei,

Perhaps this is a problem that I wish that I had ;) -- PPC does not have post-increment loads and stores, only pre-increment ones. Nevertheless, I think that the situation is similar...

For one thing, I recently committed an enhancement to pre-increment generation that causes later constant offsets to use the new incremented address instead of the old address. I thought that this would not be an issue for post-increment generation, but it seems that I was wrong; you'd have the same problem here. If you post-increment the load and not the store, then you might need an extra register to hold the original base address for the store. In this case, you'd avoid the problem by post-incrementing the store instead, but the general problem still exists.
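
To make the extra-register point concrete, here is a rough C model of the two choices for a scalar version of the loop. It is only a sketch: the function names are made up for illustration, and a real target would express this in its addressing modes rather than in C pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Variant A (hypothetical name): the load takes the post-increment.
 * The store still needs the original address, so a copy of the old
 * base has to stay live across the load (modelled by `old`). */
void add_const_postinc_load(short *a, size_t n, short val)
{
    short *p = a;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        short *old = p;            /* extra copy of the base address */
        short v = *p++;            /* load, then bump the pointer */
        *old = (short)(v + val);   /* store through the saved base */
    }
}

/* Variant B (hypothetical name): the store takes the post-increment.
 * The load uses the base as-is, and no second copy is needed. */
void add_const_postinc_store(short *a, size_t n, short val)
{
    short *p = a;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        short v = *p;              /* plain load */
        *p++ = (short)(v + val);   /* store, then bump the pointer */
    }
}
```

Both variants compute the same result; the only difference is whether the un-incremented base has to be kept around for the second memory operation.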

Regarding the selection of which load/store to pre/post increment, I think that this is also a general issue. At least for pre-increment generation I've seen it make some odd choices.
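
As a way of picturing what a primitive cost function for this selection might look like, here is a small C sketch. It is not Sergei's patch and not the DAG combiner's code; the struct, its fields, and the "count later uses of the old base" metric are all assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical description of one pre/post-increment candidate.
 * Candidates are assumed to be listed in program order. */
struct candidate {
    const char *name;      /* label for readability only */
    int later_base_uses;   /* later memory ops that still need the
                              un-incremented base address */
};

/* Return the index of the cheapest candidate, or -1 if there are none.
 * Ties go to the earliest candidate, which matches a plain top-down scan. */
int pick_candidate(const struct candidate *c, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || c[i].later_base_uses < c[best].later_base_uses)
            best = (int)i;
    }
    return best;
}
```

For the loop quoted below, the load would have one later use of the old base (the store) and the store would have none, so this metric would pick the store.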

In short, I do recognize the issue, and I'm curious to see your patch. If it can improve pre-increment selection as well, that would help me too.

Thanks again,
Hal

> 
> I have an interesting question/observation. Consider this vector
> loop.
> 
> void vec_add_const(unsigned N, short __attribute__ ((aligned (16))) *A,
>                    short __attribute__ ((aligned (16))) val)
> {
>  unsigned i,j;
>  for (i=0; i<N; i++) {
>   for (j=0; j<N; j++) {
>    A[i*N+j] += val;
>   }
>  }
> }
> 
> The innermost loop looks like this right before the DAG selection
> begins.
> 
>   p.loop_body.us65:                             ; preds = %p.loop_body.lr.ph.us78, %p.loop_body.us65
>   %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep, %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65 ]
>   %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [ %p.next_loopiv.us67, %p.loop_body.us65 ]
>   %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
>   %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4    <<<<<<<<<<<<<<<<<< IV
>   %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16    <<<<<<<<<<<<<<<<<<< Load
>   %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
>   store <4 x i16> %add5p_vec.us72, <4 x i16>* %vector_ptr.us70, align 16    <<<<<<<<<<<<<<< Store
>   %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi, i32 4    <<<<<<<<<<<<<<< Common Ptr
>   %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
>   br i1 %11, label %p.loop_body.us65, label %p.loop_header38.preheader.us84
> 
> When it gets to the DAG Combiner, in CombineToPostIndexedLoadStore() two
> opportunities for post inc are recognized - the load and the store.
> Now, you can easily see that in this case you would want the store to get
> the post inc, not the load, but since the DAG combiner simply scans
> top-down, the opposite happens.
> 
> So here is the question - do you recognize this as a deficiency, and can
> you see the same in PPC? The fix is trivial in terms of code, but it would
> introduce a general concept of a primitive cost function to the PostInc
> candidacy selection in DAG combine. If you recognize the issue, I will post
> a patch with more details, but if I am missing the big picture here, please
> advise.
> 
> Sergei
> 
> 
> ---
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by
> The Linux Foundation