[LLVMdev] Interesting post increment situation in DAG combiner
Hal Finkel
hfinkel at anl.gov
Fri Mar 1 14:52:50 PST 2013
----- Original Message -----
> From: "Sergei Larin" <slarin at codeaurora.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, March 1, 2013 10:24:39 AM
> Subject: Interesting post increment situation in DAG combiner
>
> Hal, (and everyone who might care about post increment generation)...
Sergei,
Perhaps this is a problem that I wish that I had ;) -- PPC does not have post-increment loads and stores, only pre-increment ones. Nevertheless, I think that the situation is similar...
For one thing, I recently committed an enhancement to pre-increment generation that causes later constant offsets to use the new incremented address instead of the old address. I thought that this would not be an issue for post-increment generation, but it seems that I was wrong: you'd have the same problem here: if you post-increment the load and not the store, then you might need an extra register to hold the original base address for the store. In this case, you'd not have a problem if you just chose to post-increment the store instead, but the general problem still exists.
Regarding the selection of which load/store to pre/post increment, I think that this is also a general issue. At least for pre-increment generation I've seen it make some odd choices.
In short, I do recognize the issue, and I'm curious to see your patch. If it can improve pre-increment selection as well, that would help me too.
Thanks again,
Hal
>
> I have an interesting question/observation. Consider this vector
> loop.
>
> void vec_add_const(unsigned N, short __attribute__ ((aligned (16)))
> *A,
> short __attribute__ ((aligned
> (16))) val)
> {
> unsigned i,j;
> for (i=0; i<N; i++) {
> for (j=0; j<N; j++) {
> A[i*N+j] += val;
> }
> }
> }
>
> The innermost loop looks like this right before the DAG selection
> begins.
>
> p.loop_body.us65: ; preds =
> %p.loop_body.lr.ph.us78, %p.loop_body.us65
> %p_arrayidx.us69.phi = phi i16* [ %p_arrayidx.us69.gep,
> %p.loop_body.lr.ph.us78 ], [ %p_arrayidx.us69.inc, %p.loop_body.us65
> ]
> %p.loopiv48.us66 = phi i32 [ 0, %p.loop_body.lr.ph.us78 ], [
> %p.next_loopiv.us67, %p.loop_body.us65 ]
> %vector_ptr.us70 = bitcast i16* %p_arrayidx.us69.phi to <4 x i16>*
> %p.next_loopiv.us67 = add nsw i32 %p.loopiv48.us66, 4
> <<<<<<<<<<<<<<<<<<
> IV
> %_p_vec_full.us71 = load <4 x i16>* %vector_ptr.us70, align 16
> <<<<<<<<<<<<<<<<<<<Load
> %add5p_vec.us72 = add <4 x i16> %_p_vec_full.us71, %5
> store <4 x i16> %add5p_vec.us72, <4 x i16>* %vector_ptr.us70, align
> 16
> <<<<<<<<<<<<<<<Store
> %p_arrayidx.us69.inc = getelementptr i16* %p_arrayidx.us69.phi, i32
> 4
> <<<<<<<<<<<<<<< Common Ptr
> %11 = icmp slt i32 %p.next_loopiv.us67, %leftover_lb
> br i1 %11, label %p.loop_body.us65, label
> %p.loop_header38.preheader.us84
>
> When it gets to the DAG Combiner, in CombineToPostIndexedLoadStore()
> two
> opportunities for post inc are recognized - the load and the store.
> Now, you can easily see that in this case you would want the store to
> get
> the post inc, not the load, but since the DAG combiner simply scans
> top-down, the opposite happens.
>
> So here is the question - do you recognize this as a deficiency,
> and can
> you see the same in PPC? The fix is code trivial, but it would
> introduce a
> general concept of a primitive cost function to the PostInc candidacy
> selection in DAG combine. If you recognize the issue, I will post a
> patch
> with more details, but if I am missing the big picture here, please
> advise.
>
> Sergei
>
>
> ---
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by
> The Linux Foundation
>
>
>
>
More information about the llvm-dev
mailing list