<div dir="ltr">Which cases does this pass handle that aren't already optimized away by passes like GlobalValueNumbering or DeadCodeElimination?<div><br></div><div>Thanks,</div><div>Jake VanAdrighem</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 10, 2015 at 2:35 PM, Steve King <span dir="ltr"><<a href="mailto:steve@metrokings.com" target="_blank">steve@metrokings.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello LLVM,<br>

It seems this thread has gone cold.  Is there some low-risk way for<br>

the community to take the new pass for a test drive?<br>

Regards,<br>

-steve<br>

<div class="HOEnZb"><div class="h5"><br>

On Wed, Sep 2, 2015 at 8:27 PM, Steve King <<a href="mailto:steve@metrokings.com">steve@metrokings.com</a>> wrote:<br>

> On Wed, Sep 2, 2015 at 5:36 AM, James Molloy <<a href="mailto:james@jamesmolloy.co.uk">james@jamesmolloy.co.uk</a>> wrote:<br>

>> Hi,<br>

>><br>

>> Coremark really isn't a good enough test - have you run the LLVM test suite<br>

>> with this patch, and what were the performance differences?<br>

><br>

> For the test suite single source benches, 235 tests improved<br>

> performance, 2 regressed and 705 were unchanged.  That seems very<br>

> optimistic. Comparing consecutive runs with identical settings shows<br>

> there is a lot of noise in the performance data.  Tips for stable<br>

> results would be appreciated.<br>
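
One way to keep run-to-run noise out of the improved/regressed counts is to repeat each benchmark and only count a change that exceeds the observed spread. The following is a minimal sketch of that idea, not part of the test suite tooling; the function name `classify`, the 1% floor, and the use of min/max spread as the noise estimate are all assumptions made for illustration.

```python
from statistics import median


def classify(baseline_runs, patched_runs, min_rel_change=0.01):
    """Label a benchmark as improved, regressed, or unchanged.

    baseline_runs / patched_runs: execution times (seconds) from repeated
    runs of the same test. All names here are hypothetical.
    """
    base = median(baseline_runs)
    new = median(patched_runs)

    # Estimate noise as the relative min/max spread of each set of runs.
    noise = max(
        (max(baseline_runs) - min(baseline_runs)) / base,
        (max(patched_runs) - min(patched_runs)) / new,
    )

    # A change only counts if it exceeds both the noise and a fixed floor.
    threshold = max(noise, min_rel_change)
    delta = (new - base) / base

    if delta < -threshold:
        return "improved"   # patched build is faster
    if delta > threshold:
        return "regressed"  # patched build is slower
    return "unchanged"
```

With a filter like this, a test whose difference is smaller than the variation between identical runs is reported as unchanged rather than as a win or a loss.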

><br>

><br>

>> I'm still a bit confused about what pattern exactly this pass is supposed to<br>

>> trigger on. I understand the mechanics, but I still can't quite see what<br>

>> patterns it would be useful on. You've mentioned matrix multiply - how does<br>

>> this pass alter the IR?<br>

><br>

> Here's before and after IR for the matrix_mul example.  Notice the two<br>

> bitcasts %1 and %2 generated in the for.cond.cleanup.3 block.  The L.E.V<br>

> pass replaces these with scevgep values that already exist.<br>

><br>

> *** Code after LSR ***<br>

><br>

> ; Function Attrs: nounwind optsize<br>

> define void @matrix_mul(i32 %Size, i32* nocapture %Dst, i32* nocapture<br>

> readonly %Src, i32 %Val) #0 {<br>

> entry:<br>

>   %cmp.25 = icmp eq i32 %Size, 0<br>

>   br i1 %cmp.25, label %for.cond.cleanup, label %for.body.4.lr.ph.preheader<br>

><br>

> for.body.4.lr.ph.preheader:                       ; preds = %entry<br>

>   %0 = shl i32 %Size, 2<br>

>   br label %for.body.4.lr.ph<br>

><br>

> for.body.4.lr.ph:                                 ; preds =<br>

> %for.body.4.lr.ph.preheader, %for.cond.cleanup.3<br>

>   %lsr.iv5 = phi i32* [ %Src, %for.body.4.lr.ph.preheader ], [ %2,<br>

> %for.cond.cleanup.3 ]<br>

>   %lsr.iv1 = phi i32* [ %Dst, %for.body.4.lr.ph.preheader ], [ %1,<br>

> %for.cond.cleanup.3 ]<br>

>   %Outer.026 = phi i32 [ %inc10, %for.cond.cleanup.3 ], [ 0,<br>

> %for.body.4.lr.ph.preheader ]<br>

>   %lsr.iv56 = bitcast i32* %lsr.iv5 to i1*<br>

>   %lsr.iv12 = bitcast i32* %lsr.iv1 to i1*<br>

>   br label %for.body.4<br>

><br>

> for.cond.cleanup.loopexit:                        ; preds = %for.cond.cleanup.3<br>

>   br label %for.cond.cleanup<br>

><br>

> for.cond.cleanup:                                 ; preds =<br>

> %for.cond.cleanup.loopexit, %entry<br>

>   ret void<br>

><br>

> for.cond.cleanup.3:                               ; preds = %for.body.4<br>

>   %inc10 = add nuw nsw i32 %Outer.026, 1<br>

>   %scevgep = getelementptr i1, i1* %lsr.iv12, i32 %0<br>

>   %1 = bitcast i1* %scevgep to i32*<br>

>   %scevgep7 = getelementptr i1, i1* %lsr.iv56, i32 %0<br>

>   %2 = bitcast i1* %scevgep7 to i32*<br>

>   %exitcond27 = icmp eq i32 %inc10, %Size<br>

>   br i1 %exitcond27, label %for.cond.cleanup.loopexit, label %for.body.4.lr.ph<br>

><br>

> for.body.4:                                       ; preds =<br>

> %for.body.4, %for.body.4.lr.ph<br>

>   %lsr.iv8 = phi i32* [ %scevgep9, %for.body.4 ], [ %lsr.iv5,<br>

> %for.body.4.lr.ph ]<br>

>   %lsr.iv3 = phi i32* [ %scevgep4, %for.body.4 ], [ %lsr.iv1,<br>

> %for.body.4.lr.ph ]<br>

>   %lsr.iv = phi i32 [ %lsr.iv.next, %for.body.4 ], [ %Size, %for.body.4.lr.ph ]<br>

>   %3 = load i32, i32* %lsr.iv8, align 4, !tbaa !1<br>

>   %mul5 = mul i32 %3, %Val<br>

>   store i32 %mul5, i32* %lsr.iv3, align 4, !tbaa !1<br>

>   %lsr.iv.next = add i32 %lsr.iv, -1<br>

>   %scevgep4 = getelementptr i32, i32* %lsr.iv3, i32 1<br>

>   %scevgep9 = getelementptr i32, i32* %lsr.iv8, i32 1<br>

>   %exitcond = icmp eq i32 %lsr.iv.next, 0<br>

>   br i1 %exitcond, label %for.cond.cleanup.3, label %for.body.4<br>

> }<br>

><br>

><br>

> *** Code after Loop Exit Values Optimization ***<br>

><br>

> ; Function Attrs: nounwind optsize<br>

> define void @matrix_mul(i32 %Size, i32* nocapture %Dst, i32* nocapture<br>

> readonly %Src, i32 %Val) #0 {<br>

> entry:<br>

>   %cmp.25 = icmp eq i32 %Size, 0<br>

>   br i1 %cmp.25, label %for.cond.cleanup, label %for.body.4.lr.ph.preheader<br>

><br>

> for.body.4.lr.ph.preheader:                       ; preds = %entry<br>

>   br label %for.body.4.lr.ph<br>

><br>

> for.body.4.lr.ph:                                 ; preds =<br>

> %for.body.4.lr.ph.preheader, %for.cond.cleanup.3<br>

>   %lsr.iv5 = phi i32* [ %Src, %for.body.4.lr.ph.preheader ], [<br>

> %scevgep9, %for.cond.cleanup.3 ]<br>

>   %lsr.iv1 = phi i32* [ %Dst, %for.body.4.lr.ph.preheader ], [<br>

> %scevgep4, %for.cond.cleanup.3 ]<br>

>   %Outer.026 = phi i32 [ %inc10, %for.cond.cleanup.3 ], [ 0,<br>

> %for.body.4.lr.ph.preheader ]<br>

>   br label %for.body.4<br>

><br>

> for.cond.cleanup.loopexit:                        ; preds = %for.cond.cleanup.3<br>

>   br label %for.cond.cleanup<br>

><br>

> for.cond.cleanup:                                 ; preds =<br>

> %for.cond.cleanup.loopexit, %entry<br>

>   ret void<br>

><br>

> for.cond.cleanup.3:                               ; preds = %for.body.4<br>

>   %inc10 = add nuw nsw i32 %Outer.026, 1<br>

>   %exitcond27 = icmp eq i32 %inc10, %Size<br>

>   br i1 %exitcond27, label %for.cond.cleanup.loopexit, label %for.body.4.lr.ph<br>

><br>

> for.body.4:                                       ; preds =<br>

> %for.body.4, %for.body.4.lr.ph<br>

>   %lsr.iv8 = phi i32* [ %scevgep9, %for.body.4 ], [ %lsr.iv5,<br>

> %for.body.4.lr.ph ]<br>

>   %lsr.iv3 = phi i32* [ %scevgep4, %for.body.4 ], [ %lsr.iv1,<br>

> %for.body.4.lr.ph ]<br>

>   %lsr.iv = phi i32 [ %lsr.iv.next, %for.body.4 ], [ %Size, %for.body.4.lr.ph ]<br>

>   %0 = load i32, i32* %lsr.iv8, align 4, !tbaa !1<br>

>   %mul5 = mul i32 %0, %Val<br>

>   store i32 %mul5, i32* %lsr.iv3, align 4, !tbaa !1<br>

>   %lsr.iv.next = add i32 %lsr.iv, -1<br>

>   %scevgep4 = getelementptr i32, i32* %lsr.iv3, i32 1<br>

>   %scevgep9 = getelementptr i32, i32* %lsr.iv8, i32 1<br>

>   %exitcond = icmp eq i32 %lsr.iv.next, 0<br>

>   br i1 %exitcond, label %for.cond.cleanup.3, label %for.body.4<br>

> }<br>

><br>

><br>

>> What value is it avoiding being recomputed?<br>

> I'm not precisely sure, but it's residue from LSR.  The pass checks<br>

> all computable SCEV values when a loop exits and in this case found<br>

> GEPs with the same value.<br>
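
The equivalence the pass relies on in this example can be checked with a small model. The sketch below is an illustration only, not the pass itself: it treats the pointers from the matrix_mul IR as plain integer addresses, assumes 4-byte i32 elements, and uses a hypothetical helper name `check_exit_values`. It compares the address the outer latch recomputes (base plus `Size << 2` bytes, as in `%scevgep` and `%scevgep7`) with the value the inner loop's pointer already holds on exit (`%scevgep4` and `%scevgep9`).

```python
def check_exit_values(size, src_base, dst_base, elem_bytes=4):
    """Model the two nested loops with integer 'addresses'.

    Returns one (src_equal, dst_equal) pair per outer iteration, telling
    whether the recomputed pointer matches the inner loop's exit value.
    """
    lsr_iv5, lsr_iv1 = src_base, dst_base  # outer-loop pointer phis
    results = []
    for _outer in range(size):
        lsr_iv8, lsr_iv3 = lsr_iv5, lsr_iv1  # inner-loop pointer phis
        for _inner in range(size):
            # %scevgep9 / %scevgep4: step each pointer by one i32 element.
            lsr_iv8 += elem_bytes
            lsr_iv3 += elem_bytes

        # Before the pass: the outer latch recomputes base + (Size << 2).
        recomputed_src = lsr_iv5 + (size << 2)
        recomputed_dst = lsr_iv1 + (size << 2)
        results.append((recomputed_src == lsr_iv8,
                        recomputed_dst == lsr_iv3))

        # After the pass: the outer phis take the inner exit values directly.
        lsr_iv5, lsr_iv1 = lsr_iv8, lsr_iv3
    return results
```

Every pair comes back equal for any nonzero Size, which is why the shl, the two GEPs, and the two bitcasts in for.cond.cleanup.3 can be dropped in favor of the values the inner loop already computed.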

><br>

>> How does this pass affect register pressure?<br>

>> Also, your example just removes a mov and an add - the push/pops are just<br>

>> register allocation (unless your pass in fact *reduces* register pressure?)<br>

><br>

> Right, the computation eliminated is the mov and add.  Register<br>

> savings are a byproduct.<br>

><br>

> Regards,<br>

> -steve<br>

</div></div></blockquote></div><br></div>