[llvm] r225034 - InstCombine: try to transform A-B < 0 into A < B
Kevin Qin
kevinqindev at gmail.com
Wed Jan 14 17:53:15 PST 2015
Hi Michael and David,
I also see some performance regressions on aarch64 target introduced by
this commit. No further invesitgation on what's the root cause, but I guess
the one Michael shared may cause regression on many targets, including
aarch64.
I argee with Michael's suggestion that revert this commit firstly, and make
more performance test over it.
Thanks,
Kevin
2015-01-14 21:25 GMT+08:00 Kuperstein, Michael M <
michael.m.kuperstein at intel.com>:
> Hi David,
>
> It turns out this is causing quite a lot of performance regressions.
>
> One of the issues is that this messes up IndVar simplification.
>
> For this code:
> int foo (int *p, int M, int N, int K)
> {
> int ret = 0;
> for (int i = 0; i < M-N; i++)
> ret += p[i * K];
>
> return ret;
> }
>
> What we get coming into IndVar Simplify before the patch is:
> define i32 @foo(i32* nocapture readonly %p, i32 %M, i32 %N, i32 %K) #0 {
> entry:
> %sub = sub nsw i32 %M, %N
> %cmp4 = icmp sgt i32 %sub, 0
> br i1 %cmp4, label %for.body.lr.ph, label %for.end
>
> for.body.lr.ph: ; preds = %entry
> br label %for.body
>
> for.body: ; preds = %
> for.body.lr.ph, %for.body
> %i.06 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
> %ret.05 = phi i32 [ 0, %for.body.lr.ph ], [ %add, %for.body ]
> %mul = mul nsw i32 %i.06, %K
> %arrayidx = getelementptr inbounds i32* %p, i32 %mul
> %0 = load i32* %arrayidx, align 4, !tbaa !1
> %add = add nsw i32 %0, %ret.05
> %inc = add nsw i32 %i.06, 1
> %cmp = icmp slt i32 %inc, %sub
> br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge
>
> for.cond.for.end_crit_edge: ; preds = %for.body
> %add.lcssa = phi i32 [ %add, %for.body ]
> br label %for.end
>
> for.end: ; preds =
> %for.cond.for.end_crit_edge, %entry
> %ret.0.lcssa = phi i32 [ %add.lcssa, %for.cond.for.end_crit_edge ], [ 0,
> %entry ]
> ret i32 %ret.0.lcssa
> }
>
> IndVar Simplify manages to compute the SCEV for the induction variable:
> INDVARS: Rewriting loop exit condition to:
> LHS: %i.06 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
> op: !=
> RHS: %1 = sub i32 %0, %N
> IVCount: (-1 + (-1 * %N) + %M)
>
> The condition on the backedge gets replaced with an "icmp ne", and the
> final assembly for the loop body is:
> .LBB0_2:
> addl (%esi), %eax
> addl %edx, %esi
> decl %ecx
> jne .LBB0_2
>
> With r225034, IndVar Simplify doesn't fire, we drag the "icmp slt" until
> the end, and end up with:
> .LBB0_2:
> addl (%esi), %eax
> incl %edi
> addl %edx, %esi
> cmpl %ecx, %edi
> jl .LBB0_2
>
> Note that this is not the only issue we're seeing, some of the regressions
> don't appear to be IndVar-related.
> It may be a good idea to revert r225034 and do more extensive performance
> testing before re-committing.
>
> Thanks,
> Michael
>
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:
> llvm-commits-bounces at cs.uiuc.edu] On Behalf Of David Majnemer
> Sent: Wednesday, December 31, 2014 06:22
> To: llvm-commits at cs.uiuc.edu
> Subject: [llvm] r225034 - InstCombine: try to transform A-B < 0 into A < B
>
> Author: majnemer
> Date: Tue Dec 30 22:21:41 2014
> New Revision: 225034
>
> URL: http://llvm.org/viewvc/llvm-project?rev=225034&view=rev
> Log:
> InstCombine: try to transform A-B < 0 into A < B
>
> We are allowed to move the 'B' to the right hand side if we an prove there
> is no signed overflow and if the comparison itself is signed.
>
> Modified:
> llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp
> llvm/trunk/test/Transforms/InstCombine/icmp.ll
>
> Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp?rev=225034&r1=225033&r2=225034&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp
> (original)
> +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp Tue
> +++ Dec 30 22:21:41 2014
> @@ -2692,6 +2692,26 @@ Instruction *InstCombiner::visitICmpInst
> return new ICmpInst(I.getPredicate(), A, B);
> }
>
> + // (icmp sgt (sub nsw A B), -1) -> (icmp sge A, B)
> + if (I.getPredicate() == ICmpInst::ICMP_SGT && CI->isAllOnesValue() &&
> + match(Op0, m_NSWSub(m_Value(A), m_Value(B))))
> + return new ICmpInst(ICmpInst::ICMP_SGE, A, B);
> +
> + // (icmp sgt (sub nsw A B), 0) -> (icmp sgt A, B)
> + if (I.getPredicate() == ICmpInst::ICMP_SGT && CI->isZero() &&
> + match(Op0, m_NSWSub(m_Value(A), m_Value(B))))
> + return new ICmpInst(ICmpInst::ICMP_SGT, A, B);
> +
> + // (icmp slt (sub nsw A B), 0) -> (icmp slt A, B)
> + if (I.getPredicate() == ICmpInst::ICMP_SLT && CI->isZero() &&
> + match(Op0, m_NSWSub(m_Value(A), m_Value(B))))
> + return new ICmpInst(ICmpInst::ICMP_SLT, A, B);
> +
> + // (icmp slt (sub nsw A B), 1) -> (icmp sle A, B)
> + if (I.getPredicate() == ICmpInst::ICMP_SLT && CI->isOne() &&
> + match(Op0, m_NSWSub(m_Value(A), m_Value(B))))
> + return new ICmpInst(ICmpInst::ICMP_SLE, A, B);
> +
> // If we have an icmp le or icmp ge instruction, turn it into the
> // appropriate icmp lt or icmp gt instruction. This allows us to
> rely on
> // them being folded in the code below. The SimplifyICmpInst code has
>
> Modified: llvm/trunk/test/Transforms/InstCombine/icmp.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/icmp.ll?rev=225034&r1=225033&r2=225034&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/Transforms/InstCombine/icmp.ll (original)
> +++ llvm/trunk/test/Transforms/InstCombine/icmp.ll Tue Dec 30 22:21:41
> +++ 2014
> @@ -1522,3 +1522,39 @@ define zeroext i1 @icmp_cmpxchg_strong(i
> %icmp = icmp eq i32 %xtrc, %old_val
> ret i1 %icmp
> }
> +
> +; CHECK-LABEL: @f1
> +; CHECK-NEXT: %[[cmp:.*]] = icmp sge i64 %a, %b ; CHECK-NEXT: ret i1
> +%[[cmp]] define i1 @f1(i64 %a, i64 %b) {
> + %t = sub nsw i64 %a, %b
> + %v = icmp sge i64 %t, 0
> + ret i1 %v
> +}
> +
> +; CHECK-LABEL: @f2
> +; CHECK-NEXT: %[[cmp:.*]] = icmp sgt i64 %a, %b ; CHECK-NEXT: ret i1
> +%[[cmp]] define i1 @f2(i64 %a, i64 %b) {
> + %t = sub nsw i64 %a, %b
> + %v = icmp sgt i64 %t, 0
> + ret i1 %v
> +}
> +
> +; CHECK-LABEL: @f3
> +; CHECK-NEXT: %[[cmp:.*]] = icmp slt i64 %a, %b ; CHECK-NEXT: ret i1
> +%[[cmp]] define i1 @f3(i64 %a, i64 %b) {
> + %t = sub nsw i64 %a, %b
> + %v = icmp slt i64 %t, 0
> + ret i1 %v
> +}
> +
> +; CHECK-LABEL: @f4
> +; CHECK-NEXT: %[[cmp:.*]] = icmp sle i64 %a, %b ; CHECK-NEXT: ret i1
> +%[[cmp]] define i1 @f4(i64 %a, i64 %b) {
> + %t = sub nsw i64 %a, %b
> + %v = icmp sle i64 %t, 0
> + ret i1 %v
> +}
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
--
Best Regards,
Kevin Qin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150115/32086dc3/attachment.html>
More information about the llvm-commits
mailing list