[llvm-dev] "trunc"s generated by LSR cause problem for SCEV

Ehsan Amiri via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 22 08:57:49 PDT 2016


Hi

I am working on a bug that is caused by Scalar Evolution not being able to
compute the iteration count of an unrolled loop (PR 28363). While I believe
there is enough information for SCEV to do its job, I think the code that
is generated by earlier transformations can be simpler. There is one bug in
IndVarSimplify, for which Sanjoy Das suggested a fix. With that fix applied,
the problem goes away if I also disable loop strength reduction (LSR).
Below I have copied the code before and after LSR.

For this code pattern, it is possible to prove that truncs generated by LSR
can be avoided (see bottom of the email). Andy Trick says that LSR
generally thinks that trunc is free, but there might be ways to work around
it or improve LSR target hooks.

1- Does anyone have any suggestions on how to fix this in LSR?
2- Is there any reason not to fix LSR, and instead focus on Scalar
Evolution so that it can handle more complicated code patterns properly?


*Before LSR:*

*for.body.preheader*:
%xtraiter = and i32 %m, 7

*for.body.preheader.new:*
%unroll_iter = sub i32 %m, %xtraiter

*for.body:*
%niter = phi i32 [ %unroll_iter, %for.body.preheader.new ], [ %niter.nsub.7, %for.body ]
%indvars.iv = phi i64 [ 0, %for.body.preheader.new ], [ %indvars.iv.next.7, %for.body ]
%indvars.iv.next.7 = add nsw i64 %indvars.iv, 8
%niter.nsub.7 = add nsw i32 %niter, -8
%niter.ncmp.7 = icmp eq i32 %niter.nsub.7, 0
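
As a rough model of the loop above, the following Python sketch counts the
iterations that the i32 down-counter produces. It is only an illustration,
not compiler output: the function name is hypothetical, and the non-zero
check on %unroll_iter is an assumption about a guard that is not shown in
the excerpt.

```python
MASK32 = (1 << 32) - 1  # model i32 arithmetic with wrap-around


def trip_count_before_lsr(m: int) -> int:
    """Count iterations of for.body as written before LSR (hypothetical helper)."""
    xtraiter = m & 7                        # %xtraiter = and i32 %m, 7
    unroll_iter = (m - xtraiter) & MASK32   # %unroll_iter = sub i32 %m, %xtraiter
    # Assumption: code outside the excerpt guarantees a non-zero count.
    assert unroll_iter != 0, "assumed guard: unrolled loop runs at least once"

    niter = unroll_iter
    trips = 0
    while True:
        niter = (niter - 8) & MASK32        # %niter.nsub.7 = add nsw i32 %niter, -8
        trips += 1
        if niter == 0:                      # %niter.ncmp.7 = icmp eq i32 %niter.nsub.7, 0
            return trips
```

In this model the count is simply %unroll_iter / 8, which is the kind of
closed form one would like SCEV to recover.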


*After LSR:*

*for.body.preheader:*
%xtraiter = and i32 %m, 7

*for.body.preheader.new:*
%unroll_iter = sub i32 %m, %xtraiter
%2 = zext i32 %unroll_iter to i64

*for.body:*
%indvars.iv = phi i64 [ 0, %for.body.preheader.new ], [ %indvars.iv.next.7, %for.body ]
%indvars.iv.next.7 = add nsw i64 %indvars.iv, 8
%tmp = trunc i64 %indvars.iv.next.7 to i32
%tmp80 = trunc i64 %2 to i32
%niter.ncmp.7 = icmp eq i32 %tmp80, %tmp

*Why trunc is not needed:* %indvars.iv starts at 0 and increments by 8, and
%2 is divisible by 8. If %indvars.iv.next.7 ever reaches a value that has a
non-zero bit in its upper 32 bits, its upper bits will stay non-zero until
it overflows. But the definition of %indvars.iv.next.7 is marked nsw, so
that overflow is not allowed to happen, and the comparison can be done
directly on the i64 values.
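
The claim can be sanity-checked by brute force at reduced bit widths. The
Python sketch below is an illustration under stated assumptions, not a
proof and not LSR's or SCEV's logic: it models i8/i16 in place of i32/i64,
the helper names are hypothetical, and a zero bound is excluded on the
assumption that the preheader guards against it. For every bound that is a
non-zero multiple of 8, it compares the iteration at which the truncated
compare exits with the iteration at which a full-width compare would exit.

```python
def exit_iterations(bound, narrow_bits=8, wide_bits=16, step=8):
    """Return (truncated-compare exit, wide-compare exit) iteration numbers.

    An entry is None if that compare never fires before the wide induction
    variable would overflow as a signed value (which nsw rules out).
    """
    narrow_mask = (1 << narrow_bits) - 1
    signed_max = (1 << (wide_bits - 1)) - 1
    iv = 0                       # models %indvars.iv, starting at 0
    trunc_exit = wide_exit = None
    iteration = 0
    while trunc_exit is None or wide_exit is None:
        if iv + step > signed_max:
            break                # next add would violate nsw; stop the model
        iv += step               # models %indvars.iv.next.7
        iteration += 1
        if trunc_exit is None and (iv & narrow_mask) == bound:
            trunc_exit = iteration   # icmp eq on truncated values
        if wide_exit is None and iv == bound:
            wide_exit = iteration    # icmp eq on the wide values
    return trunc_exit, wide_exit


def check_all(narrow_bits=8, wide_bits=16, step=8):
    """True if both compares exit together for every non-zero multiple of step."""
    for bound in range(step, 1 << narrow_bits, step):
        t, w = exit_iterations(bound, narrow_bits, wide_bits, step)
        if t != w or t != bound // step:
            return False
    return True
```

At these widths check_all() returns True, i.e. the two compares agree for
every non-zero bound. A zero bound is the one case where they differ in
this model, which is why it is excluded above.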