[LLVMdev] loop strength reduction and zext/sext formulae

Mon Apr 13 23:31:03 PDT 2015

I have some IR that looks like this right before llc -O3 invokes LSR:

loop_header:                                      ; preds = %loop_header.preheader1, %in_loop_2
  %var_13_ = phi double [ %183, %in_loop_2 ], [ 0.000000e+00, %loop_header.preheader1 ]
  %var_17_ = phi i32 [ %184, %in_loop_2 ], [ %153, %loop_header.preheader1 ]
  %171 = zext i32 %var_17_ to i64
  %172 = icmp ult i32 %var_17_, %length.i820
  br i1 %172, label %in_loop, label %exit

in_loop:
  ....
  br i1 %should_stay_in_loop, label %in_loop_2, label %exit

in_loop_2:
  %180 = getelementptr inbounds double, double addrspace(1)* %26, i64 %171
  %181 = load double, double addrspace(1)* %180, align 8
  %182 = fmul double %178, %181
  %183 = fadd double %var_13_, %182
  %184 = add nsw i32 %var_17_, 1
  %185 = icmp slt i32 %184, %157
  br i1 %185, label %loop_header, label %outside.loopexit

If SCEV is unable to prove that the add recurrence for %var_17_ is
NUW, then all is well, and the code generated for the BB %in_loop_2
looks like this:

mulsd 16(%rbx,%rcx,8), %xmm1
addsd %xmm1, %xmm0
leal 1(%rcx), %ebx
cmpl %r11d, %ebx
jl .LBB0_69

which is great and pretty much what I'd expect.

However, if SCEV can prove that %var_17_ is NUW (which it can via
%172, after http://reviews.llvm.org/rL233829) LSR optimizes %in_loop_2
to

in_loop_2:
  %scevgep12 = getelementptr double, double addrspace(1)* %scevgep1011, i64 %lsr.iv5
  %177 = load double, double addrspace(1)* %scevgep12, align 8, !tbaa !24
  %178 = fmul double %175, %177
  %179 = fadd double %var_13_, %178
  %lsr.iv.next6 = add nuw nsw i64 %lsr.iv5, 1
  %180 = add i64 %153, %lsr.iv.next6
  %tmp = trunc i64 %180 to i32
  %181 = icmp slt i32 %tmp, %151
  br i1 %181, label %not_zero146, label %bci_81.loopexit

where

  %lsr.iv5 = phi i64 [ %lsr.iv.next6, %in_bounds161 ], [ 0, %not_zero146.preheader1 ]
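
To illustrate why the NUW fact is what makes this rewrite legal, here
is a small C++ sketch of the arithmetic involved (not LLVM code; the
function names are made up for illustration).  It compares the
zero-extension of a 32-bit counter that started at `start` with a
64-bit counter that starts at zext(start).  The two agree at
iteration i exactly when the 32-bit add does not wrap, which is what
NUW asserts.

```cpp
#include <cstdint>
#include <limits>

// zext of the 32-bit recurrence {start,+,1} at iteration i.
// The 32-bit addition wraps modulo 2^32, as an i32 add without nuw may.
std::uint64_t zext_of_narrow_iv(std::uint32_t start, std::uint32_t i) {
    std::uint32_t narrow = start + i;  // unsigned wraparound is well-defined in C++
    return static_cast<std::uint64_t>(narrow);
}

// The 64-bit recurrence {zext(start),+,1} at iteration i.
std::uint64_t wide_iv(std::uint32_t start, std::uint32_t i) {
    return static_cast<std::uint64_t>(start) + static_cast<std::uint64_t>(i);
}

// True when start + i does not wrap in 32 bits (the NUW condition).
bool no_unsigned_wrap(std::uint32_t start, std::uint32_t i) {
    return i <= std::numeric_limits<std::uint32_t>::max() - start;
}
```

Whenever no_unsigned_wrap(start, i) holds, the two functions return
the same value, so the address computation can use the 64-bit counter
directly.  Once the 32-bit add wraps, they differ by 2^32.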

This is a regression -- the IR itself is more complicated and the
generated machine code for %in_loop_2 is

mulsd (%r13,%rdi,8), %xmm1
addsd %xmm1, %xmm0
incq %rdi
movl %edi, %ecx
addl %r14d, %ecx
movq %r8, %rbx
cmpl %ebx, %ecx
jl .LBB0_69

which has three more instructions per iteration of the loop (8 versus
5, one of which is an add) than the first assembly listing.
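
For readers who prefer source-level code, the two loop shapes can be
modeled roughly as below.  This is only a sketch under assumptions:
the elided part of %in_loop is ignored, `scale` is a hypothetical
stand-in for the multiplicand that comes from the elided code, the
loop-header check of the rewritten loop is guessed (it is not shown
above), and all function and parameter names are invented.  The point
is that the second shape has to rebuild the 32-bit value with an
extra add and truncate before the signed compare.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Shape of the first listing: one 32-bit counter, zero-extended for the load.
double sum_narrow_iv(const std::vector<double>& data, std::uint32_t start,
                     std::int32_t limit, double scale) {
    double acc = 0.0;
    std::uint32_t i = start;
    while (i < data.size()) {                                // icmp ult
        acc += scale * data[static_cast<std::uint64_t>(i)];  // zext, gep, load
        i = i + 1;                                           // add i32
        if (!(static_cast<std::int32_t>(i) < limit)) break;  // icmp slt
    }
    return acc;
}

// Shape of the second listing: a 64-bit counter from 0 plus a pre-offset base.
double sum_wide_iv(const std::vector<double>& data, std::uint32_t start,
                   std::int32_t limit, double scale) {
    assert(start <= data.size());              // keeps the base pointer in range
    double acc = 0.0;
    const double* base = data.data() + start;  // plays the role of %scevgep1011
    std::uint64_t iv = 0;                      // plays the role of %lsr.iv5
    while (start + iv < data.size()) {         // assumed header check
        acc += scale * base[iv];
        iv = iv + 1;                           // add i64
        std::uint64_t wide = static_cast<std::uint64_t>(start) + iv;  // extra add
        // trunc to 32 bits; the values used here stay below 2^31
        std::int32_t narrow = static_cast<std::int32_t>(static_cast<std::uint32_t>(wide));
        if (!(narrow < limit)) break;          // icmp slt on the truncated value
    }
    return acc;
}
```

Both functions visit the same elements in the same order, so they
return the same sum; the difference is only in how much work each
iteration does to feed the exit compare.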

As far as I can tell, the key issue here is that LSR does not consider
formulae of the form "(zext T)" for a use -- there is no
LSRInstance::GenerateZexts (or LSRInstance::GenerateSexts, for that
matter).  Adding a LSRInstance::GenerateZexts modeled after
LSRInstance::GenerateTruncates seems to fix the issue.  Does this
make sense, or am I missing something?

-- Sanjoy