[PATCH] [LSR] Generate and use zero extends

Wed Apr 29 21:22:20 PDT 2015

> On Apr 27, 2015, at 2:05 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
>> Does your test fail on trunk? This is what the test outputs for me:
>> 
>> ok_161:                                           ; preds = %ok_158
>> 
>>  %lsr.iv.next = add nuw nsw i64 %lsr.iv, 1
>>  %4 = add i64 %0, %lsr.iv.next
>>  %tmp1 = trunc i64 %4 to i32
>>  %tmp188 = icmp slt i32 %tmp1, %tmp160
>>  br i1 %tmp188, label %ok_146, label %block_81
> 
> That's behavior I'm trying to avoid -- I don't want two additions on
> every iteration of the loop.
> 
> To elaborate a little bit, here is the -debug-only=loop-reduce on the
> test case before the change:

You’re test makes sense. I was reading it wrong.

> 
>> Would you be able to test both post-inc and pre-inc variants? We've a
>> lot of bugs because of TransformForPostIncUse.
> 
> I think that is a good idea.  I'm not familiar with LSR so I suspect
> it will take me some time to get there, but just that is actually a
> good reason for writing these tests. :)
> 
> It also looks like (I'm not a 100% sure) that there's an inconsistency
> in how LSR treats overflow in post-inc expressions vs. how SCEV treats
> them.  For instance, in ScalarEvolution.cpp line ~1500 we say that
> {S,+,X} is <nuw> if "S + MaxBECount * X" does not unsigned
> overflow. But that does not say anything about the overflow of the
> post-inc SCEV expression (e.g. if BECount == 0 then all add recs are
> <nuw>).
> 
> The problem is that TransformForPostIncUse(zext({S,+,X})) =
> zext({S+X,+,X}) -- ScalarEvolutionNormalization.cpp, line ~100.
> 
> Now if {S,+,1} is nuw then {zext S,+,1} = zext({S,+,1}), but
> {(zext S) + 1,+,1} is != zext({S+1,+,1}) by SCEV's definition of nuw.
> But TransformForPostIncUse({zext S,+,1}) =
> TransformForPostIncUse(zext({S,+,1})) = zext({S+1,+,1}) when it should
> really be {(zext S) + 1,+,1}.  I don't have a concrete example where
> this is a problem, but this definitely looks fishy to me.

TransformForPostInc is the most confusing and bug prone parts of LSR. I haven't looked at an example, just stared at the code for a while. I would expect it to work like this:

Original SCEV for use:
  zext{I,+,1}<nuw>

Reduced SCEV:
  {zext(I),+,1}<nuw>

Normalized: (nuw should be dropped)
  {zext(I) - 1,+,1}

I think this line should drop nuw:

+    // Conservatively use AnyWrap until/unless we need FlagNW.
+    const SCEV *Result = SE.getAddRecExpr(Operands, L, SCEV::FlagAnyWrap);

And if that line doesn't drop it, then this line should:
+        Result = SE.getMinusSCEV(Result, TransformedStep);

Denormalized:
  {zext(I),+,1}

That's conservative, because we don't really need to drop NUW.

I'm not sure where this equality that you're showing came from:
TransformForPostIncUse({zext S,+,1}) =
TransformForPostIncUse(zext({S,+,1}))

Andy

> On Mon, Apr 27, 2015 at 11:14 AM, Andrew Trick <atrick at apple.com> wrote:
>> This looks reasonable. Thanks!
>> 
>> Does your test fail on trunk? This is what the test outputs for me:
>> 
>> ok_161:                                           ; preds = %ok_158
>> 
>>  %lsr.iv.next = add nuw nsw i64 %lsr.iv, 1
>>  %4 = add i64 %0, %lsr.iv.next
>>  %tmp1 = trunc i64 %4 to i32
>>  %tmp188 = icmp slt i32 %tmp1, %tmp160
>>  br i1 %tmp188, label %ok_146, label %block_81
>> 
>> Would you be able to test both post-inc and pre-inc variants? We've a lot of bugs because of TransformForPostIncUse.
>> 
>> 
>> ================
>> Comment at: lib/Transforms/Scalar/LoopStrengthReduce.cpp:3685
>> @@ -3669,1 +3684,3 @@
>> 
>> +/// GenerateZExts - Generate reuse formulae from different IV types.
>> +void LSRInstance::GenerateZExts(LSRUse &LU, unsigned LUIdx, Formula Base) {
>> ----------------
>> Please comment this at least as well as your commit comment if not better.
>> 
>> http://reviews.llvm.org/D9181
>> 
>> EMAIL PREFERENCES
>>  http://reviews.llvm.org/settings/panel/emailpreferences/
>> 
>>