[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr

Wed Jun 10 18:19:39 PDT 2015

Inbounds does not make any statements about the behavior of a sub expression involved in an index to the gep 

addr = 
index = 2*k // this can very well wrap
gep addr, ..., index // addr +index  is inbounds

There is a slight of hands that we do in the vectorizer with unit strides where we rely on contiguous adress space.

Sent from my iPhone

> On Jun 10, 2015, at 5:29 PM, Adam Nemet <anemet at apple.com> wrote:
> 
> [+Arnold]
> 
>> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>> 
>> [+CC Andy]
>> 
>>> Can anyone familiar with ScalarRevolution tell me whether this is an
>>> expected behavior or a bug?
>> 
>> Assuming you're talking about 2*k, this is a bug.  ScalarEvolution
>> should be able to prove that {0,+,4} is <nsw> and <nuw>.
> 
> I also find it surprising that the inbounds gep does not allow us to prove nuw of the pointer here.  LAA has logic for this.
> 
> Not to mention that we’re trying to figure out the distance of x[2*k] against *itself* which should be zero regardless of wrapping.
> 
> A probably more interesting case is x[2*k] = x[2*k+2] + … which would require non-wrapping.
> 
> Looks like we only consider an inbounds gep non-wrapping if it uses a unit stride.  I am not sure why…
> 
> Adam
> 
>> 
>> Part of the problem is that in this case ScalarEvolution does not try
>> to prove that {0,+,4} is <nsw> when the expression is constructed
>> (since proving that has non-trivial cost) [1].  To get ScalarEvolution
>> to try to prove that {0,+,4} has no-wrap properties, the client needs
>> to construct a sign-extend expression of {0,+,4}.  SCEV will try to
>> change a sext of an add-rec to an add-rec of sexts and try to prove
>> no-wrap in the process [2].
>> 
>> To easily do this from IR, you can just add a dummy sext instruction
>> (like in the IR fragment below) and run
>> 'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV
>> won't dce the unused instruction):
>> 
>> ; ModuleID = '<stdin>'
>> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>> target triple = "x86_64-apple-macosx10.10.0"
>> 
>> @x = common global [1024 x float] zeroinitializer, align 16
>> @y = common global [1024 x float] zeroinitializer, align 16
>> 
>> ; Function Attrs: nounwind ssp uwtable
>> define void @myloop1() {
>> bb:
>> br label %bb2
>> 
>> bb1:                                              ; preds = %bb2
>> ret void
>> 
>> bb2:                                              ; preds = %bb2, %bb
>> %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ]
>> %tmp = shl nsw i64 %k.01, 1
>> %dummy_sext = sext i64 %tmp to i128
>> %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x,
>> i64 0, i64 %tmp
>> %tmp4 = load float, float* %tmp3, align 16, !tbaa !2
>> %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y,
>> i64 0, i64 %k.01
>> %tmp6 = load float, float* %tmp5, align 8, !tbaa !2
>> %tmp7 = fadd float %tmp4, %tmp6
>> store float %tmp7, float* %tmp3, align 16, !tbaa !2
>> %tmp8 = or i64 %k.01, 1
>> %tmp9 = shl nsw i64 %tmp8, 1
>> %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]*
>> @x, i64 0, i64 %tmp9
>> %tmp11 = load float, float* %tmp10, align 8, !tbaa !2
>> %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]*
>> @y, i64 0, i64 %tmp8
>> %tmp13 = load float, float* %tmp12, align 4, !tbaa !2
>> %tmp14 = fadd float %tmp11, %tmp13
>> store float %tmp14, float* %tmp10, align 8, !tbaa !2
>> %tmp15 = add nsw i64 %k.01, 2
>> %exitcond.1 = icmp eq i64 %tmp15, 512
>> br i1 %exitcond.1, label %bb1, label %bb2
>> }
>> 
>> !0 = !{i32 1, !"PIC Level", i32 2}
>> !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"}
>> !2 = !{!3, !3, i64 0}
>> !3 = !{!"float", !4, i64 0}
>> !4 = !{!"omnipotent char", !5, i64 0}
>> !5 = !{!"Simple C/C++ TBAA"}
>> 
>> 
>> However, adding a dummy sext only proves <nuw> for {0,+,4} and not
>> <nsw>.  The problem is that when constructing sext({0,+,4}) SCEV
>> realizes that since {0,+,4} is always positive, sext({0,+,4}) ==
>> zext({0,+,4}); and to change a zext of an add-rec to an add-rec of
>> zexts, SCEV only needs to prove <nuw> and not <nsw>.
>> 
>> 
>> I wonder if it makes sense to add a hook to SCEV that gets it to try
>> as hard as it can to prove <nuw> / <nsw> for specific add recurrences.
>> 
>> 
>> 
>> [1]: It will try to prove nuw and nsw in cases where it is "easy", but
>>  not in this specific case.
>> 
>> [2]: So a worthwhile project is to have the vectorizer construct sign
>>  extend expressions of add recurrences that it really cares about
>>  proving no-wrap of and then check the flags on the
>>  SCEVAddRecExpr.  It may consume too much compile time, so there's
>>  a tricky trade-off here.
>> 
>> 
>> -- Sanjoy
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>