[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr
Adam Nemet
anemet at apple.com
Wed Jun 10 22:59:24 PDT 2015
> On Jun 10, 2015, at 6:17 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> I'm not sure if inbounds can be used to prove <nuw>. If an object
> %OBJ is allocated at address -1 then "gep inbounds %OBJ 1" is not
> poison, but the underlying computation unsigned-overflows.
I think that this should yield poison per langref because the signed arithmetic implied by the indices should be performed with infinite precision. Base is treated as unsigned so 0xff…ff + 1 would be 0x100…00 which would be out of bounds (sort of). See also this which I think is even more explicit: http://llvm.org/docs/GetElementPtr.html#what-happens-if-a-gep-computation-overflows <http://llvm.org/docs/GetElementPtr.html#what-happens-if-a-gep-computation-overflows>
Adam
> It may be possible to use inbounds to derive <nsw> but that too is
> tricky since inbounds does not directly imply anything about no-wrap
> as far as I can tell.
>
> On Wed, Jun 10, 2015 at 5:29 PM, Adam Nemet <anemet at apple.com> wrote:
>> [+Arnold]
>>
>>> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>>
>>> [+CC Andy]
>>>
>>>> Can anyone familiar with ScalarRevolution tell me whether this is an
>>>> expected behavior or a bug?
>>>
>>> Assuming you're talking about 2*k, this is a bug. ScalarEvolution
>>> should be able to prove that {0,+,4} is <nsw> and <nuw>.
>>
>> I also find it surprising that the inbounds gep does not allow us to prove nuw of the pointer here. LAA has logic for this.
>>
>> Not to mention that we’re trying to figure out the distance of x[2*k] against *itself* which should be zero regardless of wrapping.
>>
>> A probably more interesting case is x[2*k] = x[2*k+2] + … which would require non-wrapping.
>>
>> Looks like we only consider an inbounds gep non-wrapping if it uses a unit stride. I am not sure why…
>>
>> Adam
>>
>>>
>>> Part of the problem is that in this case ScalarEvolution does not try
>>> to prove that {0,+,4} is <nsw> when the expression is constructed
>>> (since proving that has non-trivial cost) [1]. To get ScalarEvolution
>>> to try to prove that {0,+,4} has no-wrap properties, the client needs
>>> to construct a sign-extend expression of {0,+,4}. SCEV will try to
>>> change a sext of an add-rec to an add-rec of sexts and try to prove
>>> no-wrap in the process [2].
>>>
>>> To easily do this from IR, you can just add a dummy sext instruction
>>> (like in the IR fragment below) and run
>>> 'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV
>>> won't dce the unused instruction):
>>>
>>> ; ModuleID = '<stdin>'
>>> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>> target triple = "x86_64-apple-macosx10.10.0"
>>>
>>> @x = common global [1024 x float] zeroinitializer, align 16
>>> @y = common global [1024 x float] zeroinitializer, align 16
>>>
>>> ; Function Attrs: nounwind ssp uwtable
>>> define void @myloop1() {
>>> bb:
>>> br label %bb2
>>>
>>> bb1: ; preds = %bb2
>>> ret void
>>>
>>> bb2: ; preds = %bb2, %bb
>>> %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ]
>>> %tmp = shl nsw i64 %k.01, 1
>>> %dummy_sext = sext i64 %tmp to i128
>>> %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x,
>>> i64 0, i64 %tmp
>>> %tmp4 = load float, float* %tmp3, align 16, !tbaa !2
>>> %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y,
>>> i64 0, i64 %k.01
>>> %tmp6 = load float, float* %tmp5, align 8, !tbaa !2
>>> %tmp7 = fadd float %tmp4, %tmp6
>>> store float %tmp7, float* %tmp3, align 16, !tbaa !2
>>> %tmp8 = or i64 %k.01, 1
>>> %tmp9 = shl nsw i64 %tmp8, 1
>>> %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]*
>>> @x, i64 0, i64 %tmp9
>>> %tmp11 = load float, float* %tmp10, align 8, !tbaa !2
>>> %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]*
>>> @y, i64 0, i64 %tmp8
>>> %tmp13 = load float, float* %tmp12, align 4, !tbaa !2
>>> %tmp14 = fadd float %tmp11, %tmp13
>>> store float %tmp14, float* %tmp10, align 8, !tbaa !2
>>> %tmp15 = add nsw i64 %k.01, 2
>>> %exitcond.1 = icmp eq i64 %tmp15, 512
>>> br i1 %exitcond.1, label %bb1, label %bb2
>>> }
>>>
>>> !0 = !{i32 1, !"PIC Level", i32 2}
>>> !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"}
>>> !2 = !{!3, !3, i64 0}
>>> !3 = !{!"float", !4, i64 0}
>>> !4 = !{!"omnipotent char", !5, i64 0}
>>> !5 = !{!"Simple C/C++ TBAA"}
>>>
>>>
>>> However, adding a dummy sext only proves <nuw> for {0,+,4} and not
>>> <nsw>. The problem is that when constructing sext({0,+,4}) SCEV
>>> realizes that since {0,+,4} is always positive, sext({0,+,4}) ==
>>> zext({0,+,4}); and to change a zext of an add-rec to an add-rec of
>>> zexts, SCEV only needs to prove <nuw> and not <nsw>.
>>>
>>>
>>> I wonder if it makes sense to add a hook to SCEV that gets it to try
>>> as hard as it can to prove <nuw> / <nsw> for specific add recurrences.
>>>
>>>
>>>
>>> [1]: It will try to prove nuw and nsw in cases where it is "easy", but
>>> not in this specific case.
>>>
>>> [2]: So a worthwhile project is to have the vectorizer construct sign
>>> extend expressions of add recurrences that it really cares about
>>> proving no-wrap of and then check the flags on the
>>> SCEVAddRecExpr. It may consume too much compile time, so there's
>>> a tricky trade-off here.
>>>
>>>
>>> -- Sanjoy
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150610/fa42e095/attachment.html>
More information about the llvm-dev
mailing list