[LLVMdev] Question about NoWrap flag for SCEVAddRecExpr
Adam Nemet
anemet at apple.com
Mon Jun 15 22:49:55 PDT 2015
> On Jun 11, 2015, at 12:24 AM, Adam Nemet <anemet at apple.com> wrote:
>
>
>> On Jun 10, 2015, at 6:19 PM, Arnold <aschwaighofer at apple.com> wrote:
>>
>> Inbounds does not make any statements about the behavior of a sub expression involved in an index to the gep
>>
>> addr =
>> index = 2*k // this can very well wrap
>> gep addr, ..., index // addr +index is inbounds
>
> … and since 2*k is part of the SCEV ({x,+,2}), the SCEV may wrap. Got it, thanks.
>
> That said, we should still be able to analyze these, I think:
>
> 1. {x,+,2} vs {x,+,2}
>
> There is obviously a dep-distance 0 between them regardless of wrapping.
I’ve actually changed direction on this. The problem is that wrapping can introduce loop-carried dependences that are hard to figure out. E.g., consider the IR equivalent of this:
for (unsigned char i = 0; i < 16; i++)
A[128 * i] = A[128 * i] + 2;
(it does not really work in C due to the default promotion of the char arithmetic)
The value written in iteration I will be read in I + 2, so we have a backward dependence with distance 2.
So rather than this approach, I went back trying to prove that the pointer can’t wrap. (The mul is actually marked non-wrapping in the IR just SCEV can’t use it.) See http://reviews.llvm.org/D10472.
Feedback is welcome. Thanks.
Adam
> 2. {x+8,+,2} vs {x,+,2}
>
> Because of the known small-constant difference, this is either a dep-distance of 8 or 8-Size_of_addr_space. Thus as long as this is normally a forward dep of a small known distance (and with wrapping huge-distance backward dep) we should be able to vectorize this.
>
> What do you think?
>
> Adam
>
>>
>> There is a slight of hands that we do in the vectorizer with unit strides where we rely on contiguous adress space.
>>
>>
>>
>> Sent from my iPhone
>>
>>> On Jun 10, 2015, at 5:29 PM, Adam Nemet <anemet at apple.com> wrote:
>>>
>>> [+Arnold]
>>>
>>>> On Jun 10, 2015, at 1:29 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>>>
>>>> [+CC Andy]
>>>>
>>>>> Can anyone familiar with ScalarRevolution tell me whether this is an
>>>>> expected behavior or a bug?
>>>>
>>>> Assuming you're talking about 2*k, this is a bug. ScalarEvolution
>>>> should be able to prove that {0,+,4} is <nsw> and <nuw>.
>>>
>>> I also find it surprising that the inbounds gep does not allow us to prove nuw of the pointer here. LAA has logic for this.
>>>
>>> Not to mention that we’re trying to figure out the distance of x[2*k] against *itself* which should be zero regardless of wrapping.
>>>
>>> A probably more interesting case is x[2*k] = x[2*k+2] + … which would require non-wrapping.
>>>
>>> Looks like we only consider an inbounds gep non-wrapping if it uses a unit stride. I am not sure why…
>>>
>>> Adam
>>>
>>>>
>>>> Part of the problem is that in this case ScalarEvolution does not try
>>>> to prove that {0,+,4} is <nsw> when the expression is constructed
>>>> (since proving that has non-trivial cost) [1]. To get ScalarEvolution
>>>> to try to prove that {0,+,4} has no-wrap properties, the client needs
>>>> to construct a sign-extend expression of {0,+,4}. SCEV will try to
>>>> change a sext of an add-rec to an add-rec of sexts and try to prove
>>>> no-wrap in the process [2].
>>>>
>>>> To easily do this from IR, you can just add a dummy sext instruction
>>>> (like in the IR fragment below) and run
>>>> 'opt -analyze -scalar-evolution -scalar-evolution' (just running SCEV
>>>> won't dce the unused instruction):
>>>>
>>>> ; ModuleID = '<stdin>'
>>>> target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>>> target triple = "x86_64-apple-macosx10.10.0"
>>>>
>>>> @x = common global [1024 x float] zeroinitializer, align 16
>>>> @y = common global [1024 x float] zeroinitializer, align 16
>>>>
>>>> ; Function Attrs: nounwind ssp uwtable
>>>> define void @myloop1() {
>>>> bb:
>>>> br label %bb2
>>>>
>>>> bb1: ; preds = %bb2
>>>> ret void
>>>>
>>>> bb2: ; preds = %bb2, %bb
>>>> %k.01 = phi i64 [ 0, %bb ], [ %tmp15, %bb2 ]
>>>> %tmp = shl nsw i64 %k.01, 1
>>>> %dummy_sext = sext i64 %tmp to i128
>>>> %tmp3 = getelementptr inbounds [1024 x float], [1024 x float]* @x,
>>>> i64 0, i64 %tmp
>>>> %tmp4 = load float, float* %tmp3, align 16, !tbaa !2
>>>> %tmp5 = getelementptr inbounds [1024 x float], [1024 x float]* @y,
>>>> i64 0, i64 %k.01
>>>> %tmp6 = load float, float* %tmp5, align 8, !tbaa !2
>>>> %tmp7 = fadd float %tmp4, %tmp6
>>>> store float %tmp7, float* %tmp3, align 16, !tbaa !2
>>>> %tmp8 = or i64 %k.01, 1
>>>> %tmp9 = shl nsw i64 %tmp8, 1
>>>> %tmp10 = getelementptr inbounds [1024 x float], [1024 x float]*
>>>> @x, i64 0, i64 %tmp9
>>>> %tmp11 = load float, float* %tmp10, align 8, !tbaa !2
>>>> %tmp12 = getelementptr inbounds [1024 x float], [1024 x float]*
>>>> @y, i64 0, i64 %tmp8
>>>> %tmp13 = load float, float* %tmp12, align 4, !tbaa !2
>>>> %tmp14 = fadd float %tmp11, %tmp13
>>>> store float %tmp14, float* %tmp10, align 8, !tbaa !2
>>>> %tmp15 = add nsw i64 %k.01, 2
>>>> %exitcond.1 = icmp eq i64 %tmp15, 512
>>>> br i1 %exitcond.1, label %bb1, label %bb2
>>>> }
>>>>
>>>> !0 = !{i32 1, !"PIC Level", i32 2}
>>>> !1 = !{!"clang version 3.7.0 (clang-stage2-configure-Rlto_build 239114)"}
>>>> !2 = !{!3, !3, i64 0}
>>>> !3 = !{!"float", !4, i64 0}
>>>> !4 = !{!"omnipotent char", !5, i64 0}
>>>> !5 = !{!"Simple C/C++ TBAA"}
>>>>
>>>>
>>>> However, adding a dummy sext only proves <nuw> for {0,+,4} and not
>>>> <nsw>. The problem is that when constructing sext({0,+,4}) SCEV
>>>> realizes that since {0,+,4} is always positive, sext({0,+,4}) ==
>>>> zext({0,+,4}); and to change a zext of an add-rec to an add-rec of
>>>> zexts, SCEV only needs to prove <nuw> and not <nsw>.
>>>>
>>>>
>>>> I wonder if it makes sense to add a hook to SCEV that gets it to try
>>>> as hard as it can to prove <nuw> / <nsw> for specific add recurrences.
>>>>
>>>>
>>>>
>>>> [1]: It will try to prove nuw and nsw in cases where it is "easy", but
>>>> not in this specific case.
>>>>
>>>> [2]: So a worthwhile project is to have the vectorizer construct sign
>>>> extend expressions of add recurrences that it really cares about
>>>> proving no-wrap of and then check the flags on the
>>>> SCEVAddRecExpr. It may consume too much compile time, so there's
>>>> a tricky trade-off here.
>>>>
>>>>
>>>> -- Sanjoy
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>
More information about the llvm-dev
mailing list