[LLVMdev] SCEV and GEP NSW flag
Hal Finkel
hfinkel at anl.gov
Sat Nov 2 05:43:09 PDT 2013
----- Original Message -----
>
> On Oct 31, 2013, at 1:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> > Andy, et al.,
> >
> > If I start with C code like this:
> >
> > void foo(long k, int * restrict a, int * restrict b, int * restrict
> > c) {
> > if (k > 0) {
> > for (int i = 0; i < 2047; i++) {
> > a[i] = a[i + k] + b[i] * c[i];
> > }
> > }
> > }
> >
> > Clang -O3 will produce code like this:
> >
> > entry:
> > %cmp = icmp sgt i64 %k, 0
> > br i1 %cmp, label %for.body.preheader, label %if.end
> >
> > for.body.preheader:
> > br label %for.body
> >
> > for.body:
> > %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0,
> > %for.body.preheader ]
> > %add = add nsw i64 %indvars.iv, %k
> > %arrayidx = getelementptr inbounds i32* %a, i64 %add
> > %0 = load i32* %arrayidx, align 4, !tbaa !1
> > ...
> > %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv
> > store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1
> > ...
> >
> > My goal here is to detect that, within the loop, (&a[i] - &a[i +
> > k]) is negative, as the explicit loop guard guarantees. On the
> > path toward that goal, I've run into the following:
> >
> > The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body>
> >
> > but the SCEV for &a[i + k] looks like this: {((4 * %k) +
> > %a),+,4}<%for.body>
> >
> > and the problem is that this expression, even though it comes from
> > a inbounds GEP, with nsw on all contributing instructions, does
> > not have the <nsw> flag. I think that the problem is the (4 * %k)
> > 'Index' expression, which lacks the no-wrap flags, and when
> > createNodeForGEP uses it to construct the overall expression for
> > the address, the result also lacks the nsw flag.
> >
> > The reason that the (4 * %k) 'Index' expression lacks the flags is
> > explained in createSCEV:
> >
> > // Don't apply this instruction's NSW or NUW flags to the new
> > // expression. The instruction may be guarded by control flow
> > that the
> > // no-wrap behavior depends on. Non-control-equivalent
> > instructions can be
> > // mapped to the same SCEV expression, and it would be incorrect
> > to transfer
> > // NSW/NUW semantics to those operations.
> >
> > and I appreciate this concern, but I suspect it is supposed to be
> > possible to recover the flags on the resulting AddRec, and how is
> > that supposed to happen? One issue may be that the control flow
> > mentioned in the comment could be in the loop, although in this
> > one-BB loop with no calls, I think that can be ruled out.
>
> I'm sorry, I don't understand why the scale and addition on an
> inbounds GEP can be considered NSW. Can we assume malloc won't span
> the high/low address boundary?
I think that we're okay on this front. The language reference defines inbounds GEPs in terms of "infinitely precise signed arithmetic", so I think that wrapping and being inbounds are incompatible.
>Then were left with folks doing
> mmap... Well, based on my understanding, createNodeForGEP looks
> wrong--I really hope I'm the one that's wrong.
I think that the code is proabably okay: it is not forcing the result to have the nsw flag (if it did, I probably would not have written my e-mail), but applying it only to the 'address computation' itself. On the other hand, maybe that violates SCEV's general philosophy of being control-flow independent (maybe the logic that prohibits us from applying an add's nsw flag to an SCEVAddExpr should prohibit this as well).
>
> As an aside, you could still prove no-self-wrap for this recurrence.
> If the base of the GEP is an NW recurrence (note that NSW or NUW
> implies NW), the GEP itself is inbounds, and the GEP simplifies to a
> recurrences, then I think you can infer that the recurrence for the
> GEP is also NW.
Okay, so you mean that I could force set the NW flag on the resulting GEP in that case?
> I don’t think that adds useful information to this
> query though :(
>
> Along those lines, I’m a little baffled as to why you want an NSW
> recurrence. You already know the difference is 4*k and k > 0. Isn’t
> that enough? Is the problem that 4*k may overflow?
Ah, I forgot to explain, one problem is that I can't divide {((4 * %k) + %a),+,4}<%for.body> by 4 to get ride of the type size. I was trying to do this in order to make it easier for isLoopEntryGuardedByCond to match the condition, but even dividing by 4 does not work if the recurrence is not <nsw> (as far as I can tell).
>
> Maybe this sort of alias analysis needs to reason about array
> indices, not addresses.
Being able to divide by the type sizes would make this easier ;)
Thanks again,
Hal
>
> -Andy
>
> >
> > Thanks again,
> > Hal
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
>
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-dev
mailing list