[LLVMdev] SCEV and GEP NSW flag

Hal Finkel hfinkel at anl.gov
Thu Oct 31 13:24:42 PDT 2013


Andy, et al.,

If I start with C code like this:

void foo(long k, int * restrict a, int * restrict b, int * restrict c) {
  if (k > 0) {
    for (int i = 0; i < 2047; i++) {
      a[i] = a[i + k] + b[i] * c[i];
    }
  }
}

Clang -O3 will produce code like this:

entry:
  %cmp = icmp sgt i64 %k, 0
  br i1 %cmp, label %for.body.preheader, label %if.end

for.body.preheader:
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
  %add = add nsw i64 %indvars.iv, %k
  %arrayidx = getelementptr inbounds i32* %a, i64 %add
  %0 = load i32* %arrayidx, align 4, !tbaa !1
...
  %arrayidx7 = getelementptr inbounds i32* %a, i64 %indvars.iv
  store i32 %add5, i32* %arrayidx7, align 4, !tbaa !1
...

My goal here is to detect that, within the loop, (&a[i] - &a[i + k]) is negative, as the explicit loop guard guarantees. On the path toward that goal, I've run into the following:

The SCEV for &a[i] looks like this: {%a,+,4}<nsw><%for.body>

but the SCEV for &a[i + k] looks like this: {((4 * %k) + %a),+,4}<%for.body>

and the problem is that this expression, even though it comes from a inbounds GEP, with nsw on all contributing instructions, does not have the <nsw> flag. I think that the problem is the (4 * %k) 'Index' expression, which lacks the no-wrap flags, and when createNodeForGEP uses it to construct the overall expression for the address, the result also lacks the nsw flag.

The reason that the (4 * %k) 'Index' expression lacks the flags is explained in createSCEV:

    // Don't apply this instruction's NSW or NUW flags to the new
    // expression. The instruction may be guarded by control flow that the
    // no-wrap behavior depends on. Non-control-equivalent instructions can be
    // mapped to the same SCEV expression, and it would be incorrect to transfer
    // NSW/NUW semantics to those operations.

and I appreciate this concern, but I suspect it is supposed to be possible to recover the flags on the resulting AddRec, and how is that supposed to happen? One issue may be that the control flow mentioned in the comment could be in the loop, although in this one-BB loop with no calls, I think that can be ruled out.

Thanks again,
Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory



More information about the llvm-dev mailing list