[PATCH] [SCEV][LoopVectorize] Allow ScalarEvolution to make assumptions about overflows

silviu.baranga at arm.com silviu.baranga at arm.com
Wed Jun 24 10:32:51 PDT 2015


Hi Adam,

In http://reviews.llvm.org/D10161#193173, @anemet wrote:

> Can you please discuss the use-cases?  We all ran into SCEV not always being the right vehicle to prove no-overflow but this proposes a pretty big change, so I want to make sure we can't take more targeted/distributed solutions to the problem.  (My general feeling is similar to what Andy and Sanjoy have already expressed.)


I think the biggest problem is the cases where overflow can actually happen, depending on the input data. I would prefer to add code that proves no-overflow for the cases where a proof is possible, rather than implementing something like this, but it looks like for certain cases there is no work-around. I've listed some examples below.

> I feel like that in some cases, we can prove no-overflow at *compile* time by further analyzing the IR (like what I am proposing in http://reviews.llvm.org/D10472).  Essentially this is relying on C/C++ signed overflow being undefined.


Ideally we would prove this at compile time, but I think it would be impossible (or impractical) to cover all the cases where a proof exists, and there are cases where no-overflow simply cannot be proven (the overflow condition depends on the input data).

> In other cases we may need to prove no-overflow of smaller types so that we can up-level the sign/zero-extensions.  Is this perhaps something that's better done in indvars?  The idea is (maybe flawed) that you can eliminate an extension in the loop by using an overflow check outside the loop.

> 

> Anyhow, you collected some testcases so categorizing the issues would probably help the discussion.

> 

> I am also in favor of allowing finer level of control along the lines of Sanjoy's comments.  Your approach may work for the vectorizer but in case of the general dependence analysis, we may not need to prove no-overflow of all pointers.  For example, if a pointer can't alias with any other accesses in the loop, we don't care that we can't get a true affine form for it.


Yes, good point! I need to think a bit about the interface (and perhaps do some experimenting), but it should definitely be possible to have something similar to what Sanjoy suggested (and I like the idea).

The test cases I have come from C/C++ code where unsigned integers are used as induction variables and end up being extended at some point. There are cases where the extend isn't actually needed, but those are outside the current scope. The problem with these cases is that unsigned overflow has defined behaviour (at least in C/C++), so there is no way to reason statically about them: the variable can overflow, and that can for example cause infinite loops. For example:

#include <stdint.h>

void test(uint32_t n) {
  for (uint16_t i = 0; i < n; ++i) {
    /* ... do something ... */
  }
}

Here i can overflow before it ever reaches n, and for values of n larger than 2^16-1 we get an infinite loop.
As far as I can see, there is no way to make progress here other than versioning the loop (see the sketch below).
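To make that concrete, the versioned form would look roughly like this at the source level (a hand-written sketch of what the transform would do on IR; the function name, the guard and the fast-path body are only illustrative):

#include <stdint.h>

void test_versioned(uint32_t n) {
  if (n <= UINT16_MAX) {
    /* Guarded version: the trip count fits in 16 bits, so i cannot
       wrap and the loop is safe to analyze/vectorize. */
    for (uint16_t i = 0; i < n; ++i) {
      /* ... do something ... */
    }
  } else {
    /* Fallback: for n > 2^16-1 the original loop never terminates,
       so we have to keep the original scalar semantics here. */
    for (uint16_t i = 0; i < n; ++i) {
      /* ... do something ... */
    }
  }
}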

A related example:

void test(uint32_t n) {
  for (uint32_t i = 0; i <= n; ++i) {
    /* ... do something ... */
  }
}

Here there are no extend operations, but for n == 2^32-1 this is an infinite loop and i overflows. SCEV gives up on computing the backedge-taken count because there is no correct result it can give.
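Again, the only way out I can see is a runtime check; something along these lines at the source level (illustrative only -- the check would of course be emitted on IR, and the names are made up):

#include <stdint.h>

void test_versioned(uint32_t n) {
  if (n != UINT32_MAX) {
    /* Under this assumption the backedge-taken count is exactly n
       and i never wraps, so SCEV has a correct answer to give. */
    for (uint32_t i = 0; i <= n; ++i) {
      /* ... do something ... */
    }
  } else {
    /* n == 2^32-1: the original loop does not terminate; preserve
       the original behaviour. */
    for (uint32_t i = 0; i <= n; ++i) {
      /* ... do something ... */
    }
  }
}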

This can affect memory accesses as well:

void test(uint32_t n, uint16_t ind, uint32_t offset, char *a, char *b) {
  for (uint16_t i = 0; i < n; ++i) {
    ind++;
    a[ind + offset] = b[ind + offset] * 3;
  }
}

Here ind + offset does not evaluate to a chrec. ind is a chrec, but we then apply a zext to it and add it to offset. Because of the absence of nsw/nuw flags we actually get add(offset, zext({ind,+,1})) for ind + offset. This causes a number of problems (the accesses to a cannot be shown to be consecutive, etc.). And that's correct: they can be non-consecutive for some values of n, although that is unlikely to ever happen at execution time.

In fact, since whether these extends appear at all depends on the pointer size, it's possible to run into such issues only when porting code from a 32-bit target to a 64-bit one.
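For this last example, the assumption we would want SCEV to make is that {ind,+,1} does not wrap in 16 bits (i.e. it is nuw), which folds zext({ind,+,1}) + offset back into a single affine expression and makes the accesses consecutive. A rough source-level rendering of the runtime check that assumption corresponds to (the function name and the exact guard expression here are only a sketch):

#include <stdint.h>

void test_versioned(uint32_t n, uint16_t ind, uint32_t offset,
                    char *a, char *b) {
  /* ind is incremented once per iteration, so it stays within 16 bits
     iff ind + n <= 2^16-1; this also implies n <= 2^16-1, so the
     induction variable i cannot wrap either. The check is done in
     64 bits so the guard itself cannot overflow. */
  if ((uint64_t)ind + n <= UINT16_MAX) {
    for (uint16_t i = 0; i < n; ++i) {
      ind++;
      a[ind + offset] = b[ind + offset] * 3;  /* consecutive accesses */
    }
  } else {
    /* Fallback with the original (possibly wrapping) semantics. */
    for (uint16_t i = 0; i < n; ++i) {
      ind++;
      a[ind + offset] = b[ind + offset] * 3;
    }
  }
}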

I think there are valid reasons to use unsigned integers (e.g. for the extra range), so it may be easy to hit these cases. But it looks like any combination of unsigned integers and zext will disable most loop optimizations? Having unsigned integers in the exit condition will probably cause issues on its own.

Thanks,
Silviu


http://reviews.llvm.org/D10161
