De-pessimize ScalarEvolution's handling of nsw/nuw?

Thu Dec 4 13:39:10 PST 2014

I was chasing down a performance regression on a CUDA benchmark compiled
for the NVPTX64 backend, and found loop strength reduction is ineffective
in the presence of sign extension. Here's a reduced test case:

void foo(float *input, int n) {
  for (int i = -n; i != n; ++i) {

    baz(input[i + 5]);

  }
}

I expect &input[i + 5] to be promoted to an indvar but it's not.

The root cause of this misoptimization is that ScalarEvolution is
pessimistic about tagging nsw/nuw to a SCEVAddExpr. This pessimization was
introduced in http://llvm.org/viewvc/llvm-project?view=revision&revision=
145367. According to the comments there (
http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04087),
ScalarEvolution does not apply an instruction's nsw/nuw flags to the
corresponding SCEV expression. In the above example, &input[i + 5]
corresponds to SCEV expression input + 4 * sext(i + 5). In order to promote
&input[i + 5] to an indvar, we need to at least prove (i + 5) does not sign
overflow so that we can reassociate the expression to (input + 5) + 4 *
sext(i) which can be represented as a SCEVAddRecExpr. However, because
ScalarEvolution doesn't apply sext to (i + 5), it cannot distribute sext(i +
5) to sext(i) + 5, and is thus unable to identify &input[i + 5] as a
potential indvar.

Side note: this issue kicked in after my recent recent change that disables
induction variable widening for the NVPTX64 backend. This issue used to be
alleviated (if any) by induction variable widening because there wouldn't
be any sext if index i is already 64-bit.

I wonder if the fix which disables applying nsw/nuw is too conservative.
The comments in the source code say that ScalarEvolution does not apply an
instruction's nsw/nuw flags to the corresponding SCEV expression because
another non-control-equivalent instruction without nsw/nuw can be mapped to
the same expression. If that's the only case we worried about, is a better
fix to be not mapping instructions only differ in nsw/nuw to the same SCEV
expression? That can be done by adding the wrapping flag of a SCEVAddExpr
expression to the folding set that serves as the index of this expression
for SCEV look-up.

I followed this idea, and tried a preliminary change (
http://reviews.llvm.org/differential/diff/16942/). It works fine so far: no
transformation tests failed; some analysis tests failed but the new results
seem better instead of incorrect. I wonder if I was just lucky on not
breaking tests or it is the right way to go.

Jingyue
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141204/ebdba887/attachment.html>