De-pessimize ScalarEvolution's handling of nsw/nuw?

Mon Dec 15 17:18:03 PST 2014

On 12/04/2014 01:39 PM, Jingyue Wu wrote:
> I was chasing down a performance regression on a CUDA benchmark 
> compiled for the NVPTX64 backend, and found loop strength reduction is 
> ineffective in the presence of sign extension. Here's a reduced test case:
>
> void foo(float *input, int n) {
>   for (int i = -n; i != n; ++i) {
>     baz(input[i + 5]);
>   }
> }
>
> I expect &input[i + 5] to be promoted to an indvar but it's not.
>
> The root cause of this misoptimization is that ScalarEvolution is 
> pessimistic about tagging nsw/nuw to a SCEVAddExpr. This pessimization 
> was introduced in 
> http://llvm.org/viewvc/llvm-project?view=revision&revision=145367.According 
> to the comments there 
> (http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04087), 
> ScalarEvolution does not apply an instruction's nsw/nuw flags to the 
> corresponding SCEV expression. In the above example, &input[i + 5] 
> corresponds to SCEV expression input + 4 * sext(i + 5). In order to 
> promote &input[i + 5] to an indvar, we need to at least prove (i + 5) 
> does not sign overflow so that we can reassociate the expression to 
> (input + 5) + 4 * sext(i) which can be represented as a 
> SCEVAddRecExpr. However, because ScalarEvolution doesn't apply sext to 
> (i + 5), it cannot distribute sext(i + 5) to sext(i) + 5, and is thus 
> unable to identify &input[i + 5] as a potential indvar.
>
> Side note: this issue kicked in after my recent recent change that 
> disables induction variable widening for the NVPTX64 backend. This 
> issue used to be alleviated (if any) by induction variable widening 
> because there wouldn't be any sext if index i is already 64-bit.
>
> I wonder if the fix which disables applying nsw/nuw is too 
> conservative. The comments in the source code say that ScalarEvolution 
> does not apply an instruction's nsw/nuw flags to the corresponding 
> SCEV expression because another non-control-equivalent instruction 
> without nsw/nuw can be mapped to the same expression. If that's the 
> only case we worried about, is a better fix to be not mapping 
> instructions only differ in nsw/nuw to the same SCEV expression? That 
> can be done by adding the wrapping flag of a SCEVAddExpr expression to 
> the folding set that serves as the index of this expression for SCEV 
> look-up.
>
> I followed this idea, and tried a preliminary change 
> (http://reviews.llvm.org/differential/diff/16942/). It works fine so 
> far: no transformation tests failed; some analysis tests failed but 
> the new results seem better instead of incorrect. I wonder if I was 
> just lucky on not breaking tests or it is the right way to go.
Speaking as someone who is not an expert in this code, your general 
approach seems workable.  I don't have a good understanding of what the 
implication of reducing the commoning of SCEV would be though. That 
would be my biggest concern.

I'd suggest you post a patch which gets at least one interesting example 
working.  Concrete patches w/compelling test cases tend to get better 
discussion on llvm-commits.  Alternatively, you might try directing your 
email to llvm-dev.

p.s. Your actual patch looks suspicious.  Shouldn't you be checking the 
flags on each of the adds visited in the loop above?
>
> Jingyue
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141215/55ae1353/attachment.html>