De-pessimize ScalarEvolution's handling of nsw/nuw?
Philip Reames
listmail at philipreames.com
Mon Dec 15 17:18:03 PST 2014
On 12/04/2014 01:39 PM, Jingyue Wu wrote:
> I was chasing down a performance regression on a CUDA benchmark
> compiled for the NVPTX64 backend, and found loop strength reduction is
> ineffective in the presence of sign extension. Here's a reduced test case:
>
> void foo(float *input, int n) {
> for (int i = -n; i != n; ++i) {
> baz(input[i + 5]);
> }
> }
>
> I expect &input[i + 5] to be promoted to an indvar but it's not.
>
> The root cause of this misoptimization is that ScalarEvolution is
> pessimistic about tagging nsw/nuw to a SCEVAddExpr. This pessimization
> was introduced in
> http://llvm.org/viewvc/llvm-project?view=revision&revision=145367.According
> to the comments there
> (http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04087),
> ScalarEvolution does not apply an instruction's nsw/nuw flags to the
> corresponding SCEV expression. In the above example, &input[i + 5]
> corresponds to SCEV expression input + 4 * sext(i + 5). In order to
> promote &input[i + 5] to an indvar, we need to at least prove (i + 5)
> does not sign overflow so that we can reassociate the expression to
> (input + 5) + 4 * sext(i) which can be represented as a
> SCEVAddRecExpr. However, because ScalarEvolution doesn't apply sext to
> (i + 5), it cannot distribute sext(i + 5) to sext(i) + 5, and is thus
> unable to identify &input[i + 5] as a potential indvar.
>
> Side note: this issue kicked in after my recent recent change that
> disables induction variable widening for the NVPTX64 backend. This
> issue used to be alleviated (if any) by induction variable widening
> because there wouldn't be any sext if index i is already 64-bit.
>
> I wonder if the fix which disables applying nsw/nuw is too
> conservative. The comments in the source code say that ScalarEvolution
> does not apply an instruction's nsw/nuw flags to the corresponding
> SCEV expression because another non-control-equivalent instruction
> without nsw/nuw can be mapped to the same expression. If that's the
> only case we worried about, is a better fix to be not mapping
> instructions only differ in nsw/nuw to the same SCEV expression? That
> can be done by adding the wrapping flag of a SCEVAddExpr expression to
> the folding set that serves as the index of this expression for SCEV
> look-up.
>
> I followed this idea, and tried a preliminary change
> (http://reviews.llvm.org/differential/diff/16942/). It works fine so
> far: no transformation tests failed; some analysis tests failed but
> the new results seem better instead of incorrect. I wonder if I was
> just lucky on not breaking tests or it is the right way to go.
Speaking as someone who is not an expert in this code, your general
approach seems workable. I don't have a good understanding of what the
implication of reducing the commoning of SCEV would be though. That
would be my biggest concern.
I'd suggest you post a patch which gets at least one interesting example
working. Concrete patches w/compelling test cases tend to get better
discussion on llvm-commits. Alternatively, you might try directing your
email to llvm-dev.
p.s. Your actual patch looks suspicious. Shouldn't you be checking the
flags on each of the adds visited in the loop above?
>
> Jingyue
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141215/55ae1353/attachment.html>
More information about the llvm-commits
mailing list