[PATCH] D95521: [SCEV] Apply loop guards to zero modulo conditions

Mon Feb 1 08:45:31 PST 2021

gilr marked an inline comment as done.
gilr added a comment.

In D95521#2533644 <https://reviews.llvm.org/D95521#2533644>, @fhahn wrote:

> In D95521#2533414 <https://reviews.llvm.org/D95521#2533414>, @fhahn wrote:
>
>> I did some testing and it appears this exposes a crash in `matchURem`. I'm taking a look at that now, I think it would be good to wait with landing the change until this is resolved.
>
> Should be fixed by f1e8136115ac <https://reviews.llvm.org/rGf1e8136115ac86a633f670cd4d50cf41b71418d8>

Excellent, thanks!

================
Comment at: llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll:67

-define dso_local void @assumeAlignedTC(i32* noalias nocapture %A, i32* %p) optsize {
 ; CHECK-LABEL: @assumeAlignedTC(
----------------
fhahn wrote:
> could you add a similar negative test, e.g. where the `and` does not strip the lowest bit?
> 
> Perhaps also add a test using `urem`?

> could you add a similar negative test, e.g. where the `and` does not strip the lowest bit?

Will do.

> Perhaps also add a test using `urem`?

Ah, good catch!
Trying VF=4, IC=3 exposed two issues:

- While `applyLoopGuards()` correctly rewrites my divisible-by-12 TC to `(12 * ...)`, `getURemExpr(TC, 12)` doesn't fold the result to zero. Using `SCEVDivision` instead of `getURemExpr()` might solve this; I'll try that as an improvement on top of this patch.

- The test then hits the assert `"VF*UF must be a power of 2 when folding tail by masking"` later in LV. This isn't related to this patch (it can be reproduced by forcing `optsize` and IC=3), so I'll upload a separate fix for it.
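
To make the arithmetic behind both issues concrete, here is a minimal standalone sketch. It does not use SCEV or any LLVM API; the helper names (`isPowerOfTwo`, `lowBitsClear`, `remIsZero`) and the constant `Step` are hypothetical, made up for illustration. It only shows why VF=4, IC=3 needs a `urem`-style guard rather than an `and` mask, and why 12 trips a power-of-2 check:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these are LLVM APIs.

// True when X is a nonzero power of two.
constexpr bool isPowerOfTwo(uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }

// Divisibility test via a low-bit mask, i.e. (N & (D - 1)) == 0.
// This matches N % D == 0 only when D is a power of two.
constexpr bool lowBitsClear(uint64_t N, uint64_t D) { return (N & (D - 1)) == 0; }

// Divisibility test via an unsigned remainder; valid for any nonzero D.
constexpr bool remIsZero(uint64_t N, uint64_t D) { return N % D == 0; }

constexpr uint64_t VF = 4, IC = 3, Step = VF * IC; // Step == 12

static_assert(!isPowerOfTwo(Step), "12 is not a power of two");
static_assert(remIsZero(12 * 5, Step), "a TC of the form 12 * k is divisible by 12");
static_assert(!lowBitsClear(12, Step), "12 & 11 == 8: the mask rejects a divisible TC");
static_assert(lowBitsClear(16, Step) && !remIsZero(16, Step),
              "16 & 11 == 0 although 16 % 12 == 4");
static_assert(lowBitsClear(16, 8) == remIsZero(16, 8),
              "mask and remainder agree for a power-of-two divisor");
```

The checks are all `static_assert`s, so compiling the snippet is enough to confirm them.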

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95521/new/

https://reviews.llvm.org/D95521