[PATCH] D95521: [SCEV] Apply loop guards to zero modulo conditions
Gil Rapaport via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 1 08:45:31 PST 2021
gilr marked an inline comment as done.
gilr added a comment.
In D95521#2533644 <https://reviews.llvm.org/D95521#2533644>, @fhahn wrote:
> In D95521#2533414 <https://reviews.llvm.org/D95521#2533414>, @fhahn wrote:
>
>> I did some testing and it appears this exposes a crash in `matchURem`. I'm taking a look at that now, I think it would be good to wait with landing the change until this is resolved.
>
> Should be fixed by f1e8136115ac <https://reviews.llvm.org/rGf1e8136115ac86a633f670cd4d50cf41b71418d8>
Excellent, thanks!
================
Comment at: llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll:67
-define dso_local void @assumeAlignedTC(i32* noalias nocapture %A, i32* %p) optsize {
; CHECK-LABEL: @assumeAlignedTC(
----------------
fhahn wrote:
> could you add a similar negative test, e.g. where the `and` does not strip the lowest bit?
>
> Perhaps also add a test using `urem`?
> could you add a similar negative test, e.g. where the and does not strip the lowest bit?
Will do.
> Perhaps also add a test using urem?
Ah, good catch!
Trying VF=4, IC=3 exposed two issues:
- While `applyLoopGuards()` rewrites my divisible-by-12 TC correctly to (`12 * ...`), `getURemExpr(TC, 12)` doesn't fold to zero. This can perhaps be solved by using SCEVDivision instead of getURemExpr(). Will try that as an improvement over this patch.
- The test then hits the assert `"VF*UF must be a power of 2 when folding tail by masking"` later in LV. This isn't related to this patch (can be reproduced by forcing `optsize` and IC=3), so I'll upload a separate fix for it.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D95521/new/
https://reviews.llvm.org/D95521
More information about the llvm-commits
mailing list