[PATCH] D95521: [SCEV] Apply loop guards to zero modulo conditions

Mon Feb 1 09:02:30 PST 2021

fhahn added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll:67

-define dso_local void @assumeAlignedTC(i32* noalias nocapture %A, i32* %p) optsize {
 ; CHECK-LABEL: @assumeAlignedTC(
----------------
gilr wrote:
> fhahn wrote:
> > could you add a similar negative test, e.g. where the `and` does not strip the lowest bit?
> > 
> > Perhaps also add a test using `urem`?
> > could you add a similar negative test, e.g. where the and does not strip the lowest bit?
> 
> Will do.
> 
> > Perhaps also add a test using urem?
> 
> Ah, good catch!
> Trying VF=4, IC=3 exposed two issues:
> 
> - While `applyLoopGuards()` rewrites my divisible-by-12 TC correctly to (`12 * ...`), `getURemExpr(TC, 12)` doesn't fold to zero. This can perhaps be solved by using SCEVDivision instead of getURemExpr(). Will try that as an improvement over this patch.
> 
> - The test then hits the assert `"VF*UF must be a power of 2 when folding tail by masking"` later in LV. This isn't related to this patch (can be reproduced by forcing `optsize` and IC=3), so I'll upload a separate fix for it.
> 
> While applyLoopGuards() rewrites my divisible-by-12 TC correctly to (12 * ...), getURemExpr(TC, 12) doesn't fold to zero. This can perhaps be solved by using SCEVDivision instead of getURemExpr(). Will try that as an improvement over this patch.

That sounds like an independent improvement to SCEV.

> The test then hits the assert "VF*UF must be a power of 2 when folding tail by masking" later in LV. This isn't related to this patch (can be reproduced by forcing optsize and IC=3), so I'll upload a separate fix for it.

Sounds good, thanks!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95521/new/

https://reviews.llvm.org/D95521