[PATCH] D152366: [LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Aug 6 13:08:14 PDT 2023


fhahn added inline comments.


================
Comment at: llvm/lib/Transforms/Utils/LoopUtils.cpp:1669
+        OuterExitCount->getType()->isIntegerTy()) {
+      const SCEV *NewHigh = cast<SCEVAddRecExpr>(High)->evaluateAtIteration(
+          OuterExitCount, *Exp.getSE());
----------------
david-arm wrote:
> david-arm wrote:
> > fhahn wrote:
> > > david-arm wrote:
> > > > fhahn wrote:
> > > > > Do we need to check if the AddRecs here don't wrap? If it would wrap, could NewHigh < NewLow?
> > > > That's an interesting point!
> > > > 
> > > > So I've looked into this and in order to even get to this point we must have already called evaluateAtIteration once already to get the original High value. This happens in `RuntimePointerChecking::insert`, which does something similar:
> > > > 
> > > >     const SCEV *Ex = PSE.getBackedgeTakenCount();
> > > > 
> > > >     ScStart = AR->getStart();
> > > >     ScEnd = AR->evaluateAtIteration(Ex, *SE);
> > > > 
> > > > I can't see any existing code that worries about ScEnd being < ScStart here and all I'm doing is performing the same evaluation for the outer loop backedge count. It's quite difficult to follow the whole paper trail of complex code in LoopAccessAnalysis, but it seems that AccessAnalysis::createCheckForAccess only worries about overflow if we failed a dependency check and retry the runtime memory checks.
> > > > 
> > > > Since this is under a flag and not enabled by default, would you be happy for now with me adding a TODO here to investigate whether wrapping really is a problem or not?
> > > Yeah it looks like this is handled a bit inconsistently at the moment. One other potential case that might be missed here is when the outer induction is decreasing. In that case I think the new high and lows would be swapped?
> > > 
> > > > Since this is under a flag and not enabled by default, would you be happy for now with me adding a TODO here to investigate whether wrapping really is a problem or not?
> > > 
> > > What is your plan in general to enable the flag by default? I think we want to avoid adding this behind a flag and not working towards enabling it by default in the near term;
> > So I would like to enable this by default in the near term after running a sufficient number of benchmarks on at least AArch64 targets. Having the flag in early though allows other people to test it or make use of it until it's enabled. Since it's under a flag, if we do enable it by default it's easy to turn off again should any problems arise.
> > 
> > With regards to decreasing outer induction variables I think I tried this and the hoisting doesn't happen, but I can double check.
> > 
> I believe I've now addressed this problem by checking if the stride is known non-negative. If we can't prove it's not negative then we need additional checks for positive strides. The extra check doesn't seem to hurt the x264 benchmark and I still see a nice improvement.
IIRC for regular runtime checks we create SCEV expressions for the minimum and maximum directly (something like `  Start = SE->getUMinExpr(Start, End);`). Could this be used here as well? 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152366/new/

https://reviews.llvm.org/D152366



More information about the llvm-commits mailing list