[LLVMdev] LICM and SCEV AA?

Sat Nov 2 05:45:04 PDT 2013

----- Original Message -----
> 
> 
> Sent from my iPhone
> 
> > On Nov 2, 2013, at 1:52 AM, Andrew Trick <atrick at apple.com> wrote:
> > 
> > 
> >> On Oct 31, 2013, at 6:06 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> 
> >> Hello,
> >> 
> >> We currently generate suboptimal code for this loop:
> >> 
> >> for (int i = 1; i < LEN; i++) {
> >> a[i] = a[0] + b[i];
> >> }
> >> 
> >> The largest problem is that we won't vectorize the loop because
> >> the loop vectorizer does not grok the independence of a[i] and
> >> a[0] (note that i starts at 1 in this loop). While we can
> >> probably fix the vectorizer to handle this, I don't think that is
> >> the right place for the fix, because this is really a LICM
> >> problem. LICM could move a[0] out of the loop, but it currently
> >> doesn't (which is pretty suboptimial in itself). The reason is
> >> that LICM does not use SE (either directly or indirectly), and
> >> BasicAA does not recognize that a[0] and a[i] will never overlap
> >> in this loop.
> >> 
> >> I see several possible solutions:
> >> 
> >> 1. Add yet-another hack to BasicAA to catch this case.
> >> 
> >> 2. Let LICM use SE directly. Instead of bailing out on hoisting a
> >> loop-invariant value in an alias set with non-invariant values,
> >> use SE to check independence with the other values in the alias
> >> set.
> >> 
> >> 3. Use ScalarEvolutionAliasAnalysis; scheduling SCEV AA prior to
> >> running LICM fixes this problem.
> >> 
> >> Using #3 seems like the simplest, and most natural, way to fix
> >> this issue. I recall someone tell me (maybe at the developers'
> >> meeting) that SCEV AA was 'fundamentally flawed', and now I'm
> >> wondering what was meant by that. The pass manager does not know
> >> how to reschedule it unless it is specifically requested, but if
> >> that's the entirety of the problem, then I think that we could
> >> manually schedule it in a few key places and get a lot of benefit
> >> (there is a particular use case for prior to LICM, at least).
> >> 
> >> FWIW, I don't think that adding an SCEV dependency before LICM
> >> would be particularly problematic: Currently, LICM lives in the
> >> same loop pass manager instance as LoopSimplify, LoopRotate and
> >> LoopUnswitch, all of which preserve SE.
> >> 
> >> Suggestions?
> > 
> > I would be scared to enable SCEV AA because it's not well tested
> > and updating SCEV across transforms can be problematic. LICM would
> > require another run of SCEV (bad for compile time). As for other
> > reasons it's fundamentally flawed, Dan G. may remember.
> > 
> > I would be tempted to go with #1 just to fix LICM, since LICM has
> > no need for generic dependence analysis.
> > 
> > Of course, the vectorizer already uses SCEV to determine array
> > element dependence. It should have a strong method for dependence
> > analysis. Why does it miss this case?
> 
> Like you said it *should*. The simple dependence analysis we
> currently have works by pair wise comparing (subtracting) the access
> SCEVS.
> 
> It currently only handles constant dependences (SCEVConstant)
> 
> The resulting distance from Hal's example is {4,+,4} ReadWrite and so
> should be safe.
> 
> We just don't handle this case yet.

The problem is that having the loop vectorizer handle this case, unless we teach it to also do LICM, would mean turning it into a scalar load and splat on every iteration. This is still not great. Having LICM fix this would be much better.

 -Hal

> 
> Best,
> Arnold
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory