[PATCH] D9151: Loop Versioning for LICM

Fri Jan 15 11:47:09 PST 2016

hfinkel added inline comments.

================
Comment at: lib/Transforms/IPO/PassManagerBuilder.cpp:349
@@ -342,1 +348,3 @@
+    MPM.add(createLICMPass());                  // Hoist loop invariants
+  }
   if (!DisableUnitAtATime && OptLevel > 1 && !PrepareForLTO) {
----------------
joker.eph wrote:
> hfinkel wrote:
> > joker.eph wrote:
> > > ashutosh.nema wrote:
> > > > joker.eph wrote:
> > > > > grosser wrote:
> > > > > > joker.eph wrote:
> > > > > > > grosser wrote:
> > > > > > > > joker.eph wrote:
> > > > > > > > > ashutosh.nema wrote:
> > > > > > > > > > joker.eph wrote:
> > > > > > > > > > > Why is is done here and not where LICM is usually done? Line 419?
> > > > > > > > > > > Did you try multiple placements? Do you have benchmarks results that drives this choice?
> > > > > > > > > > We want to run this when inining is over, because after that we may see more accurate aliasing.
> > > > > > > > > > That’s why we placed it after ‘createBarrierNoopPass’. If we do this earlier probably of no-alias 
> > > > > > > > > > aliasing assumptions in version loop, other optimizations may get some benefit. 
> > > > > > > > > > 
> > > > > > > > > > This can be schedule later as well but I don’t see any benefit.
> > > > > > > > > > 
> > > > > > > > > > We have observed good gains in our internal benchmarks, mostly based on customer workloads.
> > > > > > > > > I'm not sure I follow why you need the inliner to be fully complete over the whole module and not having processed the function on which you want to run LICM. What kind of aliasing will this avoid? Do you have an example?
> > > > > > > > > 
> > > > > > > > > Also, you are saying that you don't see any benefit of scheduling this later, can you elaborate with respect to the other run of LICM I pointed?
> > > > > > > > > 
> > > > > > > > > Also, I ask about benchmark with respect to the different possible placement for this pass, and you didn't answer this in particular.
> > > > > > > > > 
> > > > > > > > One reason to run this late is that too early versioning may prevent  both further inlining (due to increased code size) as well as versioning later a possibly larger loop.
> > > > > > > > 
> > > > > > > > Furthermore, to my understanding the inlining loop is indeed a canonicalization phase. We are planning to run Polly after the inliner loop for this very same reason.
> > > > > > > Fair for the inliner loop. But what about the already existing LICM that already runs a little bit later?
> > > > > > Are  you talking about the LICM pass within the inliner loop? It is to my understanding part of the canonicalization sequence that helps both to eliminate loops (in case all statements can be proven loop invariant) and which also helps SCEV by moving certain SCEVUnknowns further out of the tree. As the existing LICM does not add a large code size increase, it seems to be save to be run in the inliner loop.
> > > > > > 
> > > > > > (LICM is not a pure canonicalization hence it causes some troubles for Polly, but we are working on addressing them on our side).
> > > > > No, not within the inliner loop, see line 419 (or 411 before the patch)
> > > > We did tried multiple placements i.e. prior to inlining completion and post completion.
> > > > As this is an internal benchmark, we can’t disclose details. But with these placements we 
> > > > did not noticed any degrade in our regular testing. Reason for running just after inlining 
> > > > completion is other passes might gain from non-alias version of loop (As of now we do 
> > > > not have any case where other optimization getting benefits from no-alias version of loop).
> > > Well as long as it is not enabled by default in the pipeline, it does not matter. It is worth noting that this question may come back later though.
> > I don't think that making functions larger prior to inlining is something we can justify (i.e. I'm sure that will cause regressions). Also, as functions get inlined, it is likely to turn out that aliasing that we could not reason about in the function in isolation can be reasoned about in the context of a particular caller. Making the decision to version early will be problematic in that sense as well (it would be nice to think that, in such a case, we'd statically evaluate the runtime condition one way or the other, but I don't think that can happen now either).
> > 
> After talking on IRC, the answer to my original question about the LICM that runs later in the pipeline is that it runs after the vectorizer as part of post-vectorize cleanup. The current point of the multi-versioning LICM is different: it may help to allow better vectorization, so it makes sense to differentiate with the existing LICM.
Correct. One issue, however, is that unconditionally running LICM after the versioning pass is not ideal (in the compile-time sense), because LICM Is not super cheap.

Given that we broke LICM up into a set of utility functions (exposed in include/llvm/Transforms/Utils/LoopUtils.h), we should really call them from this pass only if we actually do anything.



Repository:
  rL LLVM

http://reviews.llvm.org/D9151