[PATCH] D36113: [Loop Vectorize] Vectorize Loops with Backward Dependence

Tue Aug 15 10:16:06 PDT 2017

hiraditya added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorizePred.cpp:187
+      if (MSSA->isLiveOnEntryDef(D))
+        Reordered |= checkDepAndReorder(St, Ld);
+      else if (Instruction *DefInst = dyn_cast<Instruction>(D)) {
----------------
dberlin wrote:
> mssimpso wrote:
> > dberlin wrote:
> > > DIVYA wrote:
> > > > hfinkel wrote:
> > > > > I'm somewhat concerned that a number of these cases are actually handling loop-invariant values, meaning that we're doing a suboptimal handling of something that LICM should handle more fully. The problem is that dealing with these after vectorization could be more difficult than before, and if we have a phase-ordering problem such that we're missing these cases, we might just end up with suboptimal code. Are you seeing these cases in practice?
> > > > > 
> > > > 
> > > > I think the loop invariant codes will be already hoisted outside the loop before this pass.The Defining access will be liveOnEntry for  the loads  and stores from pointer arguments .
> > > > For example in the below code,  the defining access for the load Instruction for a[i+1], is Live on Entry .However the load instruction is not loop invariant.
> > > > 
> > > > For the case
> > > > int  foo1(int n,int * restrict a, int * restrict b, int *restrict m){
> > > >   int i;
> > > >   for (i = 0; i < n; i++){
> > > >     a[i] = b[i];
> > > >     m[i] = a[i+1];
> > > >   }
> > > > }
> > > > 
> > > > 
> > > I'm surprised it can prove that it does not conflict with the store.
> > > The load instruction actually is loop invariant in the sense of whether a given loaded address can change during the loop, it just takes on different loop invariant values each iteration.
> > > 
> > > It is possible to hoist it out of this loop, it just requires making a different loop :)
> > > 
> > > In particular, this generates the same result:
> > > 
> > > ```
> > > int foo1(int n,int * restrict a, int * restrict b, int *restrict m){
> > > 
> > > int i;
> > > for (i = 0; i < n; i++){
> > >   m[i] = a[i+1];
> > > }
> > > for (i = 0; i < n; i++){
> > >   a[i] = b[i];
> > > }
> > > }
> > > ```
> > I wonder if our loop distribution pass (disabled by default) can handle this case?
> I also wonder if one of our passes could ever transform it to memcpy, since it's just a memcpy(m, a+1, n*sizeof(a))
@hfinkel
Since LICM does not hoist the invariant load, we do not see any performance degradation with this patch. Also, this patch handles other cases where the memory access is not live on entry. If LICM improves to handle invariant loads as you stated, or if we run loop distribution before vectorization, then we can disable that part. Other cases where the load is not loop invariant but can be reordered to enable vectorization can still be useful.

@mssimpso 
Maybe just improving LICM could achieve the distribution of this loop, and then running loop-idiom would convert to memcpy.
I think since LICM does not use MemorySSA, it is unable to establish the invariance of load.

https://reviews.llvm.org/D36113