[PATCH] Loop Rerolling Pass

Wed Oct 16 09:28:51 PDT 2013

----- Original Message -----
> > 
> > I thought about this, but it was not clear to me that reusing the
> > SLP-vectorizer's internals was the right thing to do. For one
> > thing, I wanted to be able to handle arbitrary function calls, and
> > other non-vectorizable instructions (in addition to the distray
> > case, the s353 loop in TSVC/LoopRerolling is also
> > non-vectorizable).
> 
> We can teach the SLP tree building to construct trees with
> non-vectorizable functions (such as getc). The tree-building phase is
> just about finding isomorphic trees.
> 
> > Also, the relationship to the vectorization cost model seemed
> > unclear.

Right. The question is just whether this is worthwhile: would the necessary changes make the 'vectorization' part of the SLP vectorizer harder to understand and maintain? That said, we should teach the SLP vectorizer to handle function calls at some point anyway.

> 
> Yes, the cost model should be completely different. The SLP API
> looks like this:
> 
>     R.buildTree(Operands);
> 
>     int Cost = R.getTreeCost();
> 
>     if (Cost < CostThreshold) {
>       R.vectorizeTree();
>     }
> 
> First, you create the tree. Next, you get the cost. Finally, you
> vectorize. If you decide to use it, you would only need the first
> call, which constructs the tree.

Okay, that makes sense. So I'd first pick candidate IVs as I do now, use the increment from those as 'vector lengths', and then try to build a tree from those.
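As a rough illustration of that plan, the C++ sketch below uses a deliberately simplified, hypothetical IR model (`Inst`, `groupByOffset`, and `groupsAreIsomorphic` are made up for this example and are not LLVM or SLP-vectorizer APIs). It takes the IV increment as the unroll factor, buckets the loop-body instructions by their constant offset from the IV, and treats the buckets as isomorphic when their opcode sequences match. That last check is a flat stand-in for real tree isomorphism, not what the SLP tree builder actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified IR: each loop-body instruction is an opcode plus
// the constant offset it adds to the induction variable (i + Offset).
struct Inst {
  std::string Opcode;
  int Offset;
};

// Split the unrolled body into UnrollFactor groups, one per original
// iteration, keyed by the IV offset. UnrollFactor is taken from the IV
// increment (3 for `i += 3`). Returns false if any offset is out of range.
bool groupByOffset(const std::vector<Inst> &Body, int UnrollFactor,
                   std::vector<std::vector<std::string>> &Groups) {
  assert(UnrollFactor > 0 && "IV increment must be positive");
  Groups.assign(static_cast<std::size_t>(UnrollFactor),
                std::vector<std::string>());
  for (const Inst &I : Body) {
    if (I.Offset < 0 || I.Offset >= UnrollFactor)
      return false;
    Groups[static_cast<std::size_t>(I.Offset)].push_back(I.Opcode);
  }
  return true;
}

// Toy isomorphism test: every group must have the same opcode sequence
// as the first one.
bool groupsAreIsomorphic(const std::vector<std::vector<std::string>> &Groups) {
  for (const auto &G : Groups)
    if (G != Groups.front())
      return false;
  return true;
}
```

Note that a body whose calls appear in offset order 0, 2, 1 still passes this check; isomorphism alone says nothing about whether the iterations can be executed in order.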

>  
> 
> > Finally, I needed to confirm loop dependence ordering and full
> > coverage of the loop, both of which would have required
> > significant extension to what the SLP vectorizer provides.
> 
> Loop coverage is trivial. You just need to check that all of the
> instructions in the loop are inside one of the trees. There is
> already a map that contains all of the instructions. 

Good.
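A minimal sketch of that coverage check, assuming instructions are identified by integer IDs and that the tree builder's existing map can be viewed as an instruction-to-tree lookup. `TreeMembership` and `loopIsFullyCovered` are hypothetical names, not the SLP vectorizer's actual data structures, and the IV update and loop branch are assumed to have been excluded from `LoopInsts` by the caller.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical model: maps each instruction ID that the tree builder placed
// in a tree to the index of that tree.
using TreeMembership = std::unordered_map<int, int>;

// The loop is fully covered if every instruction in the loop body belongs
// to one of the trees. Any instruction with no entry in the map (i.e. not
// part of any tree) makes the check fail.
bool loopIsFullyCovered(const std::vector<int> &LoopInsts,
                        const TreeMembership &InTree) {
  for (int Id : LoopInsts)
    if (InTree.count(Id) == 0)
      return false;
  return true;
}
```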

> I don’t
> understand the loop-dependence ordering problem. Can you explain the
> problem?

Okay, a simple example:

for (int i = 0; i < 9; i += 3) {
  foo(i);
  foo(i+2);
  foo(i+1);
}

where foo(int) is some arbitrary function that cannot be speculatively executed. This loop cannot be rerolled because there is an ordering dependency between the calls to foo, which forces them to execute 'out of order' (I can't reorder the calls to be foo(i); foo(i+1); foo(i+2);). The same thing can happen with loads and stores: there could be a memory dependency between what appear to be different unrolled loop iterations that forces them to execute either intermixed or out of order. In these cases, we can't reroll.
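To make the ordering problem concrete, the C++ sketch below replaces foo with a hypothetical stand-in that only records its argument. Running the loop above gives the call trace 0, 2, 1, 3, 5, 4, ..., while the rerolled loop `for (i = 0; i < 9; ++i) foo(i);` gives 0, 1, 2, ...; only the in-order body matches the rerolled loop. `callOrderAllowsReroll` is a hypothetical helper that checks one necessary condition: listing the side-effecting calls in program order by their IV offset, the offsets must never decrease.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the arbitrary, side-effecting foo(int): it only
// records the order of its arguments so the traces can be compared.
static std::vector<int> Trace;
static void foo(int X) { Trace.push_back(X); }

// The loop from the example: the calls run out of iteration order.
std::vector<int> runOutOfOrder() {
  Trace.clear();
  for (int i = 0; i < 9; i += 3) {
    foo(i);
    foo(i + 2);
    foo(i + 1);
  }
  return Trace;
}

// The same unrolled body with the calls in iteration order.
std::vector<int> runInOrder() {
  Trace.clear();
  for (int i = 0; i < 9; i += 3) {
    foo(i);
    foo(i + 1);
    foo(i + 2);
  }
  return Trace;
}

// The rerolled loop: one call per iteration, stride 1.
std::vector<int> runRerolled() {
  Trace.clear();
  for (int i = 0; i < 9; ++i)
    foo(i);
  return Trace;
}

// Hypothetical legality check (necessary, not sufficient): the IV offsets of
// the side-effecting calls, read in program order, must never decrease.
bool callOrderAllowsReroll(const std::vector<int> &Offsets) {
  return std::is_sorted(Offsets.begin(), Offsets.end());
}
```

Requiring non-decreasing offsets also rejects the 'intermixed' case: a body such as bar(i); bar(i+1); foo(i); foo(i+1); (with bar another hypothetical side-effecting function) yields offsets 0, 1, 0, 1. Memory dependencies between loads and stores still need their own analysis.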

Thanks again,
Hal

> 
> Thanks,
> Nadav

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory