[llvm-commits] [PATCH] Multidimensional Array Index Delinearization Analysis

Tue Sep 25 23:01:14 PDT 2012

On Tue, 25 Sep 2012 17:02:58 -0500
Hal Finkel <hfinkel at anl.gov> wrote:

> I've attached an updated version of the pass. This version works
> better (several bugs have been fixed), and also includes a working
> bounds analysis.
> 
> I'll write up a description of the underlying algorithm and its
> rationale shortly.

In designing this pass, I wanted two things:

 1. I wanted to naturally handle arbitrary non-constant coefficients.
 2. I wanted to directly generate the size-and-multiindex decomposition
    along with any associated constraints.

The core algorithm works as follows: First, it collects polynomial
expressions used by GEP instructions that contain loop-dependent
variables. Each of these is analyzed (separately for now). The
polynomial is expanded -- there are limits to prevent generating really
large expressions -- and a list consisting of the loop-invariant parts
of each term as well as the loop-invariant parts of all pair-wise GCDs
of all terms is collected. From this list, all entries are removed that
have divisors in the list. Then we take all combinations of the
remaining list entries, and for each combination: form the sum and
divide the original polynomial by the sum. We remember the 'best' such
combination that produced a non-trivial quotient and remainder. For
now, 'best' essentially means 'smallest number of overall terms in the
quotient and remainder'. The index loop bounds are also checked and
the divisor can be rejected based on the bounds check as well. There may
be no trial division that produces a non-trivial result, and in that
case, no further decomposition is performed. Otherwise, the
decomposition continues recursively on the quotient (the divisor has
become the selected size and the remainder the index).
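
As a purely numeric analogue of the recursion described above (a
sketch, not the pass itself, which works symbolically on polynomial
expressions), the following C function peels dimensions off a concrete
linear offset. The name delinearize and its parameters are made up for
illustration, and the divisors are simply handed in innermost-first
rather than selected by trial division:

```c
#include <assert.h>

/* Hypothetical helper: numeric analogue of the recursive decomposition.
 * sizes[] holds the chosen divisors, innermost dimension first.
 * index[] must have room for nsizes + 1 entries. */
static void delinearize(long offset, const long *sizes, int nsizes,
                        long *index)
{
    assert(offset >= 0);
    for (int d = 0; d < nsizes; d++) {
        assert(sizes[d] > 0);
        index[d] = offset % sizes[d]; /* remainder: index of this dimension */
        offset /= sizes[d];           /* quotient: decomposed further */
    }
    index[nsizes] = offset;           /* what is left is the outermost index */
}
```

For m = 5, o = 7 and (i, j, k) = (2, 3, 4), the offset k + o*(j + m*i)
is 95, and delinearize(95, sizes, 2, index) with sizes = {7, 5} fills
index with {4, 3, 2}, that is, k, j, i. The result is only meaningful
when each inner index really lies in [0, size), which is the bounds
check mentioned above.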

This is certainly what I would call a 'heuristic', but it seems to work
well so far. It has the benefit of applying to a large class of inputs,
and the disadvantage of incompleteness. In any case, it requires a lot
more testing ;)

Regarding Maslov's technique, I believe that my approach has
greater applicability. Primarily, Maslov's technique depends on
establishing an ordering among the coefficients. While this is
possible when the coefficients are all integer constants, and
sometimes possible symbolically, it is not possible in general.
Also, as a practical matter, Maslov's algorithm, as
provided, returns dependence vectors, not an actual decomposition. I
believe that it could be modified to return the underlying
decomposition(s), but that would still leave the ordering constraints.

On the other hand, I believe that within its domain, Maslov's technique
is complete. My technique is, I imagine, not complete. It might turn
out to be valuable to have both implemented; we'd probably need to make
some enhancements to SE to really capture the ordering requirements
symbolically in the largest possible class of cases.

What do you think?

Thanks again,
Hal

> 
> In the mean time, I'd like to specifically discuss the bounds
> analysis. The delinearization decomposition, generally, is valid
> only if the 'index' variable of each inner dimension is bounded in
> [0, 'size'). The pass now produces three outputs associated with this
> condition:
> 
>  - bool Confirmed - This is set to true if the pass could determine
>    that the index range is appropriately bounded within the loop
>    (possibly conditioned as discussed below). Unless the
>    -delinearize-use-oob-indices flag is passed, the pass will not
>    select a decomposition for which it can statically determine
>    that the index range is not appropriately bounded. Nevertheless, it
>    is not always possible to determine this statically, and when that
>    is the case, the pass produces a set of required positivity and
>    non-negativity conditions (if a comprehensive set of conditions
>    cannot be generated, then Confirmed must be false).
> 
>  - vector<SCEV *> PosCond, NNegCond - When Confirmed is true it is
>    possible that the bounds confirmation is conditional on some
>    supplementary conditions: a set of positivity conditions (scalar
>    values that must be positive at runtime) and non-negativity
>    conditions (values that must be non-negative at runtime).
> 
> For example, the test case:
> ; void foo(long n, long m, long o, double A[n][m][o],
> ;          long p, long q, long r) {
> ;
> ;   for (long i = 0; i < n; i++)
> ;     for (long j = 0; j < m; j++)
> ;       for (long k = 0; k < o; k++)
> ;         A[i+p][j+q][k+r] = 1.0;
> ; }
> 
> Now produces the output:
> Printing analysis 'Delinearization' for function 'foo':
> subscript: dim: 0, size: %o, index: {%r,+,1}<nw><%for.k>, confirmed: 1
>         positivity conditions:
>                 (1 + (-1 * %r))
>         non-negativity conditions:
>                 %r
> subscript: dim: 1, size: %m, index: {%q,+,1}<nw><%for.j>, confirmed: 1
>         positivity conditions:
>                 (1 + (-1 * %q))
>         non-negativity conditions:
>                 %q
> subscript: dim: 2, size: 1, index: {%p,+,1}<nw><%for.i>, confirmed: 1
> 
> In effect, this is saying that the decomposition is valid only if q
> and r are 0 -- otherwise the inner dimensions will overrun.
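
To make the dim 0 conditions concrete, here is a small C sketch. The
helper names are hypothetical, and the derivation is one reading of
the output above rather than something taken from the pass: for k in
[0, o), the index k + r stays in [0, o) only if r >= 0 and 1 - r > 0,
which for integers forces r == 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Linear offset of element [i][j][k] for inner sizes m and o. */
static long linear_offset(long i, long j, long k, long m, long o)
{
    assert(m > 0 && o > 0);
    return k + o * (j + m * i);
}

/* The two conditions printed for dim 0:
 * positivity of (1 + (-1 * r)) and non-negativity of r. */
static bool dim0_conditions_hold(long r)
{
    return (1 - r) > 0 && r >= 0;
}
```

With o = 4 and r = 1, the last k iteration (k = 3) uses index 4, and
linear_offset(0, 0, 4, 5, 4) equals linear_offset(0, 1, 0, 5, 4): the
access lands on the first element of the next row, so the
(size, index) pairs no longer identify the element uniquely.
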
> 
> Thanks again,
> Hal
> 
> On Tue, 25 Sep 2012 08:36:38 -0500
> Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > On Tue, 25 Sep 2012 08:09:16 +0200
> > Tobias Grosser <tobias at grosser.es> wrote:
> > 
> > > Adding Sameer. He is also interested in that topic.
> > > 
> > > On 09/24/2012 06:57 AM, Hal Finkel wrote:
> > > > Hello,
> > > >
> > > > I've attached an initial version of an analysis pass for
> > > > delinearization of multidimensional array accesses.
> > > > Specifically, this means the following:
> > > 
> > > Hi Hal,
> > > 
> > > I did not fully review this patch yet, but I am already
> > > positively impressed. All my Polly test cases are already
> > > working, I will comment on the rest of this message soon.
> > 
> > Thanks! I talked with Nick on IRC yesterday and he helped me see
> > what was going on with the SE parts I had asked about. I'll post an
> > updated version later today.
> > 
> >  -Hal
> > 
> > > 
> > > >
> > > > Given some function that looks like:
> > > > void foo(long n, long m, long o, double A[n][m][o]) {
> > > >    for (long i = 0; i < n; i++)
> > > >      for (long j = 0; j < m; j++)
> > > >        for (long k = 0; k < o; k++)
> > > >          A[i][j][k] = 1.0;
> > > > }
> > > >
> > > > The GEP instruction associated with the array access depends on
> > > > the expression k + o*(j + m*i). From this expression, we
> > > > recover:
> > > >          size: o index: k
> > > >          size: m index: j
> > > >          size: 1 index: i
> > > >
> > > > In general, the index expressions can be polynomials in both
> > > > loop-dependent and loop-invariant variables, the sizes must be
> > > > loop-invariant polynomials, and this pass attempts to handle
> > > > these more-complicated expressions. Furthermore, the current
> > > > implementation uses a specialized polynomial factoring
> > > > algorithm, and so does not depend strongly on the form of the
> > > > input expression.
> > > >
> > > > I would specifically like feedback on two issues (other
> > > > comments, of course, are also welcome):
> > > >
> > > > 1. Do I need the custom polynomial class, or can I, and if so,
> > > > should I, build the polynomial factoring on top of SCEV
> > > > directly?
> > > >
> > > > 2. I attempted to use SE to "confirm" the decomposition by
> > > > verifying that the expression isolated as the 'index' is never
> > > > greater than the 'size' within the loop. Unfortunately, this
> > > > does not work (the corresponding boolean is always false) and
> > > > I'm not sure why. I would appreciate some assistance. To see
> > > > what I've tried, see lines 1331-1333 of
> > > > lib/Analysis/Delinearization.cpp.
> > > >
> > > > To be clear: The attached code has not been widely tested. As of
> > > > right now, I've tested it only on the test cases in the patch.
> > > > If people like the approach, then I'll clean it up, do more
> > > > testing, and make it ready for an in-depth review.
> > > >
> > > > Additional test cases will also be helpful.
> > > >
> > > > Thanks again,
> > > > Hal
> > > >

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory