split delinearization pass in 3 steps

Tue May 6 23:43:41 PDT 2014

On 07/05/2014 00:48, Sebastian Pop wrote:
> Tobias Grosser wrote:
>> >Hi Sebastian,
>> >
>> >it is good to see the result of our discussions on the mailing list.
>> >I already tested it with Polly and it works great. After spending a
>> >couple of hours integrating Polly into Julia (julialang.org), we can
>> >now nicely delinearize their linear algebra kernels. I will test it
>> >a little more with ublas and polybench, but I am rather confident
>> >about this change.
>> >
>> >Regarding the patch itself:
>> >
>>> > >diff --git a/include/llvm/Analysis/ScalarEvolutionExpressions.h b/include/llvm/Analysis/ScalarEvolutionExpressions.h
>>> > >index ed8c133..2c429e8 100644
>>> > >--- a/include/llvm/Analysis/ScalarEvolutionExpressions.h
>>> > >+++ b/include/llvm/Analysis/ScalarEvolutionExpressions.h
>>> > >@@ -352,8 +352,27 @@ namespace llvm {
>>> > >        return S->getSCEVType() == scAddRecExpr;
>>> > >      }
>>> > >
>>> > >-    /// Splits the SCEV into two vectors of SCEVs representing the subscripts
>>> > >-    /// and sizes of an array access. Returns the remainder of the
>>> > >+    /// First step of delinearization: find parametric terms in this
>>> > >+    /// SCEVAddRecExpr.
>> >
>> >1) Is it really necessary for SCEV* to be a SCEVAddRecExpr?
>
> Yes, I think I don't want to support generic delinearizations for now: how would
> you delinearize a generic SCEV, like SCEVMulExpr, SCEVAddExpr, etc.?

I am confused. Don't we already delinearize generic SCEVs that contain 
SCEVMulExpr, SCEVAddExpr, ... at least at some inner levels? Are you now 
saying if such an expression appears at the outer level, this causes 
problems?

>> >2) What about using the doxygen @param syntax?
> Not a single function in that file uses @param, so I won't start.  I also find
> doxygen markers hard to read.

OK.

>> >Does this mean we would also recurse into the step expression itself,
>> >instead of only the base expression? This seems incorrect for the
>> >algorithm we have in mind.
> We have to recurse on the AddRecExpr otherwise we won't get any multivariate
> SCEV's like {{{0,+,5}_i, +, 0}_j, +, 0}_k
>
> As we only collect AddRec->getStepRecurrence() it does not matter whether we
> recurse or not on the step of the AddRec.

What about a SCEV {0, +, {0, +, 1}_L2 }_L1?

>>> > >+    // Split the Denominator when it is a product.
>>> > >+    if (const SCEVMulExpr *T = dyn_cast<const SCEVMulExpr>(Denominator)) {
>>> > >+      const SCEV *Q, *R;
>>> > >+      *Quotient = Numerator;
>>> > >+      for (int I = 0, E = T->getNumOperands(); I < E; ++I) {
>>> > >+        divide(SE, *Quotient, T->getOperand(I), &Q, &R);
>> >
>> >C++11 foreach loop?
>> >
>> >The corresponding iterator range does not yet exist, but you could
>> >create it with llvm/ADT/iterator_range.h in a separate patch. As
>> >this patch adds plenty of operand loops, this may be worthwhile.
> Won't do.

I just added ops_range NArySCEV::operands()

This can now be written as:

for (const *SCEV Op : T->operands) {
   divide(SE, *Quotient, Op, &Q, &R);

>>> > >-private:
>>> > >-  ScalarEvolution &SE;
>>> > >-  const SCEV *GCD, *Remainder, *Zero, *One;
>>> > >-};
>>> > >+  // Remove all SCEVConstants.
>>> > >+  for (unsigned I = 0; I < Terms.size();)
>>> > >+    if (isa<SCEVConstant>(Terms[I]))
>>> > >+      Terms.erase(Terms.begin() + I);
>>> > >+    else
>>> > >+      ++I;
>> >
>> >This seems to work, but it looks a little confusing. What about:
>> >
>> >for (auto I = Terms.begin(), E = Terms.end(); I != E; I++)
>> >    if (isa<SCEVConstant>(Terms[I]))
>> >      I = Terms.erase(I);
> Not done. I don't know how to make your version work: it has several problems
> when I tried it out. I fixed at least two problems before I decided it's not
> worth the effort;-)

OK, that's fine, I can look into this after the patch is committed.

> Tobias Grosser wrote:
>> >On 02/05/2014 02:21, Tobias Grosser wrote:
>>> > >Hi Sebastian,
>>> > >
>>> > >it is good to see the result of our discussions on the mailing list. I
>>> > >already tested it with Polly and it works great. After spending a couple
>>> > >of hours integrating Polly into Julia (julialang.org), we can now nicely
>>> > >delinearize their linear algebra kernels. I will test it a little more
>>> > >with ublas and polybench, but I am rather confident about this change.
>> >
>> >I just tested your patch more intensively running it over all the
>> >polybench test cases. For almost all test cases, we correctly
>> >delinearize the accesses.
>> >
>> >There are three issues I run into:
>> >
>> >1) ASSERT: GCD does not evenly divide one of the terms
>> >======================================================
>> >
>> >I attached one test case that yields an assert when running:
>> >
>> >opt -delinearize -analyze GCD-does-not-evenly-divide-one-of-the-terms.ll
>> >
>> >Inst:  %1 = load double* %arrayidx70, align 8, !tbaa !1
>> >In Loop with Header: for.body60
>> >AddRec: {{((8 * undef) + %Ey),+,(8 * undef *
>> >undef)}<%for.cond55.preheader>,+,(8 * undef)}<%for.cond58.preheader>
> These are strange access functions: I should probably just bail out when the
> SCEV contains undefined values. Do you agree? Or are the undef's a side effect
> of bugpoint removing the initialization of variables?

The undef's may be a side effect of bugpoint removing something. Still, 
we should handle the case and just bail out.

I will then recheck if I still see the assert being triggered for the 
original full test case.

>> >2) SEGFAULT && infinite recursion
>> >=================================
>> >
>> >When running 'opt -delinearize -analyze segfault.ll', opt segfaults.
>> >The following change fixes the reason for this segfault:
>> >
>> >-    assert(Numerator && Denominator && *Quotient && *Remainder &&
>> >-           "Uninitialized SCEV");
>> >+    assert(Numerator && Denominator && "Uninitialized SCEV");
> I don't see this assert triggering on the testcase you attached.

>> >
>> >But unfortunately we run immediately in an infinite recursion (which
>> >causes another segfault:
> I fixed this recursion by checking that the following folding does not diverge:
>
>      // Quotient is (Numerator - Remainder) divided by Denominator.
>      const SCEV *Q, *R;
>      const SCEV *Diff = SE.getMinusSCEV(Numerator, Remainder);
>      if (sizeOfSCEV(Diff) > sizeOfSCEV(Numerator)) {
>        // This SCEV does not seem to simplify: fail the division here.
>        Quotient = Zero;
>        Remainder = Numerator;
>        return;
>      }

OK, that's fine.

>> >3. Array size in the index expression
>> >=====================================
>> >
>> >opt -delinearize -analyze /tmp/negative-offset.ll yields:
>> >
>> >Inst:  store double 2.000000e+00, double* %tmp16, align 8
>> >In Loop with Header: bb12
>> >AddRec: {(-8 + (8 * %arg1) + %arg2),+,(8 * %arg1)}<%bb12>
>> >failed to delinearize
>> >
>> >This one is interesting, as the original access is:
>> >
>> >double A[][N];
>> >for i:
>> >     A[i][N-1] = 2;
>> >
>> >I assume we could delinearize it to
>> >A[i+1][-1] (which is incorrect) and then translate this to:
>> >
>> >[x][y] -> [x][y]: y >= 0; [x][y] -> [x-1][y+N]: y < 0
>> >
>> >Let's keep this discussion out of the review. I have the feeling we
>> >can address it later.
> Let's discuss the runtime conditions tomorrow on the polly phone call.  I guess
> it is just a matter of interpreting the result of delinearization in the data
> dependence analysis.

OK.

> -    /// Splits the SCEV into two vectors of SCEVs representing the subscripts
> -    /// and sizes of an array access. Returns the remainder of the
> +    /// Collect parametric terms occurring in step expressions.
> +    void collectParametricTerms(ScalarEvolution &SE,
> +                                SmallVectorImpl<const SCEV *> &Terms) const;
> +
> +    /// Compute the array dimensions Sizes from the set of Terms extracted from
> +    /// the memory access function of this SCEVAddRecExpr.
> +    void findArrayDimensions(ScalarEvolution &SE,
> +                             SmallVectorImpl<const SCEV *> &Terms,
> +                             SmallVectorImpl<const SCEV *> &Sizes) const;
> +
> +    /// Return in Subscripts the access functions for each dimension in Sizes.
> +    const SCEV *
> +    computeAccessFunctions(ScalarEvolution &SE,
> +                           SmallVectorImpl<const SCEV *> &Subscripts,
> +                           SmallVectorImpl<const SCEV *> &Sizes) const;
> +
> +    /// Split this SCEVAddRecExpr into two vectors of SCEVs representing the
> +    /// subscripts and sizes of an array access. Returns the remainder of the
>       /// delinearization that is the offset start of the array.
> +    ///
> +    /// The delinearization is a 3 step process: the first two steps compute the
> +    /// sizes of each subscript and the third step computes the access functions
> +    /// for the delinearized array:
> +    ///
> +    /// 1. Find the terms in the step functions
> +    /// 2. Compute the array size
> +    /// 3. Compute the access function: divide the SCEV by the array size
> +    ///    starting with the innermost dimensions found in step 2. The Quotient
> +    ///    is the SCEV to be divided in the next step of the recursion. The
> +    ///    Remainder is the subscript of the innermost dimension. Loop over all
> +    ///    array dimensions computed in step 2.
> +    ///
> +    /// To compute a uniform array size for several memory accesses to the same
> +    /// object, one can collect in step 1 all the step terms for all the memory
> +    /// accesses, and compute in step 2 a unique array shape. This guarantees
> +    /// that the array shape will be the same across all memory accesses.
> +    ///
> +    /// FIXME: We could derive the result of steps 1 and 2 from a description of
> +    /// the array shape given in metadata.
> +    ///
> +    /// Example:
> +    ///
> +    /// A[][n][m]
> +    ///
> +    /// for i
> +    ///   for j
> +    ///     for k
> +    ///       A[j+k][2i][5i] =
> +    ///
> +    /// The initial SCEV:
> +    ///
> +    /// A[{{{0,+,2*m+5}_i, +, n*m}_j, +, n*m}_k]
> +    ///
> +    /// 1. Find the different terms in the step functions:
> +    /// -> [2*m, 5, n*m, n*m]
> +    ///
> +    /// 2. Compute the array size: sort and unique them
> +    /// -> [n*m, 2*m, 5]
> +    /// find the GCD of all the terms = 1
> +    /// divide by the GCD and erase constant terms
> +    /// -> [n*m, 2*m]
> +    /// GCD = m
> +    /// divide by GCD -> [n, 2]
> +    /// remove constant terms
> +    /// -> [n]
> +    /// size of the array is A[unknown][n][m]
> +    ///
> +    /// 3. Compute the access function
> +    /// a. Divide {{{0,+,2*m+5}_i, +, n*m}_j, +, n*m}_k by the innermost size m
> +    /// Quotient: {{{0,+,2}_i, +, n}_j, +, n}_k
> +    /// Remainder: {{{0,+,5}_i, +, 0}_j, +, 0}_k
> +    /// The remainder is the subscript of the innermost array dimension: [5i].
> +    ///
> +    /// b. Divide Quotient: {{{0,+,2}_i, +, n}_j, +, n}_k by next outer size n
> +    /// Quotient: {{{0,+,0}_i, +, 1}_j, +, 1}_k
> +    /// Remainder: {{{0,+,2}_i, +, 0}_j, +, 0}_k
> +    /// The Remainder is the subscript of the next array dimension: [2i].
> +    ///
> +    /// The subscript of the outermost dimension is the Quotient: [j+k].
> +    ///
> +    /// Overall, we have: A[][n][m], and the access function: A[j+k][2i][5i].
>       const SCEV *delinearize(ScalarEvolution &SE,
>                               SmallVectorImpl<const SCEV *> &Subscripts,
>                               SmallVectorImpl<const SCEV *> &Sizes) const;
> diff --git a/lib/Analysis/Delinearization.cpp b/lib/Analysis/Delinearization.cpp
> index 3623f30..1a58821 100644

> +/// Find parametric terms in this SCEVAddRecExpr.
> +void SCEVAddRecExpr::collectParametricTerms(
> +    ScalarEvolution &SE, SmallVectorImpl<const SCEV *> &Terms) const {
> +  SmallVector<const SCEV *, 4> Strides;
> +  SCEVCollectStrides StrideCollector(SE, Strides);
> +  visitAll(this, StrideCollector);
> +
> +  DEBUG({
> +      dbgs() << "Strides:\n";
> +      for (auto S : Strides)
> +        dbgs() << *S << "\n";
> +    });
> +
> +  for (auto S : Strides) {

According to

http://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto

you might want to use:

	for (const auto *S : Strides) {

for pointers.

I think after addressing the comments, we should commit this patch. I
think all the major concerns and wide spread stylistic changes have been 
addressed. Upcoming changes/improvements are probably best worked on 
in-tree.

Cheers,
Tobias