split delinearization pass in 3 steps

Thu May 1 17:21:04 PDT 2014

Hi Sebastian,

it is good to see the result of our discussions on the mailing list. I 
already tested it with Polly and it works great. After spending a couple 
of hours integrating Polly into Julia (julialang.org), we can now nicely 
delinearize their linear algebra kernels. I will test it a little more 
with ublas and polybench, but I am rather confident about this change.

Regarding the patch itself:

> diff --git a/include/llvm/Analysis/ScalarEvolutionExpressions.h b/include/llvm/Analysis/ScalarEvolutionExpressions.h
> index ed8c133..2c429e8 100644
> --- a/include/llvm/Analysis/ScalarEvolutionExpressions.h
> +++ b/include/llvm/Analysis/ScalarEvolutionExpressions.h
> @@ -352,8 +352,27 @@ namespace llvm {
>         return S->getSCEVType() == scAddRecExpr;
>       }
>
> -    /// Splits the SCEV into two vectors of SCEVs representing the subscripts
> -    /// and sizes of an array access. Returns the remainder of the
> +    /// First step of delinearization: find parametric terms in this
> +    /// SCEVAddRecExpr.

1) Is it really necessary for SCEV* to be a SCEVAddRecExpr?

2) What about using the doxygen @param syntax?

@param Terms A vector in which the parametric terms will be returned.

Also, I am a little unsure about the 'First step of delinearization' 
comment. This is a little bit like a "come-from" comment. Instead of 
describing where this function is used, I would prefer to read what this 
function actually does.

Something like

@brief Collect parametric terms occurring in step expressions

We scan this SCEV for parametric terms that occur in the step 
expressions of SCEVAddRecExpr[s]. A parametric term is are 
termsconsisting of SCEVUnknowns/zext/[...]? In case the step
expression consists of

> +    void collectParametricTerms(ScalarEvolution &SE,
> +                                SmallVectorImpl<const SCEV *> &Terms) const;
> +
> +    /// Second step of delinearization: compute the array dimensions Sizes from
> +    /// the set of Terms extracted from the memory access function of this
> +    /// SCEVAddRecExpr.
> +    void findArrayDimensions(ScalarEvolution &SE,
> +                             SmallVectorImpl<const SCEV *> &Terms,
> +                             SmallVectorImpl<const SCEV *> &Sizes) const;
> +
> +    /// Third step of delinearization: return in Subscripts the access functions
> +    /// for each dimension in Sizes.
> +    const SCEV *
> +    computeAccessFunctions(ScalarEvolution &SE,
> +                           SmallVectorImpl<const SCEV *> &Subscripts,
> +                           SmallVectorImpl<const SCEV *> &Sizes) const;
> +
> +    /// Split this SCEVAddRecExpr into two vectors of SCEVs representing the
> +    /// subscripts and sizes of an array access. Returns the remainder of the
>       /// delinearization that is the offset start of the array.
>       const SCEV *delinearize(ScalarEvolution &SE,
>                               SmallVectorImpl<const SCEV *> &Subscripts,

This comment seems a little short. Everything a user of this function 
needs to know about this function should be written here. It seems 
larger parts of the comment of the delinearize() implementation could be 
moved here. Also, we should clearly state that this delinearization is 
just a guess based on certain assumptions that need to be run-time 
checked. To make this usable, we need to clearly state the assumptions 
that need to be checked.

> diff --git a/lib/Analysis/ScalarEvolution.cpp b/lib/Analysis/ScalarEvolution.cpp
> index d28aa58..584dc58 100644
> --- a/lib/Analysis/ScalarEvolution.cpp
> +++ b/lib/Analysis/ScalarEvolution.cpp
> @@ -6817,6 +6817,138 @@ const SCEV *SCEVAddRecExpr::getNumIterationsInRange(ConstantRange Range,
>     return SE.getCouldNotCompute();
>   }
>
> +/// The delinearization is a 3 step process: the first two steps compute the
> +/// sizes of each subscript and the third step computes the access functions for
> +/// the delinearized array:
 >
> +///
> +/// 1. Collect parametric terms
> +/// 2. Sort terms such that the smaller terms divide the larger terms
> +/// 3. Separate the SCEV in Quotient + Remainder using the largest term as divisor

80 columns.

> +///   -> Quotient is the subscript of the outer dimension
> +///   -> Remainder is the new SCEV
> +///   -> Separate again with next largest term
> +///   -> Loop until reaching 1
> +///
> +/// A unification process can be done on steps 1 and 2 when there exist several
> +/// access functions to the same memory object: that guarantees the array shape
> +/// will be the same across memory accesses. We could derive the result of steps
> +/// 1 and 2 from the shape given in metadata.

I would mark this as TODO or FIXME.

> +/// The previous SCEV->delinearize implementation performed steps 1,2,3 at the
> +/// same time. The main deficiency of the previous implementation is that it only
> +/// allows to look at the term in the step expression of the outermost AddRec
> +/// expression. This term may not have the information necessary to derive the
> +/// right delinearization:http://llvm.org/PR19336  is such a case.

I believe, the problems of the old implementation should only be 
mentioned in the commit message.

> +/// Example:
> +///
> +/// A[][n][m]
> +///
> +/// for i
> +///   for j
> +///     for k
> +///       A[j+k][2i][5i] =
> +///
> +/// The initial SCEV:
> +///
> +/// A[{{{0,+,2*m+5}_i, +, n*m}_j, +, n*m}_k]
> +///
> +/// 1. Find the different terms found in the step functions:
                                    ^^^^^  This is redundant

> +/// -> [2*m, 5 * 1, n*m, n*m]
> +///
> +/// 2. Sort them
> +/// -> [n*m, m, 1]
> +/// At this point we already guess the array size: A[][n][m]
> +///
> +/// 3.

            ^^^ Title missing here.

This would be even more understandable if the if the names of the 
individual points would correspond to the one you used at the beginning.

> +/// a. Divide SCEV: {{{0,+,2*m+5}_i, +, n*m}_j, +, n*m}_k  by largest term 'n*m'
> +/// Quotient: {{{0,+,0}_i, +, 1}_j, +, 1}_k
> +/// Remainder: {{{0,+,2*m+5}_i, +, 0}_j, +, 0}_k
> +/// The quotient is the subscript of the outermost array dimension.
> +///
> +/// b. Divide Remainder: {{{0,+,2*m+5}_i, +, 0}_j, +, 0}_k by next largest term 'm'

80 columns

> +/// Quotient: {{{0,+,2}_i, +, 0}_j, +, 0}_k
> +/// Remainder: {{{0,+,5}_i, +, 0}_j, +, 0}_k
> +/// The quotient is the subscript of the next array dimension.
> +///
> +/// As the only term remaining is '1', the remainder is the subscript of the
> +/// innermost array expression.
> +///
> +/// Overall, we have:
> +///
> +/// n * m: {{{0,+,0}_i, +, 1}_j, +, 1}_k
> +/// +
> +/// m: {{{0,+,2}_i, +, 0}_j, +, 0}_k
> +/// +
> +/// 1 * {{{0,+,5}_i, +, 0}_j, +, 0}_k
> +///
> +/// This leads to the access function: A[j+k][2i][5i].

Also, this whole comment is hanging a little in the middle of nothing.
To me, it would fit best as an implementation not on top of the 
delinearize() function after the original user interface description
has been pulled in the header.

> +namespace {
> +// Collect all steps of SCEV expressions.
> +struct SCEVCollectStrides {
> +  ScalarEvolution &SE;
> +  SmallVectorImpl<const SCEV *> &Strides;
> +
> +  SCEVCollectStrides(ScalarEvolution &SE, SmallVectorImpl<const SCEV *> &S)
> +      : SE(SE), Strides(S) {}
> +
> +  bool follow(const SCEV *S) {
> +    if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(S))
> +      Strides.push_back(AR->getStepRecurrence(SE));
> +    return true;

Does this mean we would also recurse into the step expression itself,
instead of only the base expression? This seems incorrect for the 
algorithm we have in mind.

> +  }
> +  bool isDone() const { return false; }
> +};
> +
> +// Collect all SCEVUnknown and SCEVMulExpr expressions.
> +struct SCEVCollectTerms {
> +  SmallVectorImpl<const SCEV *> &Terms;
> +
> +  SCEVCollectTerms(SmallVectorImpl<const SCEV *> &T)
> +      : Terms(T) {}
> +
> +  bool follow(const SCEV *S) {
> +    if (isa<SCEVUnknown>(S) || isa<SCEVConstant>(S) || isa<SCEVMulExpr>(S)) {
> +      Terms.push_back(S);
> +
> +      // Stop recursion: once we collected a term, do not walk its operands.
> +      return false;
> +    }
> +
> +    // Keep looking.
> +    return true;
> +  }
> +  bool isDone() const { return false; }
> +};
> +}
> +
> +/// First step of delinearization: find parametric terms in this SCEVAddRecExpr.
> +void SCEVAddRecExpr::collectParametricTerms(
> +    ScalarEvolution &SE, SmallVectorImpl<const SCEV *> &Terms) const {
> +  SmallVector<const SCEV *, 4> Strides;
> +  SCEVCollectStrides StrideCollector(SE, Strides);
> +  visitAll(this, StrideCollector);
> +
> +#ifndef NDEBUG
> +  DEBUG(dbgs() << "Strides:\n");
> +  for (unsigned I = 0; I < Strides.size(); ++I)
> +    DEBUG(dbgs() << *Strides[I] << "\n");

What about:
   DEBUG({ 

     dbgs() << "Strides:\n";
     for (auto Stride: Strides)
       dbgs() << *Stride << "\n";
   });

This also removes the need for the NDEBUG check.

> +#endif
> +
> +  for (unsigned int i = 0; i < Strides.size(); i++) {
> +    SCEVCollectTerms TermCollector(Terms);
> +    visitAll(Strides[i], TermCollector);
> +  }

C++11 foreach loop?

> +
> +#ifndef NDEBUG
> +  DEBUG(dbgs() << "Terms:\n");
> +  for (unsigned I = 0; I < Terms.size(); ++I)
> +    DEBUG(dbgs() << *Terms[I] << "\n");
> +#endif

C++11 foreach loop? DEBUG({})?

This also removes the need for the NDEBUG check.

>   static const APInt srem(const SCEVConstant *C1, const SCEVConstant *C2) {
>     APInt A = C1->getValue()->getValue();
>     APInt B = C2->getValue()->getValue();
> @@ -6846,358 +6978,412 @@ static const APInt sdiv(const SCEVConstant *C1, const SCEVConstant *C2) {
>   }
>
>   namespace {
> -struct SCEVGCD : public SCEVVisitor<SCEVGCD, const SCEV *> {
> +
> +struct SCEVDivision : public SCEVVisitor<SCEVDivision, void> {
>   public:
> -  // Pattern match Step into Start. When Step is a multiply expression, find
> -  // the largest subexpression of Step that appears in Start. When Start is an
> -  // add expression, try to match Step in the subexpressions of Start, non
> -  // matching subexpressions are returned under Remainder.
> -  static const SCEV *findGCD(ScalarEvolution &SE, const SCEV *Start,
> -                             const SCEV *Step, const SCEV **Remainder) {
> -    assert(Remainder && "Remainder should not be NULL");
> -    SCEVGCD R(SE, Step, SE.getConstant(Step->getType(), 0));
> -    const SCEV *Res = R.visit(Start);
> -    *Remainder = R.Remainder;
> -    return Res;
> -  }
> -
> -  SCEVGCD(ScalarEvolution &S, const SCEV *G, const SCEV *R)
> -      : SE(S), GCD(G), Remainder(R) {
> -    Zero = SE.getConstant(GCD->getType(), 0);
> -    One = SE.getConstant(GCD->getType(), 1);
> -  }
> -
> -  const SCEV *visitConstant(const SCEVConstant *Constant) {
> -    if (GCD == Constant || Constant == Zero)
> -      return GCD;
> -
> -    if (const SCEVConstant *CGCD = dyn_cast<SCEVConstant>(GCD)) {
> -      const SCEV *Res = SE.getConstant(gcd(Constant, CGCD));
> -      if (Res != One)
> -        return Res;
> -
> -      Remainder = SE.getConstant(srem(Constant, CGCD));
> -      Constant = cast<SCEVConstant>(SE.getMinusSCEV(Constant, Remainder));
> -      Res = SE.getConstant(gcd(Constant, CGCD));
> -      return Res;
> +  // Computes the Quotient and Remainder of the division of Numerator by
> +  // Denominator.
> +  static void divide(ScalarEvolution &SE, const SCEV *Numerator,
> +                     const SCEV *Denominator, const SCEV **Quotient,
> +                     const SCEV **Remainder) {

You could think of returning a std::tuple, which can then be extracted
by std::tie. This may make it more clear that we return Quotient and 
Remainder. (I am not sure if this is really nicer. I mentioned it to
at least make sure you are aware of this option).

> +    assert(Numerator && Denominator && *Quotient && *Remainder &&
> +           "Uninitialized SCEV");
> +
> +    SCEVDivision D(SE, Numerator, Denominator);
> +
> +    // Check for the trivial case here to avoid having to check for it in the
> +    // rest of the code.
> +    if (Numerator == Denominator) {
> +      *Quotient = D.One;
> +      *Remainder = D.Zero;
> +      return;
>       }
>
> -    // When GCD is not a constant, it could be that the GCD is an Add, Mul,
> -    // AddRec, etc., in which case we want to find out how many times the
> -    // Constant divides the GCD: we then return that as the new GCD.
> -    const SCEV *Rem = Zero;
> -    const SCEV *Res = findGCD(SE, GCD, Constant, &Rem);
> -
> -    if (Res == One || Rem != Zero) {
> -      Remainder = Constant;
> -      return One;
> +    if (Numerator == D.Zero) {
> +      *Quotient = D.Zero;
> +      *Remainder = D.Zero;
> +      return;
>       }
>
> -    assert(isa<SCEVConstant>(Res) && "Res should be a constant");
> -    Remainder = SE.getConstant(srem(Constant, cast<SCEVConstant>(Res)));
> -    return Res;
> -  }
> -
> -  const SCEV *visitTruncateExpr(const SCEVTruncateExpr *Expr) {
> -    if (GCD != Expr)
> -      Remainder = Expr;
> -    return GCD;
> -  }
> -
> -  const SCEV *visitZeroExtendExpr(const SCEVZeroExtendExpr *Expr) {
> -    if (GCD != Expr)
> -      Remainder = Expr;
> -    return GCD;
> -  }
> +    // Split the Denominator when it is a product.
> +    if (const SCEVMulExpr *T = dyn_cast<const SCEVMulExpr>(Denominator)) {
> +      const SCEV *Q, *R;
> +      *Quotient = Numerator;
> +      for (int I = 0, E = T->getNumOperands(); I < E; ++I) {
> +        divide(SE, *Quotient, T->getOperand(I), &Q, &R);

C++11 foreach loop?

The corresponding iterator range does not yet exist, but you could 
create it with llvm/ADT/iterator_range.h in a separate patch. As this 
patch adds plenty of operand loops, this may be worthwhile.

> +        *Quotient = Q;
> +
> +        // Bail out when the Numerator is not divisible by one of the terms of
> +        // the Denominator.
> +        if (R != D.Zero) {
> +          *Quotient = D.Zero;
> +          *Remainder = Numerator;
> +          return;
> +        }
> +      }
> +      *Remainder = D.Zero;
> +      return;
> +    }
>
> -  const SCEV *visitSignExtendExpr(const SCEVSignExtendExpr *Expr) {
> -    if (GCD != Expr)
> -      Remainder = Expr;
> -    return GCD;
> +    D.visit(Numerator);
> +    *Quotient = D.Quotient;
> +    *Remainder = D.Remainder;
> +  }
> +
> +  SCEVDivision(ScalarEvolution &S, const SCEV *Numerator, const SCEV *Denominator)

80 columns.

> +      : SE(S), Denominator(Denominator) {
> +    Zero = SE.getConstant(Denominator->getType(), 0);
> +    One = SE.getConstant(Denominator->getType(), 1);
> +
> +    // By default, we don't know how to divide Expr by Denominator.
> +    // Providing the default here simplifies the rest of the code.
> +    Quotient = Zero;
> +    Remainder = Numerator;
> +  }
> +
> +  // Except in the trivial case described above, we do not know how to divide
> +  // Expr by Denominator for the following functions with empty implementation.
> +  void visitTruncateExpr(const SCEVTruncateExpr *Numerator) {}
> +  void visitZeroExtendExpr(const SCEVZeroExtendExpr *Numerator) {}
> +  void visitSignExtendExpr(const SCEVSignExtendExpr *Numerator) {}
> +  void visitUDivExpr(const SCEVUDivExpr *Numerator) {}
> +  void visitSMaxExpr(const SCEVSMaxExpr *Numerator) {}
> +  void visitUMaxExpr(const SCEVUMaxExpr *Numerator) {}
> +  void visitUnknown(const SCEVUnknown *Numerator) {}
> +  void visitCouldNotCompute(const SCEVCouldNotCompute *Numerator) {}
> +
> +  void visitConstant(const SCEVConstant *Numerator) {
> +    if (const SCEVConstant *D = dyn_cast<SCEVConstant>(Denominator)) {
> +      Quotient = SE.getConstant(sdiv(Numerator, D));
> +      Remainder = SE.getConstant(srem(Numerator, D));
> +      return;
> +    }
>     }
>
> -  const SCEV *visitAddExpr(const SCEVAddExpr *Expr) {
> -    if (GCD == Expr)
> -      return GCD;
> -
> -    for (int i = 0, e = Expr->getNumOperands(); i < e; ++i) {
> -      const SCEV *Rem = Zero;
> -      const SCEV *Res = findGCD(SE, Expr->getOperand(e - 1 - i), GCD, &Rem);
> +  void visitAddRecExpr(const SCEVAddRecExpr *Numerator) {
> +    const SCEV *StartQ, *StartR, *StepQ, *StepR;
> +    assert(Numerator->isAffine() && "Numerator should be affine");
> +    divide(SE, Numerator->getStart(), Denominator, &StartQ, &StartR);
> +    divide(SE, Numerator->getStepRecurrence(SE), Denominator, &StepQ, &StepR);
> +    Quotient = SE.getAddRecExpr(StartQ, StepQ, Numerator->getLoop(),
> +                                Numerator->getNoWrapFlags());
> +    Remainder = SE.getAddRecExpr(StartR, StepR, Numerator->getLoop(),
> +                                 Numerator->getNoWrapFlags());
> +  }
> +
> +  void visitAddExpr(const SCEVAddExpr *Numerator) {
> +    SmallVector<const SCEV *, 2> Qs, Rs;
> +    for (int I = 0, E = Numerator->getNumOperands(); I < E; ++I) {

C++11 range loop?

> +      const SCEV *Q, *R;
> +      divide(SE, Numerator->getOperand(I), Denominator, &Q, &R);
> +      Qs.push_back(Q);
> +      Rs.push_back(R);
> +    }
>
> -      // FIXME: There may be ambiguous situations: for instance,
> -      // GCD(-4 + (3 * %m), 2 * %m) where 2 divides -4 and %m divides (3 * %m).
> -      // The order in which the AddExpr is traversed computes a different GCD
> -      // and Remainder.
> -      if (Res != One)
> -        GCD = Res;
> -      if (Rem != Zero)
> -        Remainder = SE.getAddExpr(Remainder, Rem);
> +    if (Qs.size() == 1) {
> +      Quotient = Qs[0];
> +      Remainder = Rs[0];
> +      return;
>       }
>
> -    return GCD;
> +    Quotient = SE.getAddExpr(Qs);
> +    Remainder = SE.getAddExpr(Rs);
>     }
>
> -  const SCEV *visitMulExpr(const SCEVMulExpr *Expr) {
> -    if (GCD == Expr)
> -      return GCD;
> +  void visitMulExpr(const SCEVMulExpr *Numerator) {
> +    SmallVector<const SCEV *, 2> Qs;
>
> -    for (int i = 0, e = Expr->getNumOperands(); i < e; ++i) {
> -      if (Expr->getOperand(i) == GCD)
> -        return GCD;
> -    }
> +    bool FoundDenominatorTerm = false;
> +    for (int I = 0, E = Numerator->getNumOperands(); I < E; ++I) {

C++11 range loop?

> +      if (FoundDenominatorTerm) {
> +        Qs.push_back(Numerator->getOperand(I));
> +        continue;
> +      }
>
> -    // If we have not returned yet, it means that GCD is not part of Expr.
> -    const SCEV *PartialGCD = One;
> -    for (int i = 0, e = Expr->getNumOperands(); i < e; ++i) {

C++11 range loop?

> -      const SCEV *Rem = Zero;
> -      const SCEV *Res = findGCD(SE, Expr->getOperand(i), GCD, &Rem);
> -      if (Rem != Zero)
> -        // GCD does not divide Expr->getOperand(i).
> +      // Check whether Denominator divides one of the product operands.
> +      const SCEV *Q, *R;
> +      divide(SE, Numerator->getOperand(I), Denominator, &Q, &R);
> +      if (R != Zero) {
> +        Qs.push_back(Numerator->getOperand(I));
>           continue;
> +      }
> +      FoundDenominatorTerm = true;
> +      Qs.push_back(Q);
> +    }
>
> -      if (Res == GCD)
> -        return GCD;
> -      PartialGCD = SE.getMulExpr(PartialGCD, Res);
> -      if (PartialGCD == GCD)
> -        return GCD;
> +    if (FoundDenominatorTerm) {
> +      Remainder = Zero;
> +      if (Qs.size() == 1)
> +        Quotient = Qs[0];
> +      else
> +        Quotient = SE.getMulExpr(Qs);
> +      return;
>       }
>
> -    if (PartialGCD != One)
> -      return PartialGCD;
> -
> -    // Failed to find a PartialGCD: set the Remainder to the full expression,
> -    // and return the GCD.
> -    Remainder = Expr;
> -    const SCEVMulExpr *Mul = dyn_cast<SCEVMulExpr>(GCD);
> -    if (!Mul)
> -      return GCD;
> -
> -    // When the GCD is a multiply expression, try to decompose it:
> -    // this occurs when Step does not divide the Start expression
> -    // as in: {(-4 + (3 * %m)),+,(2 * %m)}
> -    for (int i = 0, e = Mul->getNumOperands(); i < e; ++i) {
> -      const SCEV *Rem = Zero;
> -      const SCEV *Res = findGCD(SE, Expr, Mul->getOperand(i), &Rem);
> -      if (Rem == Zero) {
> -        Remainder = Rem;
> -        return Res;
> -      }
> +    if (!isa<SCEVUnknown>(Denominator)) {
> +      Quotient = Zero;
> +      Remainder = Numerator;
> +      return;
>       }
>
> -    return GCD;
> +    // The Remainder is obtained by replacing Denominator by 0 in Numerator.
> +    ValueToValueMap RewriteMap;
> +    RewriteMap[cast<SCEVUnknown>(Denominator)->getValue()] =
> +        cast<SCEVConstant>(Zero)->getValue();
> +    Remainder = SCEVParameterRewriter::rewrite(Numerator, SE, RewriteMap, true);
> +
> +    // Quotient is (Numerator - Remainder) divided by Denominator.
> +    const SCEV *Q, *R;
> +    divide(SE, SE.getMinusSCEV(Numerator, Remainder), Denominator, &Q, &R);
> +    assert(R == Zero &&
> +           "(Numerator - Remainder) should evenly divide Denominator");
> +    Quotient = Q;
>     }
>
> -  const SCEV *visitUDivExpr(const SCEVUDivExpr *Expr) {
> -    if (GCD != Expr)
> -      Remainder = Expr;
> -    return GCD;
> +private:
> +  ScalarEvolution &SE;
> +  const SCEV *Denominator, *Quotient, *Remainder, *Zero, *One;
> +};
> +}
> +
> +// Find the Greatest Common Divisor of A and B.
> +static const SCEV *
> +findGCD(ScalarEvolution &SE, const SCEV *A, const SCEV *B) {
> +
> +  if (const SCEVConstant *CA = dyn_cast<SCEVConstant>(A))
> +    if (const SCEVConstant *CB = dyn_cast<SCEVConstant>(B))
> +      return SE.getConstant(gcd(CA, CB));
> +
> +  const SCEV *One = SE.getConstant(A->getType(), 1);
> +  if (isa<SCEVConstant>(A) && isa<SCEVUnknown>(B))
> +    return One;
> +  if (isa<SCEVUnknown>(A) && isa<SCEVConstant>(B))
> +    return One;
> +
> +  const SCEV *Q, *R;
> +  if (const SCEVMulExpr *M = dyn_cast<SCEVMulExpr>(A)) {
> +    SmallVector<const SCEV *, 2> Qs;
> +    for (int I = 0, E = M->getNumOperands(); I < E; ++I)

C++11 range loop?

> +      Qs.push_back(findGCD(SE, M->getOperand(I), B));
> +    return SE.getMulExpr(Qs);
> +  }
> +  if (const SCEVMulExpr *M = dyn_cast<SCEVMulExpr>(B)) {
> +    SmallVector<const SCEV *, 2> Qs;
> +    for (int I = 0, E = M->getNumOperands(); I < E; ++I)

C++11 range loop?

> +      Qs.push_back(findGCD(SE, A, M->getOperand(I)));
> +    return SE.getMulExpr(Qs);
>     }
>
> -  const SCEV *visitAddRecExpr(const SCEVAddRecExpr *Expr) {
> -    if (GCD == Expr)
> -      return GCD;
> +  const SCEV *Zero = SE.getConstant(A->getType(), 0);
> +  SCEVDivision::divide(SE, A, B, &Q, &R);
> +  if (R == Zero)
> +    return B;
>
> -    if (!Expr->isAffine()) {
> -      Remainder = Expr;
> -      return GCD;
> -    }
> +  SCEVDivision::divide(SE, B, A, &Q, &R);
> +  if (R == Zero)
> +    return A;
>
> -    const SCEV *Rem = Zero;
> -    const SCEV *Res = findGCD(SE, Expr->getOperand(0), GCD, &Rem);
> -    if (Res == One || Res->isAllOnesValue()) {
> -      Remainder = Expr;
> -      return GCD;
> -    }
> +  return One;
> +}
>
> -    if (Rem != Zero)
> -      Remainder = SE.getAddExpr(Remainder, Rem);
> +// Find the Greatest Common Divisor of all the SCEVs in Terms.
> +static const SCEV *
> +findGCD(ScalarEvolution &SE, SmallVectorImpl<const SCEV *> &Terms) {
> +  assert(Terms.size() > 0 && "Terms vector is empty");
>
> -    Rem = Zero;
> -    Res = findGCD(SE, Expr->getOperand(1), Res, &Rem);
> -    if (Rem != Zero || Res == One || Res->isAllOnesValue()) {
> -      Remainder = Expr;
> -      return GCD;
> -    }
> +  const SCEV *GCD = Terms[0];
> +  for (unsigned I = 1; I < Terms.size(); I++)
> +    GCD = findGCD(SE, GCD, Terms[I]);
>
> -    return Res;
> -  }
> +  return GCD;
> +}
>
> -  const SCEV *visitSMaxExpr(const SCEVSMaxExpr *Expr) {
> -    if (GCD != Expr)
> -      Remainder = Expr;
> -    return GCD;
> -  }
> +static void findArrayDimensionsRec(ScalarEvolution &SE,
> +                                   SmallVectorImpl<const SCEV *> &Terms,
> +                                   SmallVectorImpl<const SCEV *> &Sizes,
> +                                   const SCEV *Zero, const SCEV *One) {
> +  // The GCD of all Terms is the dimension of the innermost dimension.
> +  const SCEV *GCD = findGCD(SE, Terms);
>
> -  const SCEV *visitUMaxExpr(const SCEVUMaxExpr *Expr) {
> -    if (GCD != Expr)
> -      Remainder = Expr;
> -    return GCD;
> -  }
> +  // End of recursion.
> +  if (Terms.size() == 1) {
> +    if (const SCEVMulExpr *M = dyn_cast<SCEVMulExpr>(GCD)) {
> +      SmallVector<const SCEV *, 2> Qs;
> +      for (int I = 0, E = M->getNumOperands(); I < E; ++I)

C++11 range loop?

> +        if (!isa<SCEVConstant>(M->getOperand(I)))
> +          Qs.push_back(M->getOperand(I));
> +
> +      GCD = SE.getMulExpr(Qs);
> +    }
>
> -  const SCEV *visitUnknown(const SCEVUnknown *Expr) {
> -    if (GCD != Expr)
> -      Remainder = Expr;
> -    return GCD;
> +    Sizes.push_back(GCD);
> +    return;
>     }
>
> -  const SCEV *visitCouldNotCompute(const SCEVCouldNotCompute *Expr) {
> -    return One;
> +  for (unsigned I = 0; I < Terms.size(); ++I) {

C++11 range loop?

> +    // Normalize the terms before the next call to findArrayDimensionsRec.
> +    const SCEV *Q, *R;
> +    SCEVDivision::divide(SE, Terms[I], GCD, &Q, &R);
> +    assert(R == Zero && "GCD does not evenly divide one of the terms");
> +    Terms[I] = Q;
>     }
>
> -private:
> -  ScalarEvolution &SE;
> -  const SCEV *GCD, *Remainder, *Zero, *One;
> -};
> +  // Remove all SCEVConstants.
> +  for (unsigned I = 0; I < Terms.size();)
> +    if (isa<SCEVConstant>(Terms[I]))
> +      Terms.erase(Terms.begin() + I);
> +    else
> +      ++I;

This seems to work, but it looks a little confusing. What about:

for (auto I = Terms.begin(), E = Terms.end(); I != E; I++)
    if (isa<SCEVConstant>(Terms[I]))
      I = Terms.erase(I);

> -struct SCEVDivision : public SCEVVisitor<SCEVDivision, const SCEV *> {
> -public:
> -  // Remove from Start all multiples of Step.
> -  static const SCEV *divide(ScalarEvolution &SE, const SCEV *Start,
> -                            const SCEV *Step) {
> -    SCEVDivision D(SE, Step);
> -    const SCEV *Rem = D.Zero;
> -    (void)Rem;
> -    // The division is guaranteed to succeed: Step should divide Start with no
> -    // remainder.
> -    assert(Step == SCEVGCD::findGCD(SE, Start, Step, &Rem) && Rem == D.Zero &&
> -           "Step should divide Start with no remainder.");
> -    return D.visit(Start);
> -  }
> +  if (Terms.size() > 0)
> +    findArrayDimensionsRec(SE, Terms, Sizes, Zero, One);
> +  Sizes.push_back(GCD);
> +}
> +
> +namespace {
> +struct FindParameter {
> +  bool FoundParameter;
> +  FindParameter() : FoundParameter(false) {}
>
> -  SCEVDivision(ScalarEvolution &S, const SCEV *G) : SE(S), GCD(G) {
> -    Zero = SE.getConstant(GCD->getType(), 0);
> -    One = SE.getConstant(GCD->getType(), 1);
> +  bool follow(const SCEV *S) {
> +    if (isa<SCEVUnknown>(S)) {
> +      FoundParameter = true;
> +      // Stop recursion: we found a parameter.
> +      return false;
> +    }
> +    // Keep looking.
> +    return true;
> +  }
> +  bool isDone() const {
> +    // Stop recursion if we have found a parameter.
> +    return FoundParameter;
>     }
> +};
> +}
>
> -  const SCEV *visitConstant(const SCEVConstant *Constant) {
> -    if (GCD == Constant)
> -      return One;
> +// Returns true when S contains at least a SCEVUnknown parameter.
> +static inline bool
> +containsParameters(const SCEV *S) {
> +  FindParameter F;
> +  SCEVTraversal<FindParameter> ST(F);
> +  ST.visitAll(S);
>
> -    if (const SCEVConstant *CGCD = dyn_cast<SCEVConstant>(GCD))
> -      return SE.getConstant(sdiv(Constant, CGCD));
> -    return Constant;
> -  }
> +  return F.FoundParameter;
> +}
>
> -  const SCEV *visitTruncateExpr(const SCEVTruncateExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> -    return Expr;
> -  }
> +// Returns true when one of the SCEVs of Terms contains a SCEVUnknown parameter.
> +static inline bool
> +containsParameters(SmallVectorImpl<const SCEV *> &Terms) {
> +  for (unsigned I = 0; I < Terms.size(); I++)
> +    if (containsParameters(Terms[I]))
> +      return true;
> +  return false;
> +}
>
> -  const SCEV *visitZeroExtendExpr(const SCEVZeroExtendExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> -    return Expr;
> -  }
> +// Return the number of product terms in S.
> +static inline int numberOfTerms(const SCEV *S) {
> +  if (const SCEVMulExpr *Expr = dyn_cast<SCEVMulExpr>(S))
> +    return Expr->getNumOperands();
> +  return 1;
> +}
>
> -  const SCEV *visitSignExtendExpr(const SCEVSignExtendExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> -    return Expr;
> -  }
> +/// Second step of delinearization: compute the array dimensions Sizes from the
> +/// set of Terms extracted from the memory access function of this SCEVAddRec.
> +void SCEVAddRecExpr::findArrayDimensions(
> +    ScalarEvolution &SE, SmallVectorImpl<const SCEV *> &Terms,
> +    SmallVectorImpl<const SCEV *> &Sizes) const {
>
> -  const SCEV *visitAddExpr(const SCEVAddExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> +  if (Terms.size() < 2)
> +    return;
>
> -    SmallVector<const SCEV *, 2> Operands;
> -    for (int i = 0, e = Expr->getNumOperands(); i < e; ++i)
> -      Operands.push_back(divide(SE, Expr->getOperand(i), GCD));
> +  // Early return when Terms do not contain parameters: we do not delinearize
> +  // non parametric SCEVs.
> +  if (!containsParameters(Terms))
> +    return;
>
> -    if (Operands.size() == 1)
> -      return Operands[0];
> -    return SE.getAddExpr(Operands);
> -  }
> +#ifndef NDEBUG
> +  DEBUG(dbgs() << "Terms:\n");
> +  for (unsigned I = 0; I < Terms.size(); ++I)
> +    DEBUG(dbgs() << *Terms[I] << "\n");
> +#endif

C++11 range loop? DEBUG({})?

This also removes the need for the NDEBUG check.

> -  const SCEV *visitMulExpr(const SCEVMulExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> +  // Remove duplicates.
> +  std::sort(Terms.begin(), Terms.end());
> +  Terms.erase(std::unique(Terms.begin(), Terms.end()), Terms.end());
>
> -    bool FoundGCDTerm = false;
> -    for (int i = 0, e = Expr->getNumOperands(); i < e; ++i)
> -      if (Expr->getOperand(i) == GCD)
> -        FoundGCDTerm = true;
> +  // Put larger terms first.
> +  std::sort(Terms.begin(), Terms.end(), [](const SCEV *LHS, const SCEV *RHS) {
> +    return numberOfTerms(LHS) > numberOfTerms(RHS);
> +  });
>
> -    SmallVector<const SCEV *, 2> Operands;
> -    if (FoundGCDTerm) {
> -      FoundGCDTerm = false;
> -      for (int i = 0, e = Expr->getNumOperands(); i < e; ++i) {
> -        if (FoundGCDTerm)
> -          Operands.push_back(Expr->getOperand(i));
> -        else if (Expr->getOperand(i) == GCD)
> -          FoundGCDTerm = true;
> -        else
> -          Operands.push_back(Expr->getOperand(i));
> -      }
> -    } else {
> -      const SCEV *PartialGCD = One;
> -      for (int i = 0, e = Expr->getNumOperands(); i < e; ++i) {
> -        if (PartialGCD == GCD) {
> -          Operands.push_back(Expr->getOperand(i));
> -          continue;
> -        }
> +#ifndef NDEBUG
> +  DEBUG(dbgs() << "Terms after sorting:\n");
> +  for (unsigned I = 0; I < Terms.size(); ++I)
> +    DEBUG(dbgs() << *Terms[I] << "\n");
> +#endif

C++11 range loop? DEBUG({})?

This also removes the need for the NDEBUG check.

>
> -        const SCEV *Rem = Zero;
> -        const SCEV *Res = SCEVGCD::findGCD(SE, Expr->getOperand(i), GCD, &Rem);
> -        if (Rem == Zero) {
> -          PartialGCD = SE.getMulExpr(PartialGCD, Res);
> -          Operands.push_back(divide(SE, Expr->getOperand(i), Res));
> -        } else {
> -          Operands.push_back(Expr->getOperand(i));
> -        }
> -      }
> -    }
> +  const SCEV *Zero = SE.getConstant(this->getType(), 0);
> +  const SCEV *One = SE.getConstant(this->getType(), 1);
> +  findArrayDimensionsRec(SE, Terms, Sizes, Zero, One);
>
> -    if (Operands.size() == 1)
> -      return Operands[0];
> -    return SE.getMulExpr(Operands);
> -  }
> +#ifndef NDEBUG
> +  DEBUG(dbgs() << "Sizes:\n");
> +  for (unsigned int I = 0; I < Sizes.size(); ++I)
> +    DEBUG(dbgs() << *Sizes[I] << "\n");
> +#endif

C++11 range loop? DEBUG({})?

This also removes the need for the NDEBUG check.

> +}
>
> -  const SCEV *visitUDivExpr(const SCEVUDivExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> -    return Expr;
> -  }
> +/// Third step of delinearization: compute the access functions for the
> +/// Subscripts based on the dimensions in Sizes.
> +const SCEV *SCEVAddRecExpr::computeAccessFunctions(
> +    ScalarEvolution &SE, SmallVectorImpl<const SCEV *> &Subscripts,
> +    SmallVectorImpl<const SCEV *> &Sizes) const {
> +  // Early exit in case this SCEV is not an affine multivariate function.
> +  const SCEV *Zero = SE.getConstant(this->getType(), 0);
> +  if (!this->isAffine())
> +    return Zero;
>
> -  const SCEV *visitAddRecExpr(const SCEVAddRecExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> +  DEBUG(dbgs() << "(delinearize: " << *this << "\n");
>
> -    assert(Expr->isAffine() && "Expr should be affine");
> +  const SCEV *Res = this, *Remainder = Zero;
> +  int Last = Sizes.size() - 1;
> +  for (int i = Last; i >= 0; i--) {
> +    const SCEV *Q, *R;
> +    SCEVDivision::divide(SE, Res, Sizes[i], &Q, &R);
>
> -    const SCEV *Start = divide(SE, Expr->getStart(), GCD);
> -    const SCEV *Step = divide(SE, Expr->getStepRecurrence(SE), GCD);
> +#ifndef NDEBUG
> +    DEBUG(dbgs() << "Res: " << *Res << "\n");
> +    DEBUG(dbgs() << "Sizes[i]: " << *Sizes[i] << "\n");
> +    DEBUG(dbgs() << "Res divided by Sizes[i]:\n");
> +    DEBUG(dbgs() << "Quotient: " << *Q << "\n");
> +    DEBUG(dbgs() << "Remainder: " << *R << "\n");
> +#endif

DEBUG({})?

This is no need to check NDEBUG. DEBUG is defined to an empty statement 
as soon as NDEBUG is set.

> -    return SE.getAddRecExpr(Start, Step, Expr->getLoop(),
> -                            Expr->getNoWrapFlags());
> -  }
> +    Res = Q;
>
> -  const SCEV *visitSMaxExpr(const SCEVSMaxExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> -    return Expr;
> -  }
> +    if (i == Last) {
> +      // Do not record the last subscript corresponding to the size of elements
> +      // in the array.
> +      Remainder = R;
> +      continue;
> +    }
>
> -  const SCEV *visitUMaxExpr(const SCEVUMaxExpr *Expr) {
> -    if (GCD == Expr)
> -      return One;
> -    return Expr;
> +    // Record the access function for the current subscript.
> +    Subscripts.push_back(R);
>     }
>
> -  const SCEV *visitUnknown(const SCEVUnknown *Expr) {
> -    if (GCD == Expr)
> -      return One;
> -    return Expr;
> -  }
> +  // Also push in last position the remainder of the last division: it will be
> +  // the access function of the innermost dimension.
> +  Subscripts.push_back(Res);
>
> -  const SCEV *visitCouldNotCompute(const SCEVCouldNotCompute *Expr) {
> -    return Expr;
> -  }
> +  std::reverse(Subscripts.begin(), Subscripts.end());
>
> -private:
> -  ScalarEvolution &SE;
> -  const SCEV *GCD, *Zero, *One;
> -};
> +#ifndef NDEBUG
> +  DEBUG(dbgs() << "Subscripts:\n");
> +  for (unsigned int I = 0; I < Subscripts.size(); ++I)
> +    DEBUG(dbgs() << *Subscripts[I] << "\n");
> +#endif

C++11 range loop? DEBUG({})?

No need for NDEBUG check.

> +  return Remainder;
>   }
>
>   /// Splits the SCEV into two vectors of SCEVs representing the subscripts and
> @@ -7253,75 +7439,24 @@ const SCEV *
>   SCEVAddRecExpr::delinearize(ScalarEvolution &SE,
>                               SmallVectorImpl<const SCEV *> &Subscripts,
>                               SmallVectorImpl<const SCEV *> &Sizes) const {
> -  // Early exit in case this SCEV is not an affine multivariate function.
> -  if (!this->isAffine())
> -    return this;
> +  // First step: collect parametric terms.
> +  SmallVector<const SCEV *, 4> Terms;
> +  collectParametricTerms(SE, Terms);
>
> -  const SCEV *Start = this->getStart();
> -  const SCEV *Step = this->getStepRecurrence(SE);
> +  // Second step: find subscript sizes.
> +  findArrayDimensions(SE, Terms, Sizes);
>
> -  // Build the SCEV representation of the canonical induction variable in the
> -  // loop of this SCEV.
> -  const SCEV *Zero = SE.getConstant(this->getType(), 0);
> -  const SCEV *One = SE.getConstant(this->getType(), 1);
> -  const SCEV *IV =
> -      SE.getAddRecExpr(Zero, One, this->getLoop(), this->getNoWrapFlags());
> -
> -  DEBUG(dbgs() << "(delinearize: " << *this << "\n");
> -
> -  // When the stride of this SCEV is 1, do not compute the GCD: the size of this
> -  // subscript is 1, and this same SCEV for the access function.
> -  const SCEV *Remainder = Zero;
> -  const SCEV *GCD = One;
> -
> -  // Find the GCD and Remainder of the Start and Step coefficients of this SCEV.
> -  if (Step != One && !Step->isAllOnesValue())
> -    GCD = SCEVGCD::findGCD(SE, Start, Step, &Remainder);
> -
> -  DEBUG(dbgs() << "GCD: " << *GCD << "\n");
> -  DEBUG(dbgs() << "Remainder: " << *Remainder << "\n");
> -
> -  const SCEV *Quotient = Start;
> -  if (GCD != One && !GCD->isAllOnesValue())
> -    // As findGCD computed Remainder, GCD divides "Start - Remainder." The
> -    // Quotient is then this SCEV without Remainder, scaled down by the GCD.  The
> -    // Quotient is what will be used in the next subscript delinearization.
> -    Quotient = SCEVDivision::divide(SE, SE.getMinusSCEV(Start, Remainder), GCD);
> -
> -  DEBUG(dbgs() << "Quotient: " << *Quotient << "\n");
> -
> -  const SCEV *Rem = Quotient;
> -  if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Quotient))
> -    // Recursively call delinearize on the Quotient until there are no more
> -    // multiples that can be recognized.
> -    Rem = AR->delinearize(SE, Subscripts, Sizes);
> -
> -  // Scale up the canonical induction variable IV by whatever remains from the
> -  // Step after division by the GCD: the GCD is the size of all the sub-array.
> -  if (Step != One && !Step->isAllOnesValue() && GCD != One &&
> -      !GCD->isAllOnesValue() && Step != GCD) {
> -    Step = SCEVDivision::divide(SE, Step, GCD);
> -    IV = SE.getMulExpr(IV, Step);
> -  }
> -  // The access function in the current subscript is computed as the canonical
> -  // induction variable IV (potentially scaled up by the step) and offset by
> -  // Rem, the offset of delinearization in the sub-array.
> -  const SCEV *Index = SE.getAddExpr(IV, Rem);
> -
> -  // Record the access function and the size of the current subscript.
> -  Subscripts.push_back(Index);
> -  Sizes.push_back(GCD);
> +  // Third step: compute the access functions for each subscript.
> +  const SCEV *Remainder = computeAccessFunctions(SE, Subscripts, Sizes);

Very nice. I like small functions!

>     DEBUG(dbgs() << "ArrayRef");
> -  for (int i = 0; i < Size; i++)
> +  for (unsigned int i = 0; i < Sizes.size(); i++)
>       DEBUG(dbgs() << "[" << *Subscripts[i] << "]");
>     DEBUG(dbgs() << "\n)\n");

The large amount of DEBUGs looks confusing. You could write it as follows:

     DEBUG({
	dbgs() << "ArrayRef";
         for (unsigned int i = 0; i < Sizes.size(); i++)
           dbgs() << "[" << *Subscripts[i] << "]";
         dbgs() << "\n)\n";
     });

That's for now. I need to go again through the algorithmic details, but 
for now I need some sleep.

Cheers,
Tobias