delinearization
Tobias Grosser
tobias at grosser.es
Wed Apr 9 23:46:44 PDT 2014
On 04/09/2014 09:33 PM, Sebastian Pop wrote:
> Tobias Grosser wrote:
>> On 04/08/2014 11:38 PM, Sebastian Pop wrote:
>>> Tobias Grosser wrote:
>>>> On 04/04/2014 11:06 PM, Sebastian Pop wrote:
>>>>> Hi,
>>>>>
>>>>> here is the updated version of the patch that passes all the make check-polly
>>>>> tests: yay! (I don't have cloog around, so I haven't checked those tests.)
>>>>
>>>> The cloog tests need to be updated. There are three solutions
>>>>
>>>> 1) We write a clang-cmp tool that does this semantically
>>>> 2) You install it briefly and update the tests
>>>
>>> I installed cloog and checked with it before I committed the patch.
>>> All tests are passing.
>>>
>>> However, when I enable delinearization by default, 2 cloog tests are failing:
>>> Tobi, could you please have a look at these two?
>>>
>>> test/Cloog/CodeGen/matmul_vec.ll
>>> test/Cloog/CodeGen/MemAccess/codegen_simple_md.ll
>>>
>>> These are failing with a JScop error complaining about inconsistent array access
>>> functions (or something else...)
>>
>> Yes, I looked into them. We can fix them by updating the .jscop files.
>>
>> When looking into these test cases, I have seen that you also
>> delinearize access functions to fixed-size, non-parametric arrays. This
>> is very tempting, but for test/Cloog/CodeGen/matmul_vec.ll it
>> currently does not work correctly:
>>
>> MemRef_A[i0 + 1024i2]
>>
>> is translated to
>>
>> MemRef_A[i0, 1024i2]
>>
>> What is the reason to even try to delinearize non-parametric
>> accesses? Doing so is not necessary to increase the precision of our
>> analysis, and if we get it wrong it actually has negative effects.
>> Also, I believe delinearizing such accesses is inherently hard. I
>> personally have the feeling that limiting ourselves to delinearizing
>> arrays with symbolic sizes is a safer bet.
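>>
>> For illustration (the array shapes below are made-up examples, not
>> taken from matmul_vec.ll): with fixed sizes the array bounds fold into
>> the constant coefficients, so the same flat index is compatible with
>> several different shapes:
>>
>>   double A[N][1024];    /* A[i2][i0]         -> 1024 * i2 + i0 */
>>   double B[N][512][2];  /* B[i2][i0/2][i0%2] -> 1024 * i2 + i0 */
>>
>> Nothing in the constant coefficients tells us which shape was
>> originally intended.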
>>
>>>> 3) You commit and I fix later.
>>>>
>>>> I would prefer 1), but understand that 2) is easier, and if necessary
>>>> we can also go for 3).
>>>>
>>>>> The only changes w.r.t. the previous patch are:
>>>>> - add a flag -polly-delinearize (similar to -da-delinearize)
>>>>> - normalize the last subscript with respect to the size of elements in the array.
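>>>>>
>>>>>   (That is, with element size S and the remainder R returned by the
>>>>>   delinearization, the last subscript becomes last * S + R.)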
>>>>
>>>> Nice that this fixed the remaining issues.
>>>>
>>>>> Ok to commit?
>>>>
>>>> Yes, despite the CLooG test cases this looks good.
>>>
>>> No cloog tests were failing on my side, so our build bots should be happy with
>>> my last commit.
>>>
>>> I have started triaging the bugs in the nightly test-suite when enabling
>>> -polly-delinearize. I have fixed a couple of bugs already; that just shows
>>> that all this code was never really tested... I will continue fixing
>>> everything that fails in SCEV->delinearize and on the -polly-delinearize side.
>>
>> The delinearization bug I opened is still open.
>>
>> I looked at the way we do delinearization and especially at examples
>> of non-static-size arrays. I attached the following example:
>>
>> ; void foo(long n, long m, double A[n][32]) {
>> ;   for (long i = 0; i < n; i++)
>> ;     for (long j = 0; j < m; j++)
>> ;       A[i][j] = A[i][j+i];
>> ; }
>>
>> In the inner loop, we get the following delinearization:
>>
>> Inst: %val = load double* %arrayidx.load
>> In Loop with Header: for.j
>> AddRec: {{%A,+,(33 * sizeof(double))}<%for.i>,+,sizeof(double)}<%for.j>
>> Base offset: %A
>> ArrayDecl[UnknownSize][33] with elements of sizeof(double) bytes.
>> ArrayRef[{0,+,1}<nuw><nsw><%for.i>][{0,+,1}<nuw><nsw><%for.j>]
>>
>> Inst: store double %val, double* %arrayidx.store
>> In Loop with Header: for.j
>> AddRec: {{%A,+,(32 * sizeof(double))}<%for.i>,+,sizeof(double)}<%for.j>
>> Base offset: %A
>> ArrayDecl[UnknownSize][32] with elements of sizeof(double) bytes.
>> ArrayRef[{0,+,1}<nuw><nsw><%for.i>][{0,+,1}<nuw><nsw><%for.j>]
>>
>> For the load we derive a size of 33 elements in the inner dimension,
>> for the store a size of 32 elements. This is inconsistent, and it is
>> unclear to me how to derive a common delinearization. For symbolic
>> sizes the problem should be a lot simpler.
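>>
>> To spell out where the two sizes come from (just restating the math of
>> the example above):
>>
>>   load  A[i][j+i] : flat index = 32*i + (j+i) = 33*i + j  (in elements)
>>   store A[i][j]   : flat index = 32*i + j
>>
>> so the load's outer stride is 33 * sizeof(double) and the store's is
>> 32 * sizeof(double), which is where the [33] vs. [32] in the dumps
>> above comes from.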
>>
> Like this?
Yes, this makes sense.
However, I think we may also need to adapt the delinearization itself,
though that should not hold up this patch.
Tobias
>
> Thanks,
> Sebastian
> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
>
>
> 0001-only-delinearize-when-the-access-function-is-not-aff.patch
>
>
> From 1928bf48ec4649ff73e6510f0aebd8c92b10b12a Mon Sep 17 00:00:00 2001
> From: Sebastian Pop<spop at codeaurora.org>
> Date: Wed, 9 Apr 2014 14:21:18 -0500
> Subject: [PATCH] only delinearize when the access function is not affine
>
> ---
> lib/Analysis/ScopDetection.cpp | 17 ++++++----
> lib/Analysis/TempScopInfo.cpp | 30 ++++++++++---------
> .../OpenMP/nested_loop_both_parallel_parametric.ll | 21 ++++++++-----
> test/ScopInfo/constant_start_integer.ll | 4 +-
> 4 files changed, 41 insertions(+), 31 deletions(-)
>
> diff --git a/lib/Analysis/ScopDetection.cpp b/lib/Analysis/ScopDetection.cpp
> index 5f03e7a..9ce0fc7 100644
> --- a/lib/Analysis/ScopDetection.cpp
> +++ b/lib/Analysis/ScopDetection.cpp
> @@ -364,12 +364,19 @@ bool ScopDetection::isValidMemoryAccess(Instruction &Inst,
> return invalid<ReportVariantBasePtr>(Context, /*Assert=*/false, BaseValue);
>
> AccessFunction = SE->getMinusSCEV(AccessFunction, BasePointer);
> - const SCEVAddRecExpr *AF = dyn_cast<SCEVAddRecExpr>(AccessFunction);
>
> if (AllowNonAffine) {
> // Do not check whether AccessFunction is affine.
> - } else if (PollyDelinearize && AF) {
> - // Try to delinearize AccessFunction.
> + } else if (!isAffineExpr(&Context.CurRegion, AccessFunction, *SE,
> + BaseValue)) {
> + const SCEVAddRecExpr *AF = dyn_cast<SCEVAddRecExpr>(AccessFunction);
> + if (!PollyDelinearize || !AF)
> + return invalid<ReportNonAffineAccess>(Context, /*Assert=*/true,
> + AccessFunction);
> +
> + // Try to delinearize AccessFunction only when the expression is known to
> + // not be affine: as all affine functions can be represented without
> + // problems in Polly, we do not have to delinearize them.
> SmallVector<const SCEV *, 4> Subscripts, Sizes;
> AF->delinearize(*SE, Subscripts, Sizes);
> int size = Subscripts.size();
> @@ -378,10 +385,6 @@ bool ScopDetection::isValidMemoryAccess(Instruction &Inst,
> if (!isAffineExpr(&Context.CurRegion, Subscripts[i], *SE, BaseValue))
> return invalid<ReportNonAffineAccess>(Context, /*Assert=*/true,
> AccessFunction);
> - } else if (!isAffineExpr(&Context.CurRegion, AccessFunction, *SE,
> - BaseValue)) {
> - return invalid<ReportNonAffineAccess>(Context, /*Assert=*/true,
> - AccessFunction);
> }
>
> // FIXME: Alias Analysis thinks IntToPtrInst aliases with alloca instructions
> diff --git a/lib/Analysis/TempScopInfo.cpp b/lib/Analysis/TempScopInfo.cpp
> index 3b2a451..5579626 100644
> --- a/lib/Analysis/TempScopInfo.cpp
> +++ b/lib/Analysis/TempScopInfo.cpp
> @@ -169,30 +169,32 @@ IRAccess TempScopInfo::buildIRAccess(Instruction *Inst, Loop *L, Region *R) {
> AccessFunction = SE->getMinusSCEV(AccessFunction, BasePointer);
> SmallVector<const SCEV *, 4> Subscripts, Sizes;
>
> - bool IsAffine = true;
> + bool IsAffine = isAffineExpr(R, AccessFunction, *SE, BasePointer->getValue());
> const SCEVAddRecExpr *AF = dyn_cast<SCEVAddRecExpr>(AccessFunction);
>
> - if (PollyDelinearize && AF) {
> + if (!IsAffine && PollyDelinearize && AF) {
> const SCEV *Remainder = AF->delinearize(*SE, Subscripts, Sizes);
> int NSubs = Subscripts.size();
>
> - // Normalize the last dimension: integrate the size of the "scalar
> - // dimension" and the remainder of the delinearization.
> - Subscripts[NSubs - 1] =
> - SE->getMulExpr(Subscripts[NSubs - 1], Sizes[NSubs - 1]);
> - Subscripts[NSubs - 1] = SE->getAddExpr(Subscripts[NSubs - 1], Remainder);
> -
> - for (int i = 0; i < NSubs; ++i)
> - if (!isAffineExpr(R, Subscripts[i], *SE, BasePointer->getValue())) {
> - IsAffine = false;
> - break;
> - }
> + if (NSubs > 0) {
> + // Normalize the last dimension: integrate the size of the "scalar
> + // dimension" and the remainder of the delinearization.
> + Subscripts[NSubs - 1] =
> + SE->getMulExpr(Subscripts[NSubs - 1], Sizes[NSubs - 1]);
> + Subscripts[NSubs - 1] = SE->getAddExpr(Subscripts[NSubs - 1], Remainder);
> +
> + IsAffine = true;
> + for (int i = 0; i < NSubs; ++i)
> + if (!isAffineExpr(R, Subscripts[i], *SE, BasePointer->getValue())) {
> + IsAffine = false;
> + break;
> + }
> + }
> }
>
> if (Subscripts.size() == 0) {
> Subscripts.push_back(AccessFunction);
> Sizes.push_back(SE->getConstant(ZeroOffset->getType(), Size));
> - IsAffine = isAffineExpr(R, AccessFunction, *SE, BasePointer->getValue());
> }
>
> return IRAccess(Type, BasePointer->getValue(), AccessFunction, Size, IsAffine,
> diff --git a/test/Isl/Ast/OpenMP/nested_loop_both_parallel_parametric.ll b/test/Isl/Ast/OpenMP/nested_loop_both_parallel_parametric.ll
> index 2c11fc5..39b9108 100644