[PATCH] D108885: [Delinearization] Keep array element byte offset.

Bardia Mahjour via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 8 11:20:44 PDT 2021


bmahjour added a comment.

In D108885#2978890 <https://reviews.llvm.org/D108885#2978890>, @Meinersbur wrote:

> In D108885#2977230 <https://reviews.llvm.org/D108885#2977230>, @bmahjour wrote:
>
>> If it needs to be isolated to be handled, then I don't see how returning it as a separate parameter makes it more complicated.
>
> It is only handled differently because the legacy code does not expect an additional parameter. However, fixing the bug is a higher priority, and the individual passes could be improved to make proper use of the returned subscript at a later point.
>
>> The original goal of the delinearization algorithm is to recover source level subscripts.
>
> [citation needed]
>
> If this were true, then any delinearization that does not correspond to the original source subscripts would need to be considered wrong. However, I am pretty sure we also want to delinearize `A[i + j*n]` even though the source code subscript has been linearized.
>
>> I tried a simple example and it seems that the "element offset" is zero regardless of which member of a structure is being accessed, so I'm not sure if it has any bearing on the ability to delinearize arrays of structs:
>
> It is delinearized with an array element size of `sizeof(float)` rather than `sizeof(MyS)`, with one dimension twice as large and the SCEVAddRecExpr starting at 1 instead of 0. This has to do with how ScalarEvolution tries to remove the struct index from the GEP, modeling it as a simple addition.
>
> Use different elements of different sizes in the struct:
>
>   struct __attribute__((packed)) Pair { int x; long y; };
>   void foo(unsigned long N, struct Pair A[][N]) {
>     for (unsigned long i = 0; i < N; i+=1)
>         A[i][i].y = 0;
>   }
>
>
>
> Output of `opt -delinearize`:
>   AccessFunction: {4,+,(12 + (12 * %N))}<%for.cond>
>   Base offset: %A
>   ArrayDecl[UnknownSize][%N][8]
>   ArrayRef[0][{0,+,1}<%for.cond>][{4,+,(4 + (12 * %N))}<%for.cond>]
>
> This is obviously more complicated than it needs to be. Specifically, it still considers the element size to be `sizeof(long)`, because that's the access it sees. If accessing `x` instead of `y`, the element size would be 4. Polly tries to combine the shapes from multiple delinearization results into a common one such that subscripts are comparable.
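>
> As a quick sanity check of the access function above (purely illustrative, assuming the packed layout, so `sizeof(struct Pair)` is 12 and `offsetof(struct Pair, y)` is 4): the byte offset of `A[i][i].y` is `(i*N + i)*12 + 4 = 4 + i*(12 + 12*N)`, which is exactly the `{4,+,(12 + (12 * %N))}` add recurrence:
>
>   #include <assert.h>
>   #include <stddef.h>
>
>   struct __attribute__((packed)) Pair { int x; long y; };
>
>   /* Byte offset of A[i][i].y from A, written two ways: directly from the
>      packed layout, and as the {4,+,(12 + 12*N)} add recurrence evaluated
>      at i. The two expressions are equal for every i. */
>   void checkAccessFunction(unsigned long N, unsigned long i) {
>     unsigned long direct = (i * N + i) * sizeof(struct Pair)
>                            + offsetof(struct Pair, y);
>     unsigned long addrec = 4 + i * (12 + 12 * N);
>     assert(direct == addrec);
>   }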
>
> Yes, with the current implementation of delinearization, I don't think there are a lot of cases where the byte offset subscript would improve the dependence analysis. Still, I think the right API, one that allows improving the delinearization later, includes the byte offset.
>
> Having derived this example, which is much easier to understand than the regression test I got from llvm-reduce, I could replace the test with it.

Thanks for the example. It does make the problem much easier to understand. I also have concerns about the current implementation not being able to handle arrays of structures. To understand this better, I derived the following example based on yours above:

  struct __attribute__((packed)) MyS {
    float a;
    double b;
  };
  
  void foo(long long n, long long m, struct MyS f1[][n][m]) {
    for (int i = 0; i < 1024; i++)
      for (int j = 0; j < n; j++)
        for (int k = 0; k < m; k++)
          f1[i][j][k].b = f1[i-1][j][k].b;
  }
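
For reference, under the packed layout `sizeof(struct MyS)` is 12 and `offsetof(struct MyS, b)` is 4, so the byte offset of the store `f1[i][j][k].b` works out to `((i*n + j)*m + k)*12 + 4`, and the load uses the same expression with `i-1`. The snippet below (illustrative only, reusing the `struct MyS` definition above) is just that arithmetic written out; it is what the access functions shown further down expand to:

  #include <stddef.h>

  /* Byte offset of f1[i][j][k].b from the base pointer f1, computed from the
     packed layout (sizeof(struct MyS) == 12, offsetof(struct MyS, b) == 4).
     Expanding it gives 4 + i*(12*n*m) + j*(12*m) + k*12, which is the
     AccessFunction reported for the store below; the load uses i-1, hence
     its (4 + (-12 * %n * %m)) starting value. */
  unsigned long long byteOffsetOfB(long long n, long long m,
                                   long long i, long long j, long long k) {
    return ((i * n + j) * m + k) * sizeof(struct MyS) + offsetof(struct MyS, b);
  }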

Ideally, we would want delinearization to recover the subscripts `[i-1][j][k]` for the load and `[i][j][k]` for the store, so that dependence analysis can produce `[-1, 0, 0]`. With the changes in this patch, we get the following delinearization for the load and the store, respectively:

  Inst:  %4 = load double, double* %b, align 1, !tbaa !3
  In Loop with Header: for.body11
  AccessFunction: {{{(4 + (-12 * %n * %m)),+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,12}<%for.body11>
  Base offset: %f1
  ArrayDecl[UnknownSize][%n][%m][8]
  ArrayRef[0][0][{0,+,1}<nuw><nsw><%for.body11>][{{{(4 + (-12 * %n * %m)),+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,4}<%for.body11>]



  Inst:  store double %4, double* %b22, align 1, !tbaa !3
  In Loop with Header: for.body11
  AccessFunction: {{{4,+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,12}<%for.body11>
  Base offset: %f1
  ArrayDecl[UnknownSize][%n][%m][8]
  ArrayRef[0][0][{0,+,1}<nuw><nsw><%for.body11>][{{{4,+,(12 * %n * %m)}<%for.body>,+,(12 * %m)}<%for.body5>,+,4}<%for.body11>]

Even though delinearization claims to be successful, most of the complexity of the original access function has now moved into the innermost dimension, leaving the outer subscripts as zeros. This won't help dependence analysis in any way, as it now has to further analyze the complicated subscript of the synthesized innermost dimension, and it's not clear to me how that can be done.
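
To make that concrete: dependence analysis tests subscript pairs dimension by dimension, which only works when both references are expressed over the same recovered shape. Below is a minimal sketch (a hypothetical helper, not the actual DependenceAnalysis code) of the kind of per-dimension test that becomes possible once `[i][j][k]` and `[i-1][j][k]` are available:

  #include <stdbool.h>

  /* One dimension, both subscripts affine in the same loop index with the
     same coefficient: coeff*i + cSrc vs. coeff*i + cDst. The dependence
     distance in this dimension is then a compile-time constant. For the
     example above this gives a distance of magnitude 1 in the outermost
     dimension and 0 in the two inner ones, i.e. the [-1, 0, 0] vector
     mentioned earlier, but only if those per-dimension subscripts are
     actually recovered rather than folded into one innermost subscript. */
  static bool constantDistance(long coeff, long cSrc, long cDst, long *dist) {
    if (coeff == 0 || (cSrc - cDst) % coeff != 0)
      return false;
    *dist = (cSrc - cDst) / coeff;
    return true;
  }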

Note that if the element size passed to the delinearize function were chosen to be 12 (the true element size of the array), then delinearization would have been able to recover more meaningful subscripts for the outer dimensions without requiring a "byte offset". I wonder if we can improve the results for arrays of structures by choosing a better element size.
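
For illustration, if the store's access function were divided by the true element size of 12, the quotient would be exactly the linearized row-major index `(i*n + j)*m + k`, which delinearizes cleanly into `[i][j][k]`, and the remainder would be the constant 4, the offset of member `b` within the struct. A small check of that arithmetic (a sketch, not the delinearization code itself):

  #include <assert.h>

  /* The store's access function from above, evaluated at (i, j, k):
     4 + i*(12*n*m) + j*(12*m) + k*12. */
  static unsigned long long accessFn(long long n, long long m,
                                     long long i, long long j, long long k) {
    return ((i * n + j) * m + k) * 12ULL + 4;
  }

  /* Dividing by the true element size (12) recovers the linearized row-major
     index, and the remainder is the constant byte offset of member b
     (assuming non-negative indices). */
  static void checkSplit(long long n, long long m,
                         long long i, long long j, long long k) {
    unsigned long long off = accessFn(n, m, i, j, k);
    assert(off / 12 == (unsigned long long)((i * n + j) * m + k));
    assert(off % 12 == 4);
  }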


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108885/new/

https://reviews.llvm.org/D108885


