[PATCH] D114336: [Polly] Generalize the pattern matching to the case of tensor contractions.

Michael Kruse via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 2 11:55:45 PDT 2022


Meinersbur accepted this revision.
Meinersbur added a comment.
This revision is now accepted and ready to land.

Thank you Gareev. I think the description can still be improved, but we should also move forward and can improve it iteratively.

Looking forward to the actual TC optimization.



================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:199
+/// Tensor contraction (TC) of tensors A, B into tensor C can be represented as
+/// C(shuffle(I,J))=∑α·A(shuffle(I,P))·B(shuffle(P,J))+β·C(shuffle(I,J)),
+/// where ∑ is a summation over all contracted indices of P,
----------------
AFAIU multiplication by β is not part of this detection, but required to be loop-distributed by the isl scheduler.
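
To illustrate what loop distribution means here, below is a minimal sketch for the simplest contraction, a plain matrix product with I = {i}, J = {j}, P = {k}. The β scaling sits in its own loop nest, and only the second nest has the contraction shape. The name `contractWithBeta`, the `Matrix` alias, and the use of `std::vector` are assumptions made for the example; none of this is Polly code, and the actual schedule produced by isl may differ.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper (not part of Polly): computes
//   C = Alpha * A * B + Beta * C
// with the Beta scaling distributed into a separate loop nest.
void contractWithBeta(Matrix &C, const Matrix &A, const Matrix &B,
                      double Alpha, double Beta) {
  const std::size_t NI = C.size();
  const std::size_t NJ = NI ? C[0].size() : 0;
  const std::size_t NK = B.size();

  // Loop nest 1: scale C by Beta. This nest has no contracted index.
  for (std::size_t I = 0; I < NI; ++I)
    for (std::size_t J = 0; J < NJ; ++J)
      C[I][J] *= Beta;

  // Loop nest 2: the contraction itself, summing over the index K.
  for (std::size_t I = 0; I < NI; ++I)
    for (std::size_t J = 0; J < NJ; ++J)
      for (std::size_t K = 0; K < NK; ++K)
        C[I][J] += Alpha * A[I][K] * B[K][J];
}
```

After distribution, the second nest reads C, A, and B and writes C once per iteration, with no β term left in the statement.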


================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:1176
+/// Obtained indexes i1, …, in, their sizes and their permutation are stored
+/// into @p  IndexSet, @p DimensionSizes, and @p Dimensions, respectively.
+///
----------------



================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:1649
+/// 3. SCoP contains an arbitrary number of reads from constants and only three
+///    access relations, MA2, MA3, and MA4 that epresent reading from memory
+///    and have the form
----------------
[typo]


================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:1730
+/// If this is the case, we could logically represent tensors as matrices and
+/// apply Goto's algorithm, which is used to get close-to-peak performance of
+/// matrix multiplications in manually tuned BLAS libraries (e.g., BLIS).
----------------
What is Goto here? GotoBLAS?


================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:1261
+  TCI.ReadFromC = nullptr;
+  SmallVector<MemoryAccess *, 32> Accesses = getAccessesInOrder(*Stmt);
+  for (auto *MemA = Accesses.begin(); *MemA != TCI.WriteToC; MemA++) {
----------------
gareevroman wrote:
> Meinersbur wrote:
> > gareevroman wrote:
> > > Meinersbur wrote:
> > > > `getAccessesInOrder` requires `Stmt` to not be a RegionStmt. Please add a test for it.
> > > I’ve added a check to containsOnlyTCAcc. Could you clarify what the test case should look like? Should it be a region statement that contains a matrix multiplication with the right order of memory accesses?
> > Test in `containsOnlyTCAcc` is exactly what I was looking for. A region statement could look like this:
> > 
> > ```
> > c = C[i][j];
> > if (/*non-affine condition*/) {
> >   (void)A[i][k] + B[k][j];
> > } else {
> >   C[i][j] = c;
> > }
> > ```
> > which has the correct order of accesses but is obviously not what we are looking for.
> > 
> Thanks for the example! I have added a corresponding test case. If I am not mistaken, it requires DeLICM.
It does not require DeLICM, but it does require `-polly-allow-nonaffine-branches` (which is enabled by default).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114336/new/

https://reviews.llvm.org/D114336


