[PATCH] D114336: [Polly] Generalize the pattern matching to the case of tensor contractions.
Michael Kruse via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 2 11:55:45 PDT 2022
Meinersbur accepted this revision.
Meinersbur added a comment.
This revision is now accepted and ready to land.
Thank you Gareev. I think the description can still be improved, but we should also move forward and can improve iteratively.
Looking forward to the actual TC optimization.
================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:199
+/// Tensor contraction (TC) of tensors A, B into tensor C can be represented as
+/// C(shuffle(I,J))=∑α·A(shuffle(I,P))·B(shuffle(P,J))+β·C(shuffle(I,J)),
+/// where ∑ is a summation over all contracted indices of P,
----------------
AFAIU, multiplication by β is not part of this detection, but is required to be loop-distributed by the isl scheduler.
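To illustrate what I mean, here is a minimal sketch for the plain matrix case (one free index each in I and J, one contracted index in P). It is not code from the patch: `Matrix` and `contractDistributed` are hypothetical names, and the exact loop structure the isl scheduler produces is an assumption. After distribution the β scaling sits in its own loop nest, so only the second nest has the shape a contraction detector would need to look at.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration, not part of D114336.
using Matrix = std::vector<std::vector<double>>;

// Computes C(i,j) = Alpha * sum_p A(i,p) * B(p,j) + Beta * C(i,j),
// written in loop-distributed form.
void contractDistributed(Matrix &C, const Matrix &A, const Matrix &B,
                         double Alpha, double Beta) {
  const std::size_t I = C.size();
  const std::size_t J = C.empty() ? 0 : C[0].size();
  const std::size_t P = B.size();

  // Loop nest 1: scaling by Beta only. No accesses to A or B.
  for (std::size_t i = 0; i < I; ++i)
    for (std::size_t j = 0; j < J; ++j)
      C[i][j] *= Beta;

  // Loop nest 2: the pure contraction. One read and one write of C,
  // one read of A, one read of B per statement instance.
  for (std::size_t i = 0; i < I; ++i)
    for (std::size_t j = 0; j < J; ++j)
      for (std::size_t p = 0; p < P; ++p)
        C[i][j] += Alpha * A[i][p] * B[p][j];
}
```

If the scaling stayed fused into the innermost statement, that statement would no longer have the simple read-C/read-A/read-B/write-C shape.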
================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:1176
+/// Obtained indexes i1, …, in, their sizes and their permutation are stored
+/// into @p IndexSet, @p DimensionSizes, and @p Dimensions, respectively.
+///
----------------
================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:1649
+/// 3. SCoP contains an arbitrary number of reads from constants and only three
+/// access relations, MA2, MA3, and MA4 that epresent reading from memory
+/// and have the form
----------------
[typo] "epresent" should be "represent".
================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:1730
+/// If this is the case, we could logically represent tensors as matrices and
+/// apply Goto's algorithm, which is used to get close-to-peak performance of
+/// matrix multiplications in manually tuned BLAS libraries (e.g., BLIS).
----------------
What is Goto here? GotoBLAS?
================
Comment at: polly/lib/Transform/MatmulOptimizer.cpp:1261
+ TCI.ReadFromC = nullptr;
+ SmallVector<MemoryAccess *, 32> Accesses = getAccessesInOrder(*Stmt);
+ for (auto *MemA = Accesses.begin(); *MemA != TCI.WriteToC; MemA++) {
----------------
gareevroman wrote:
> Meinersbur wrote:
> > gareevroman wrote:
> > > Meinersbur wrote:
> > > > `getAccessesInOrder` requires `Stmt` to not be a RegionStmt. Please add a test for it.
> > > I’ve added a check to containsOnlyTCAcc. Could you clarify what the test case should look like? Should it be a region statement that contains a matrix multiplication with the right order of memory accesses?
> > Test in `containsOnlyTCAcc` is exactly what I was looking for. A region statement could look like this:
> >
> > ```
> > c = C[i][j];
> > if (/*non-affine condition*/) {
> > (void)A[i][k] + B[k][j];
> > } else {
> > C[i][j] = c;
> > }
> > ```
> > which has the correct order of accesses but is obviously not what we are looking for.
> >
> Thanks for the example! I have added a corresponding test case. If I am not mistaken, it requires DeLICM.
It does not require DeLICM, only `-polly-allow-nonaffine-branches` (which is enabled by default).
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D114336/new/
https://reviews.llvm.org/D114336