[llvm-dev] Determination of statements that contain only matrix multiplication
Michael Kruse via llvm-dev
llvm-dev at lists.llvm.org
Tue May 17 04:47:29 PDT 2016
2016-05-16 19:52 GMT+02:00 Roman Gareev <gareevroman at gmail.com>:
> Hi Tobias,
>
> could we use information about memory accesses of a SCoP statement and
> def-use chains to determine statements, which don’t contain matrix
> multiplication of the following form?
Assuming s/don't/do you want to pattern-match gemm kernels inside larger scops.
> for (int i = 0; i < Upper Bound1; i++)
> for (int j = 0; j < Upper Bound2; j++)
> for (int k = 0; k < Upper Bound3; j++)
> C[i][j] += A[i][k] * B[k][j]
>
> We could probably check that memory access relations have the following form:
>
> "accesses" : [
> {
> "kind" : "read",
> "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_1[i0, i2] }"
> },
> {
> "kind" : "read",
> "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_2[i2, i1] }"
> },
> {
> "kind" : "read",
> "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_3[i0, i1] }"
> },
> {
> "kind" : "write",
> "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_4[i0, i1] }"
> }
> ]
>
> and there are the following relations between access instructions of
> the memory accesses:
>
> AccessInst1 --- def-use --- \
> AccessInst2 --- def-use --- fmul
> |
> def-use
> |
> AccessInst3 --- def-use --- fadd
> |
> def-use
> |
> store (AccessInst4)
>
> (fmul is a user of AccessInst1 and AccessInst2. fadd is a user of the
> fmul and AccessInst3. store (AccessInst4) is a user of fadd)
You'd also have to check which access are not there. E.g. the value of
C[i][j] might be used in the inner loop for some other purpose as
well, which would make some transformations illegal.
Also notice that LICM will hoist the C[i][j] reduction out of the
k-loop, i.e. the pattern is different. DeLICM (WIP) is intended to
undo this again.
> Maybe it could be a temporary solution. I think that if the checks are
> successfully passed and the basic block of the statement has exactly
> 14 instructions, the statement contains matrix multiplication and can
> be safely optimized with a generation of specific code, which takes
> into account information about usage of SIMD registers.
What kind of transformation do you have in mind that cannot be done in
a general way? If you want to take register pressure into account, why
not also for other scops?
Like the LoopIdiomPass, one could replace a detected gemm kernel my a
call to a hand-optimized/external [sdcz]gemm function. However, I
think the goal of Polly is to optimize such kernels by itself.
> I have one more question. Are memory accesses of MemAccs from the
> ScopStmt class ordered by their sequence order?
Only the MK_Array accesses should be in order relative to each other.
All scalar reads happen when entering a statement and all scalar
writes at the end of it.
Michael
--
Tardyzentrismus verboten!
More information about the llvm-dev
mailing list