[llvm-dev] Determination of statements that contain only matrix multiplication

Michael Kruse via llvm-dev llvm-dev at lists.llvm.org
Tue May 17 04:47:29 PDT 2016


2016-05-16 19:52 GMT+02:00 Roman Gareev <gareevroman at gmail.com>:
> Hi Tobias,
>
> could we use information about memory accesses of a SCoP statement and
> def-use chains to determine statements, which don’t contain matrix
> multiplication of the following form?

Assuming s/don't/do you want to pattern-match gemm kernels inside larger scops.


> for (int i = 0; i < Upper Bound1; i++)
>   for (int j = 0; j < Upper Bound2; j++)
>     for (int k = 0; k < Upper Bound3; j++)
>       C[i][j] += A[i][k] * B[k][j]
>
> We could probably check that memory access relations have the following form:
>
> "accesses" : [
>   {
>     "kind" : "read",
>     "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_1[i0, i2] }"
>   },
>   {
>     "kind" : "read",
>     "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_2[i2, i1] }"
>   },
>   {
>     "kind" : "read",
>     "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_3[i0, i1] }"
>   },
>   {
>     "kind" : "write",
>     "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_4[i0, i1] }"
>   }
> ]
>
> and there are the following relations between access instructions of
> the memory accesses:
>
> AccessInst1 --- def-use --- \
> AccessInst2 --- def-use --- fmul
>                                             |
>                                        def-use
>                                             |
> AccessInst3 --- def-use --- fadd
>                                             |
>                                        def-use
>                                             |
>                                  store (AccessInst4)
>
> (fmul is a user of AccessInst1 and AccessInst2. fadd is a user of the
> fmul and AccessInst3. store (AccessInst4) is a user of fadd)

You'd also have to check which access are not there. E.g. the value of
C[i][j] might be used in the inner loop for some other purpose as
well, which would make some transformations illegal.

Also notice that LICM will hoist the C[i][j] reduction out of the
k-loop, i.e. the pattern is different. DeLICM (WIP) is intended to
undo this again.


> Maybe it could be a temporary solution. I think that if the checks are
> successfully passed and the basic block of the statement has exactly
> 14 instructions, the statement contains matrix multiplication and can
> be safely optimized with a generation of specific code, which takes
> into account information about usage of SIMD registers.

What kind of transformation do you have in mind that cannot be done in
a general way? If you want to take register pressure into account, why
not also for other scops?

Like the LoopIdiomPass, one could replace a detected gemm kernel my a
call to a hand-optimized/external [sdcz]gemm function. However, I
think the goal of Polly is to optimize such kernels by itself.


> I have one more question. Are memory accesses of MemAccs from the
> ScopStmt class ordered by their sequence order?

Only the MK_Array accesses should be in order relative to each other.
All scalar reads happen when entering a statement and all scalar
writes at the end of it.

Michael

-- 
Tardyzentrismus verboten!


More information about the llvm-dev mailing list