[llvm-dev] Determination of statements that contain only matrix multiplication
Tobias Grosser via llvm-dev
llvm-dev at lists.llvm.org
Tue May 17 07:37:24 PDT 2016
On 05/17/2016 01:47 PM, Michael Kruse wrote:
> 2016-05-16 19:52 GMT+02:00 Roman Gareev <gareevroman at gmail.com>:
>> Hi Tobias,
>>
>> could we use information about memory accesses of a SCoP statement and
>> def-use chains to determine statements, which don’t contain matrix
>> multiplication of the following form?
>
> Assuming s/don't/do you want to pattern-match gemm kernels inside larger scops.
>
>
>> for (int i = 0; i < Upper Bound1; i++)
>> for (int j = 0; j < Upper Bound2; j++)
>> for (int k = 0; k < Upper Bound3; j++)
>> C[i][j] += A[i][k] * B[k][j]
>>
>> We could probably check that memory access relations have the following form:
>>
>> "accesses" : [
>> {
>> "kind" : "read",
>> "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_1[i0, i2] }"
>> },
>> {
>> "kind" : "read",
>> "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_2[i2, i1] }"
>> },
>> {
>> "kind" : "read",
>> "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_3[i0, i1] }"
>> },
>> {
>> "kind" : "write",
>> "relation" : "{ Stmt_14[i0, i1, i2] -> MemRef_4[i0, i1] }"
>> }
>> ]
>>
>> and there are the following relations between access instructions of
>> the memory accesses:
>>
>> AccessInst1 --- def-use --- \
>> AccessInst2 --- def-use --- fmul
>> |
>> def-use
>> |
>> AccessInst3 --- def-use --- fadd
>> |
>> def-use
>> |
>> store (AccessInst4)
>>
>> (fmul is a user of AccessInst1 and AccessInst2. fadd is a user of the
>> fmul and AccessInst3. store (AccessInst4) is a user of fadd)
>
> You'd also have to check which access are not there. E.g. the value of
> C[i][j] might be used in the inner loop for some other purpose as
> well, which would make some transformations illegal.
>
> Also notice that LICM will hoist the C[i][j] reduction out of the
> k-loop, i.e. the pattern is different. DeLICM (WIP) is intended to
> undo this again.
Very right.
>> Maybe it could be a temporary solution. I think that if the checks are
>> successfully passed and the basic block of the statement has exactly
>> 14 instructions, the statement contains matrix multiplication and can
>> be safely optimized with a generation of specific code, which takes
>> into account information about usage of SIMD registers.
>
> What kind of transformation do you have in mind that cannot be done in
> a general way? If you want to take register pressure into account, why
> not also for other scops?
Good questions.
> Like the LoopIdiomPass, one could replace a detected gemm kernel my a
> call to a hand-optimized/external [sdcz]gemm function. However, I
> think the goal of Polly is to optimize such kernels by itself.
Right. This does not seem to be too interesting.
I am personally mostly interested in getting the pieces in place that
allow us to get close-to-peak performance for gemm, but agree with
Michael that we clearly want to generalize this in the future.
However, going with a vertical implementation first might be a good idea
(assuming the pattern matching is simple). In this case we could detect
a gemm kernel and use this logic to get all important optimizations we
need in place.
Best,
Tobias
More information about the llvm-dev
mailing list