[llvm-dev] Determination of statements that contain only matrix multiplication

Thu May 19 08:50:35 PDT 2016

2016-05-19 19:58 GMT+05:00 Michael Kruse <meinersbur at googlemail.com>:
> Thank you for the elaborate explanation, although I don't have time to
> go through all of it.
>
> 2016-05-19 16:09 GMT+02:00 Roman Gareev <gareevroman at gmail.com>:
>> To get closer to an implementation of the algorithm from [1] for
>> matrices stored in row-major order, we can unroll loop 7 and loop 8
>> and perform vectorization with LLVM (the corresponding code is
>> attached). According to the attached IR, LLVM sinks the stores and
>> hoists the loads related to matrix C.
>>
>> Nevertheless, LLVM can’t do this if, for example, we call prefetch
>> within loop 6 or apply packing transformations similar to the one
>> described in [2] (the corresponding implementations are attached to
>> this email). I haven’t found the reason yet.
>>
>> Refs:
>>
>> [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
>> [2] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm
>
> It would be great if you could find a general way to handle such
> concerns, maybe even by modifying LLVM's passes. Since Tobias is your
> advisor, you might discuss with him in which cases it is worth
> handling this in a more general way vs. only in detected gemm kernels.
>
> AFAIK prefetch instructions are not automatically inserted anywhere in
> LLVM. It would be nice to insert them X instructions/accesses in
> advance, where X is determined somehow. For gemm only, one could do
> some measurements for different architectures, but either way we
> won't get around having it derived from some command-line argument.

Thank you for the ideas! I’ll try to take them into account.
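
To make the "X accesses in advance" idea concrete, here is a minimal
sketch in C. It is an illustration only, not the attached code: the
2x2 tile size, the name micro_kernel_2x2 and the PREFETCH_DISTANCE
macro are all hypothetical, and __builtin_prefetch is the GCC/Clang
builtin rather than anything LLVM inserts by itself. The tile of C is
kept in local variables, so its loads are hoisted above the k loop and
its stores are sunk below it by hand, independently of the prefetch
call inside the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many k iterations ahead to prefetch.
 * It could be measured per architecture or overridden at build time,
 * e.g. with -DPREFETCH_DISTANCE=16. */
#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 8
#endif

/* Computes C[0..1][0..1] += A[0..1][0..kc-1] * B[0..kc-1][0..1].
 * All matrices are row-major with leading dimensions lda, ldb, ldc. */
static void micro_kernel_2x2(size_t kc,
                             const double *a, size_t lda,
                             const double *b, size_t ldb,
                             double *c, size_t ldc)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* Loads of C hoisted out of the k loop by hand. */
    double c00 = c[0],   c01 = c[1];
    double c10 = c[ldc], c11 = c[ldc + 1];

    for (size_t k = 0; k < kc; ++k) {
        /* Prefetch the row of B needed PREFETCH_DISTANCE iterations
         * from now; the guard keeps the address inside the matrix.
         * Arguments: address, 0 = read, 3 = high temporal locality. */
        if (k + PREFETCH_DISTANCE < kc)
            __builtin_prefetch(&b[(k + PREFETCH_DISTANCE) * ldb], 0, 3);

        double a0 = a[k],       a1 = a[lda + k];
        double b0 = b[k * ldb], b1 = b[k * ldb + 1];

        /* The loops over the 2x2 tile are fully unrolled. */
        c00 += a0 * b0;  c01 += a0 * b1;
        c10 += a1 * b0;  c11 += a1 * b1;
    }

    /* Stores to C sunk below the k loop by hand. */
    c[0]   = c00;  c[1]       = c01;
    c[ldc] = c10;  c[ldc + 1] = c11;
}
```

Whether a given distance helps at all is something that would have to
be measured; the macro only shows where such a parameter could enter.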

-- 
                                    Cheers, Roman Gareev.