[llvm-dev] Determination of statements that contain only matrix multiplication

Thu May 19 07:58:48 PDT 2016

Thank you for the elaborate explanation, although I don't have time to
go through all of the details.

2016-05-19 16:09 GMT+02:00 Roman Gareev <gareevroman at gmail.com>:
> To get closer to an implementation of the algorithm from [1] for
> matrices stored in row-major order, we can unroll loops 7 and 8
> and vectorize them with LLVM (the corresponding code is attached).
> According to the attached IR, LLVM sinks the stores and hoists the
> loads related to matrix C.
>
> However, LLVM cannot do this if, for example, we call prefetch
> within loop 6 or apply packing transformations similar to the one
> described in [2] (the corresponding implementations are attached to
> the email). I haven't found the reason yet.
>
> Refs:
>
> [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
> [2] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm
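As a reference point for the discussion, here is a minimal C sketch of the kind of kernel being described, assuming row-major storage and hypothetical block sizes `MR` and `NR`. An MR x NR block of C is loaded into a local accumulator before the k loop and stored back after it, which corresponds to the hoisting of loads and sinking of stores of C mentioned above. A and B are first packed into contiguous panels in the spirit of [2]. The names `pack_a`, `pack_b`, and `micro_kernel` are illustrative and are not taken from the attached implementations.

```c
#include <assert.h>

#define MR 4 /* rows of the C micro-tile (hypothetical choice) */
#define NR 4 /* columns of the C micro-tile (hypothetical choice) */

/* Pack an MR x kc block of row-major A (leading dimension lda) so that
   the MR elements of each column are contiguous: Ap[p*MR + i] = A[i][p]. */
static void pack_a(int kc, const double *A, int lda, double *Ap) {
    for (int p = 0; p < kc; ++p)
        for (int i = 0; i < MR; ++i)
            Ap[p * MR + i] = A[i * lda + p];
}

/* Pack a kc x NR block of row-major B (leading dimension ldb) so that
   each row of NR elements is contiguous: Bp[p*NR + j] = B[p][j]. */
static void pack_b(int kc, const double *B, int ldb, double *Bp) {
    for (int p = 0; p < kc; ++p)
        for (int j = 0; j < NR; ++j)
            Bp[p * NR + j] = B[p * ldb + j];
}

/* C[MR x NR] += Ap * Bp, with C held in a local accumulator during the
   k loop so that C is read once before and written once after it. */
static void micro_kernel(int kc, const double *Ap, const double *Bp,
                         double *C, int ldc) {
    assert(kc >= 0);
    double acc[MR][NR];
    for (int i = 0; i < MR; ++i)      /* hoisted loads of C */
        for (int j = 0; j < NR; ++j)
            acc[i][j] = C[i * ldc + j];
    for (int p = 0; p < kc; ++p)      /* one rank-1 update per k */
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += Ap[p * MR + i] * Bp[p * NR + j];
    for (int i = 0; i < MR; ++i)      /* sunk stores of C */
        for (int j = 0; j < NR; ++j)
            C[i * ldc + j] = acc[i][j];
}
```

Because `MR` and `NR` are compile-time constants, a compiler may fully unroll the i/j loops and keep `acc` in (vector) registers. Whether LLVM still does so once prefetch calls or packing are added is the open question raised above.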

It would be great if you could find a general way to handle such
concerns, maybe even by modifying LLVM's passes. Since Tobias is your
advisor, you might discuss with him which cases should be handled in
more general ways and which only in detected gemm kernels.

AFAIK, prefetch instructions are not automatically inserted anywhere
in LLVM. It would be nice to insert them X instructions/accesses in
advance, where X is determined somehow. For gemm only, one could take
measurements on different architectures, but in either case we won't
get around having X derived from some command-line argument.
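As a rough illustration of inserting prefetches X accesses in advance, here is a minimal C sketch. It assumes a GCC/Clang-compatible compiler and uses `__builtin_prefetch`, which Clang lowers to LLVM's `llvm.prefetch` intrinsic. The prefetch is issued `dist` elements ahead of the current access, where `dist` plays the role of X and is simply passed in, for example after being parsed from a command-line option. The function name `dot_prefetch` and the parameter `dist` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns sum of a[i] * b[i]. For each i, it also prefetches the
   elements `dist` iterations ahead; dist == 0 disables prefetching. */
static double dot_prefetch(const double *a, const double *b, size_t n,
                           size_t dist) {
    assert(n == 0 || (a != NULL && b != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Only form addresses inside the arrays. */
        if (dist > 0 && i + dist < n) {
            /* rw = 0: read access; locality = 3: high temporal locality */
            __builtin_prefetch(&a[i + dist], 0, 3);
            __builtin_prefetch(&b[i + dist], 0, 3);
        }
        sum += a[i] * b[i];
    }
    return sum;
}
```

Since `dist` is a runtime parameter, the same binary can be timed with several distances on each target. That is one way to obtain the per-architecture measurements mentioned above for gemm.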

Michael
