[llvm-dev] Determination of statements that contain only matrix multiplication

Roman Gareev via llvm-dev llvm-dev at lists.llvm.org
Sun May 29 04:34:51 PDT 2016

2016-05-28 19:48 GMT+05:00 4lbert C0hen <4lbert.h.c0hen at gmail.com>:
> Sorry for not responding earlier.
> On 05/20/2016 03:05 PM, Roman Gareev wrote:
>> Thank you very much for the advice! I could probably try to avoid
>> using nonhardware prefetching in the project, if Tobias doesn’t
>> disagree with it. My understanding is that prefetching isn’t used
>> explicitly in [1] and, according to [2], in some cases 90% of the
>> turbo boost peak of the processor can be attained without it.
> Too many negations :-) I'm not sure I followed exactly what you wanted to
> say, but I understand that this is not the priority since you can get 90% of
> the performance without worrying about prefetching.

Sorry for the misunderstanding. Yes, I think that if nobody minds,
prefetching need not be a priority of this project, because on some
platforms we can get 90% of the performance without worrying about it.
Furthermore, as you mentioned before, hardware prefetchers can be good
at strided accesses in single-threaded code.

>> I started to consider prefetching, because it’s used in
>> implementations of gemm micro-kernels of BLIS framework [3]. If I’m
>> not mistaken, it’s applied to try to make sure that micro-panel Br is
>> loaded after micro-panel Ar (as required in [1] p. 11). For example,
>> using it helps to reduce the execution time of the attached
>> implementation.
> Interesting. The BLIS implementation prefetches only the first cache line,
> before traversing a given interval of memory. This clearly confirms the
> implementation relies on hardware prefetching to prefetch the subsequent
> lines. This makes a lot of sense.

Thank you for the explanation!

> Yet surprisingly, the BLIS implementation
> does not attempt to anticipate the fetch. It schedules the prefetch
> instruction right before the first load of a given interval.

Yes, I think that it’s interesting.
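
To make the pattern under discussion concrete, here is a minimal sketch
in C. It is not BLIS code: the function name panel_dot and the
dot-product loop are hypothetical stand-ins for a micro-kernel, and
__builtin_prefetch is the GCC/Clang builtin, which I assume is available.
The sketch only shows the placement of the prefetch: one request for the
first cache line of each panel, issued right before the first load, with
the subsequent lines left to the hardware prefetcher.

```c
#include <stddef.h>

/* Hypothetical stand-in for a micro-kernel (not taken from BLIS):
 * accumulates the products of two contiguous panels of n doubles. */
static double panel_dot(const double *a, const double *b, size_t n)
{
    /* Software-prefetch only the first cache line of each panel,
     * immediately before the first load. Arguments: address,
     * 0 = prefetch for read, 3 = keep in all cache levels. */
    __builtin_prefetch(a, 0, 3);
    __builtin_prefetch(b, 0, 3);

    double acc = 0.0;
    /* Unit-stride traversal: the remaining cache lines are expected
     * to be brought in by the hardware prefetcher. */
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

Anticipating the fetch, as Albert suggests, would mean issuing the same
two prefetches earlier, for example while the previous pair of panels is
still being processed.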

>> Refs:
>> [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
>> [2] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm
>> [3] - https://github.com/flame/blis/blob/master/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c

                                    Cheers, Roman Gareev.
