[llvm-dev] Determination of statements that contain only matrix multiplication
Albert Cohen via llvm-dev
llvm-dev at lists.llvm.org
Sat May 28 07:48:47 PDT 2016
Sorry for not responding earlier.
On 05/20/2016 03:05 PM, Roman Gareev wrote:
> Thank you very much for the advice! I could probably try to avoid
> using non-hardware prefetching in the project, if Tobias doesn't
> disagree with it. My understanding is that prefetching isn't used
> explicitly in [1] and, according to [2], in some cases 90% of the
> turbo boost peak of the processor can be attained without it.
Too many negations :-) I'm not sure I followed exactly what you wanted
to say, but I understand that this is not the priority since you can get
90% of the performance without worrying about prefetching.
> I started to consider prefetching because it's used in the
> implementations of the gemm micro-kernels of the BLIS framework [3]. If I'm
> not mistaken, it's applied to try to make sure that micro-panel Br is
> loaded after micro-panel Ar (as required in [1] p. 11). For example,
> using it helps reduce the execution time of the attached
> implementation.
Interesting. The BLIS implementation prefetches only the first cache
line before traversing a given interval of memory. This clearly
confirms that the implementation relies on hardware prefetching to fetch
the subsequent lines. This makes a lot of sense. Yet, surprisingly, the
BLIS implementation does not attempt to anticipate the fetch: it
schedules the prefetch instruction right before the first load of a
given interval.
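To make the two scheduling choices concrete, here is a minimal C sketch, assuming GCC or Clang for `__builtin_prefetch`. The function names and the contiguous-interval layout are hypothetical and not taken from the BLIS sources. The first function follows the pattern described above: one prefetch of an interval's first cache line, issued immediately before that interval's first load, with the hardware prefetcher expected to stream in the rest. The second is an illustrative variant that anticipates the fetch by issuing the prefetch for the next interval before traversing the current one.

```c
#include <assert.h>
#include <stddef.h>

/* Pattern described above (hypothetical name): prefetch only the first
   cache line of each interval, right before the first load of that
   interval. Subsequent lines are left to the hardware prefetcher. */
double sum_intervals_prefetch_first_line(const double *buf, size_t n,
                                         size_t interval_len)
{
    assert(interval_len > 0);
    double acc = 0.0;
    for (size_t start = 0; start < n; start += interval_len) {
        const double *p = buf + start;
        size_t len = (n - start < interval_len) ? n - start : interval_len;
        /* GCC/Clang builtin: read access (0), high temporal locality (3).
           Issued with no lead time before the loads below. */
        __builtin_prefetch(p, 0, 3);
        for (size_t i = 0; i < len; ++i)
            acc += p[i];
    }
    return acc;
}

/* Hypothetical "anticipating" variant: while traversing interval k,
   prefetch the first line of interval k+1 so its latency can overlap
   with the current work. */
double sum_intervals_prefetch_ahead(const double *buf, size_t n,
                                    size_t interval_len)
{
    assert(interval_len > 0);
    double acc = 0.0;
    if (n > 0)
        __builtin_prefetch(buf, 0, 3); /* first interval: nothing to overlap */
    for (size_t start = 0; start < n; start += interval_len) {
        size_t len = (n - start < interval_len) ? n - start : interval_len;
        /* Only form the pointer if the next interval exists, to avoid
           computing an address past the end of the buffer. */
        if (start + interval_len < n)
            __builtin_prefetch(buf + start + interval_len, 0, 3);
        for (size_t i = 0; i < len; ++i)
            acc += buf[start + i];
    }
    return acc;
}
```

Both functions compute the same sum; they differ only in when the prefetch is issued. Whether anticipating helps in practice depends on how long one interval takes to traverse compared with memory latency, so this sketch makes no performance claim.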
> Refs:
>
> [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
> [2] - http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm
> [3] - https://github.com/flame/blis/blob/master/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c
>