[cfe-dev] [OT?] real-world interest of the polly optimiser

Thu Jun 1 14:37:15 PDT 2017

2017-06-01 21:52 GMT+02:00 Krzysztof Parzyszek via cfe-dev
<cfe-dev at lists.llvm.org>:
> In applications like linear algebra a lot of performance comes from
> optimizing loop nests for cache locality. Doing things like loop
> interchange, loop nest distribution, unroll and jam, etc. helps a lot with
> it, and to the best of my knowledge LLVM does none of that. There is some
> basic support for loop fusion and distribution, but I don't think it works
> on the nest level. Given how important that is in high-performance
> computing, the 20x difference sounds believable.

We implemented this recently, but only for gemm-like kernels, basically
using the techniques from
http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
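
To make the locality argument concrete, here is a minimal sketch of plain
loop interchange on a matrix multiply. It is only an illustration of the
kind of transformation discussed above, not the code from our
implementation and not the full scheme from the paper, which as far as I
understand it goes much further (blocking, packing). The size N and the
function names matmul_ijk and matmul_ikj are made up for this example.

```c
#include <stddef.h>

/* Hypothetical fixed problem size, chosen only for illustration. */
#define N 64

/* Original i-j-k order: the innermost loop reads B[k][j] with k varying,
   so it walks down a column of B with stride N in row-major storage. */
void matmul_ijk(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    }
}

/* After interchanging the j and k loops: the innermost loop now walks
   B[k][j] and C[i][j] along a row, so both are accessed with stride 1. */
void matmul_ikj(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++)
            C[i][j] = 0.0;
        for (size_t k = 0; k < N; k++) {
            const double a = A[i][k];   /* invariant in the inner loop */
            for (size_t j = 0; j < N; j++)
                C[i][j] += a * B[k][j];
        }
    }
}
```

Both functions compute the same product; only the memory access pattern
of the innermost loop differs. How much that matters depends on the
matrix size and the cache hierarchy.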

We sent a paper for review to ACM TACO. As it is under review, and I
am not the main author, I think I cannot just share it publicly (yet).

Michael