[PATCH] D125202: [Polly] Disable matmul pattern-match + -polly-parallel

Mon May 16 14:51:18 PDT 2022

Meinersbur added a comment.

In D125202#3514940 <https://reviews.llvm.org/D125202#3514940>, @gareevroman wrote:

> I would suggest to parallelize the second loop around the micro-kernel by default. It would not violate the dependencies. In general, it can provide a good opportunity for parallelization (please, see [1] and [2]). In particular, the reduction of time spent in this loop may cancel out the cost of packing the elements of the created array Packed_A into the L2 cache.

I fear that $loop_4$ does not have enough work to justify the parallelization overhead. Also, there will be false sharing between cache lines. It could be reduced by having the `#pragma omp parallel` outside the matrix multiplication, and only `#pragma omp for` on $loop_4$. However, Polly does not support that yet.

The usual candidate for coarse-grain parallelization is always the outermost one, unless we want to exploit a shared cache but that would be optional. We'd divide the packed array size equally between threads.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125202/new/

https://reviews.llvm.org/D125202