[PATCH] D125202: [Polly] Disable matmul pattern-match + -polly-parallel
Michael Kruse via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 16 14:51:18 PDT 2022
Meinersbur added a comment.
In D125202#3514940 <https://reviews.llvm.org/D125202#3514940>, @gareevroman wrote:
> I would suggest to parallelize the second loop around the micro-kernel by default. It would not violate the dependencies. In general, it can provide a good opportunity for parallelization (please, see [1] and [2]). In particular, the reduction of time spent in this loop may cancel out the cost of packing the elements of the created array Packed_A into the L2 cache.
I fear that $loop_4$ does not have enough work to justify the parallelization overhead. Also, there will be false sharing between cache lines. The overhead could be reduced by having the `#pragma omp parallel` outside the matrix multiplication, and only `#pragma omp for` on $loop_4$. However, Polly does not support that yet.
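To illustrate the hoisted `#pragma omp parallel` idea, here is a minimal hand-written sketch. It is not what Polly generates: the function name `matmul_hoisted_parallel`, the tile width `NR`, and the simplified loop nest (no packing, no micro-kernel) are assumptions made for illustration, with the `jc` loop standing in for $loop_4$.

```c
#include <stddef.h>

/* Hypothetical tile width for the loop that stands in for loop_4. */
enum { NR = 4 };

/* C += A * B for row-major n x n matrices. */
void matmul_hoisted_parallel(size_t n, const double *A, const double *B,
                             double *C)
{
    /* The thread team is created once for the whole multiplication ... */
    #pragma omp parallel
    {
        /* Every thread runs this loop; only the inner loop is shared. */
        for (size_t i = 0; i < n; ++i) {
            /* ... and only the work-sharing construct sits on the inner
               loop, so no new team is forked per outer iteration. */
            #pragma omp for schedule(static)
            for (size_t jc = 0; jc < n; jc += NR) {
                size_t jend = jc + NR < n ? jc + NR : n;
                for (size_t j = jc; j < jend; ++j) {
                    double acc = C[i * n + j];
                    for (size_t k = 0; k < n; ++k)
                        acc += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = acc;
                }
            }
        }
    }
}
```

Each `#pragma omp for` still ends with an implicit barrier, so this removes the repeated team creation but not the synchronization. Compiled without OpenMP support, the pragmas are ignored and the function runs sequentially with the same result.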
The usual candidate for coarse-grain parallelization is the outermost loop, unless we want to exploit a shared cache, but that would be optional. We'd divide the packed array size equally between threads.
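One possible reading of "divide equally between threads" is a plain block partition, sketched below. The type `chunk_range` and the function `thread_chunk` are hypothetical names for illustration, not existing Polly code, and `total` is assumed to be the number of elements of the packed array.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open element range [begin, end) owned by one thread. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_range;

/* Split `total` elements into `nthreads` contiguous chunks whose sizes
   differ by at most one element, and return the chunk of thread `tid`. */
chunk_range thread_chunk(size_t total, size_t nthreads, size_t tid)
{
    assert(nthreads > 0 && tid < nthreads);
    size_t base = total / nthreads;
    size_t extra = total % nthreads; /* first `extra` threads get one more */
    chunk_range r;
    r.begin = tid * base + (tid < extra ? tid : extra);
    r.end = r.begin + base + (tid < extra ? 1 : 0);
    return r;
}
```

With 10 elements and 4 threads this yields the ranges [0, 3), [3, 6), [6, 8) and [8, 10).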
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D125202/new/
https://reviews.llvm.org/D125202