[all-commits] [llvm/llvm-project] 3bb969: [flang] Inline hlfir.matmul[_transpose]. (#122821)
Slava Zakharin via All-commits
all-commits at lists.llvm.org
Wed Jan 15 08:43:18 PST 2025
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 3bb969f3ebb25037e8eb69c30a5a0dfb5d9d0f51
https://github.com/llvm/llvm-project/commit/3bb969f3ebb25037e8eb69c30a5a0dfb5d9d0f51
Author: Slava Zakharin <szakharin at nvidia.com>
Date: 2025-01-15 (Wed, 15 Jan 2025)
Changed paths:
M flang/include/flang/Optimizer/Builder/FIRBuilder.h
M flang/include/flang/Optimizer/Builder/HLFIRTools.h
M flang/include/flang/Optimizer/HLFIR/Passes.td
M flang/lib/Optimizer/Builder/FIRBuilder.cpp
M flang/lib/Optimizer/Builder/HLFIRTools.cpp
M flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
M flang/lib/Optimizer/Passes/Pipelines.cpp
M flang/test/Driver/mlir-pass-pipeline.f90
M flang/test/Fir/basic-program.fir
A flang/test/HLFIR/simplify-hlfir-intrinsics-matmul.fir
Log Message:
-----------
[flang] Inline hlfir.matmul[_transpose]. (#122821)
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not allow
to get rid of a temporary array in many cases, but it may still be
much better allowing to:
* Get rid of any overhead related to calling runtime MATMUL
(such as descriptors creation).
* Use CPU-specific vectorization cost model for matmul loops,
which Fortran runtime cannot currently do.
* Optimize matmul of known-size arrays by complete unrolling.
One of the drawbacks of `hlfir.eval_in_mem` inlining is that
the ops inside it with store memory effects block the current
MLIR CSE, so I decided to run this inlining late in the pipeline.
There is a source commen explaining the CSE issue in more detail.
Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance, and I got performance regressions
with it comparing to Fortran runtime implementation. I put it
under an enigneering option for experiments.
At the same time, inlining `hlfir.matmul_transpose` as `hlfir.elemental`
seems to be a good approach, e.g. it allows getting rid of a temporay
array in cases like: `A(:)=B(:)+MATMUL(TRANSPOSE(C(:,:)),D(:))`.
This patch improves performance of galgel and tonto a little bit.
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications
More information about the All-commits
mailing list