[all-commits] [llvm/llvm-project] 9774cd: [mlir][nvgpu] Fix affine maps computing indices fo...
Manish Gupta via All-commits
all-commits at lists.llvm.org
Thu Dec 1 18:31:03 PST 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 9774cd17e80fc413cef73e1e7e9bac20ef21ebae
https://github.com/llvm/llvm-project/commit/9774cd17e80fc413cef73e1e7e9bac20ef21ebae
Author: Manish Gupta <manigupta at google.com>
Date: 2022-12-01 (Thu, 01 Dec 2022)
Changed paths:
M mlir/lib/Dialect/NVGPU/Utils/MMAUtils.cpp
M mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir
Log Message:
-----------
[mlir][nvgpu] Fix affine maps computing indices for LdMatrixOp srcMemref
This patch fixes and simplifies the ldmatrix affine map arithmetic by
expressing the affine expressions in terms of a pitch-linear layout
(strided and contiguous dimensions). It then applies the maps for the
strided and contiguous dimensions to the row-major and col-major cases.
LdMatrixOp loads tiles (8 rows x 128b columns) of data collaboratively
across the 32 threads of a warp. It can load x1, x2, or x4 tiles.
Additionally, it can transpose at 16-bit granularity when moving
data from shared memory to registers.
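
As a rough illustration of the tile geometry described above, the following
C++ sketch computes how many elements fit in one 128b tile row and how many
lanes supply a row address for x1, x2, and x4 loads. The helper names
elementsPer128b and lanesSupplyingAddresses are hypothetical and are not
taken from MMAUtils.cpp.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of the patch.

// Number of elements that fit in one 128b tile row.
constexpr int64_t elementsPer128b(int64_t bitsPerElement) {
  return 128 / bitsPerElement;
}

// Each tile has 8 rows and each row address is supplied by one lane, so an
// xN load takes addresses from the first 8 * N lanes of the warp.
constexpr int64_t lanesSupplyingAddresses(int64_t numTiles) {
  return 8 * numTiles;
}
```

For 16-bit elements this gives 8 elements per row, and x1, x2, and x4 loads
take addresses from 8, 16, and 32 lanes respectively.
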
This patch fixes the affine map:
(laneid -> coordinate index that a thread points to in a tile).
- Loading x4 tiles needs all 32 lanes (T0-31) to each point to a
contiguous chunk of 128b. The issue was exposed when running this case.
- Loading x2 and x1 tiles needs threads T0-15 and T0-7, respectively, to
each point to a contiguous chunk of 128b. The patch is NFC for these cases.
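
The lane-to-coordinate mapping can be sketched in plain C++ as follows. This
is one possible mapping consistent with the description above, not the code
from the patch, which builds MLIR affine maps in MMAUtils.cpp. It assumes
that lanes walk the strided dimension first and then step to the next 128b
chunk along the contiguous dimension. All type and function names here are
hypothetical.

```cpp
#include <cstdint>

// Hypothetical types and names for illustration only; the patch itself
// builds MLIR affine maps rather than plain integer arithmetic.
struct PitchLinearCoord {
  int64_t strided;    // index along the strided dimension
  int64_t contiguous; // element offset along the contiguous dimension
};

struct MatrixCoord {
  int64_t row;
  int64_t col;
};

// Assumption: lanes enumerate the strided dimension first; once it is
// exhausted, the next group of lanes moves one 128b chunk further along the
// contiguous dimension.
PitchLinearCoord laneToPitchLinear(int64_t laneId, int64_t stridedExtent,
                                   int64_t bitsPerElement) {
  const int64_t elementsPer128b = 128 / bitsPerElement;
  return {laneId % stridedExtent, (laneId / stridedExtent) * elementsPer128b};
}

// Row-major: the strided dimension is the row and the contiguous dimension
// is the column. Col-major: the two roles are swapped.
MatrixCoord toMatrixCoord(PitchLinearCoord c, bool rowMajor) {
  return rowMajor ? MatrixCoord{c.strided, c.contiguous}
                  : MatrixCoord{c.contiguous, c.strided};
}
```

Under these assumptions, with a strided extent of 16 and 16-bit elements,
lanes 0-15 point to the first 128b chunk of rows 0-15 and lanes 16-31 point
to the second chunk of the same rows, so all 32 lanes address distinct
128b chunks.
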
Differential Revision: https://reviews.llvm.org/D138978