[all-commits] [llvm/llvm-project] edf5ca: [mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871)
Guray Ozen via All-commits
all-commits at lists.llvm.org
Mon Nov 27 02:05:21 PST 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: edf5cae7391cdb097a090ea142dfa7ac6ac03555
https://github.com/llvm/llvm-project/commit/edf5cae7391cdb097a090ea142dfa7ac6ac03555
Author: Guray Ozen <guray.ozen at gmail.com>
Date: 2023-11-27 (Mon, 27 Nov 2023)
Changed paths:
M mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
M mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
M mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
M mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
M mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp
M mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
M mlir/lib/Target/LLVMIR/Dialect/GPU/SelectObjectAttr.cpp
M mlir/test/Conversion/GPUCommon/lower-launch-func-to-gpu-runtime-calls.mlir
M mlir/test/Dialect/GPU/invalid.mlir
M mlir/test/Dialect/GPU/ops.mlir
A mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
M mlir/test/Target/LLVMIR/gpu.mlir
Log Message:
-----------
[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871)
The NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA),
a new level of parallelism that allows clusters of Cooperative Thread
Arrays (CTAs) to synchronize and communicate through shared memory while
running concurrently.
This PR extends `gpu.launch_func` in the GPU dialect to support CGA.
Since the GPU dialect remains architecture-agnostic, the cluster
dimensions are added as optional parameters. This lets us reuse existing
GPU dialect mechanisms such as kernel outlining and launching, making it
a practical and convenient choice.
An example of the extended op is shown below:
```
gpu.launch_func @kernel_module::@kernel
clusters in (%1, %0, %0) // <-- Optional
blocks in (%0, %0, %0)
threads in (%0, %0, %0)
```
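For completeness, here is a minimal host-side sketch of the extended op. It is
illustrative only: the `@kernel_module::@kernel` symbol and the launch sizes
are hypothetical, not taken from the PR.
```
// Hypothetical launch; the constants below are illustrative only.
%c1   = arith.constant 1 : index
%c2   = arith.constant 2 : index
%c128 = arith.constant 128 : index
gpu.launch_func @kernel_module::@kernel
    clusters in (%c2, %c1, %c1)   // new optional clause
    blocks in (%c2, %c1, %c1)
    threads in (%c128, %c1, %c1)
```
Since the `clusters in (...)` clause is optional, existing launches without
clusters are unaffected.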
The PR also introduces cluster-specific index and dimension ops and binds
them to the corresponding NVVM ops:
```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z
%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```
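As a kernel-side illustration (a sketch only: the kernel is hypothetical, and
it assumes `gpu.cluster_dim` gives the extent that bounds `gpu.cluster_id`
along the same dimension), the new ops compose with the existing GPU index
ops, e.g. to linearize a cluster id:
```
gpu.module @kernel_module {
  gpu.func @kernel() kernel {
    // New cluster intrinsics (this PR) alongside an existing block id.
    %cidX  = gpu.cluster_id x
    %cidY  = gpu.cluster_id y
    %cdimX = gpu.cluster_dim x
    %bidX  = gpu.block_id x
    // Linearize the (x, y) cluster id: cidY * cdimX + cidX.
    %tmp = arith.muli %cidY, %cdimX : index
    %linearClusterId = arith.addi %tmp, %cidX : index
    gpu.return
  }
}
```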
We will add cluster support to the `gpu.launch` op in an upcoming PR.
See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.