[all-commits] [llvm/llvm-project] ea8489: [mlir][gpu] Introduce `gpu.dynamic_shared_memory` ...
Guray Ozen via All-commits
all-commits at lists.llvm.org
Thu Nov 16 05:42:32 PST 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: ea84897ba3e7727a3aa3fbd6d84b6b4ab573c70d
https://github.com/llvm/llvm-project/commit/ea84897ba3e7727a3aa3fbd6d84b6b4ab573c70d
Author: Guray Ozen <guray.ozen at gmail.com>
Date: 2023-11-16 (Thu, 16 Nov 2023)
Changed paths:
M mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
M mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
M mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h
M mlir/include/mlir/IR/SymbolTable.h
M mlir/lib/Conversion/GPUCommon/GPUOpsLowering.cpp
M mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h
M mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
M mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
M mlir/lib/IR/SymbolTable.cpp
A mlir/test/Dialect/GPU/dynamic-shared-memory.mlir
M mlir/test/Dialect/GPU/invalid.mlir
Log Message:
-----------
[mlir][gpu] Introduce `gpu.dynamic_shared_memory` Op (#71546)
While the `gpu.launch` Op allows setting the size via the
`dynamic_shared_memory_size` argument, accessing the dynamic shared
memory is very convoluted. This PR implements the proposed Op,
`gpu.dynamic_shared_memory` that aims to simplify the utilization of
dynamic shared memory.
RFC:
https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/
**Proposal from RFC**
This PR `gpu.dynamic.shared.memory` Op to use dynamic shared memory
feature efficiently. It is is a powerful feature that enables the
allocation of shared memory at runtime with the kernel launch on the
host. Afterwards, the memory can be accessed directly from the device. I
believe similar story exists for AMDGPU.
**Current way Using Dynamic Shared Memory with MLIR**
Let me illustrate the challenges of using dynamic shared memory in MLIR
with an example below. The process involves several steps:
- memref.global 0-sized array LLVM's NVPTX backend expects
- dynamic_shared_memory_size Set the size of dynamic shared memory
- memref.get_global Access the global symbol
- reinterpret_cast and subview Many OPs for pointer arithmetic
```
// Step 1. Create 0-sized global symbol. Manually set the alignment
memref.global "private" @dynamicShmem : memref<0xf16, 3> { alignment = 16 }
func.func @main() {
// Step 2. Allocate shared memory
gpu.launch blocks(...) threads(...)
dynamic_shared_memory_size %c10000 {
// Step 3. Access the global object
%shmem = memref.get_global @dynamicShmem : memref<0xf16, 3>
// Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations.
%4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128], strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3>
%5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3>
%6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3>
%7 = memref.subview %6[0, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>
%8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>
// Step.5 Use
"test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index)
"test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index)
gpu.terminator
}
```
Let’s write the program above with that:
```
func.func @main() {
gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 {
%i = arith.constant 18 : index
// Step 1: Obtain shared memory directly
%shmem = gpu.dynamic_shared_memory : memref<?xi8, 3>
%c147456 = arith.constant 147456 : index
%c155648 = arith.constant 155648 : index
%7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3>
%8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3>
// Step 2: Utilize the shared memory
"test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index)
"test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index)
}
}
```
This PR resolves #72513
More information about the All-commits
mailing list