[Mlir-commits] [mlir] [MLIR][GPU] Support synchronous gpu.alloc and gpu.dealloc in gpu-to-llvm (PR #191661)
llvmlistbot at llvm.org
Sat Apr 11 13:11:32 PDT 2026
llvmbot wrote:
@llvm/pr-subscribers-mlir
Author: Jared Hoberock (jaredhoberock)
<details>
<summary>Changes</summary>
The gpu-to-llvm conversion patterns for gpu.alloc and gpu.dealloc previously required async tokens for non-host-shared operations. As a result, synchronous allocation and deallocation could not be lowered, even though the dialect operations themselves permit the synchronous form.
Remove the async requirement:
- gpu.alloc: remove the isAsyncWithOneDependency guard for non-shared allocs. The existing code already handles the sync case correctly (null stream when no async dependencies).
- gpu.dealloc: remove the async requirement entirely. Use a null stream when no async dependencies are present. Use eraseOp instead of replaceOp for sync deallocs (which have no results).
Also add a new test covering the synchronous alloc/dealloc lowering.
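The stream selection described above for gpu.dealloc can be illustrated with a small standalone model. This is only a sketch and does not use MLIR: `Stream`, `DeallocLowering`, and `lowerDealloc` are hypothetical names introduced for illustration, with `nullptr` standing in for the null stream and the `result` field standing in for the value that replaces the async token.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a lowered stream value; nullptr models the
// null stream used when there are no async dependencies.
using Stream = const void *;

// Hypothetical summary of what the dealloc lowering produces.
struct DeallocLowering {
  Stream streamArg;             // stream argument passed to the free call
  std::optional<Stream> result; // replacement for the async token, if any
};

// Models the selection logic only: use the first async dependency as the
// stream when one exists, otherwise fall back to the null stream. A result
// is produced only when the op has an async token; a sync dealloc has no
// results and is simply erased.
DeallocLowering lowerDealloc(const std::vector<Stream> &asyncDeps,
                             bool hasAsyncToken) {
  Stream stream = asyncDeps.empty() ? nullptr : asyncDeps.front();
  DeallocLowering out{stream, std::nullopt};
  if (hasAsyncToken)
    out.result = stream;
  return out;
}
```

In this model, calling `lowerDealloc({}, false)` corresponds to the synchronous case: the stream argument is null and no result is produced.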
Assisted by Claude (Anthropic)
---
Full diff: https://github.com/llvm/llvm-project/pull/191661.diff
2 Files Affected:
- (modified) mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp (+12-6)
- (modified) mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir (+16)
``````````diff
diff --git a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
index 3e99c537d0e02..7f6079c6c1da5 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
@@ -770,8 +770,6 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
if (isShared && allocOp.getAsyncToken())
return rewriter.notifyMatchFailure(
allocOp, "Host Shared allocation cannot be done async");
- if (!isShared && failed(isAsyncWithOneDependency(rewriter, allocOp)))
- return failure();
// Get shape of the memref as values: static sizes are constant
// values and dynamic sizes are passed to 'alloc' as operands.
@@ -815,18 +813,26 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
LogicalResult ConvertDeallocOpToGpuRuntimeCallPattern::matchAndRewrite(
gpu::DeallocOp deallocOp, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const {
- if (failed(areAllLLVMTypes(deallocOp, adaptor.getOperands(), rewriter)) ||
- failed(isAsyncWithOneDependency(rewriter, deallocOp)))
+ if (failed(areAllLLVMTypes(deallocOp, adaptor.getOperands(), rewriter)))
return failure();
Location loc = deallocOp.getLoc();
Value pointer =
MemRefDescriptor(adaptor.getMemref()).allocatedPtr(rewriter, loc);
- Value stream = adaptor.getAsyncDependencies().front();
+ auto nullPtr = mlir::LLVM::ZeroOp::create(rewriter, loc, llvmPointerType);
+ Value stream = adaptor.getAsyncDependencies().empty()
+ ? nullPtr
+ : adaptor.getAsyncDependencies().front();
deallocCallBuilder.create(loc, rewriter, {pointer, stream});
- rewriter.replaceOp(deallocOp, {stream});
+ if (deallocOp.getAsyncToken()) {
+ // Async dealloc: propagate the stream as the async token replacement.
+ rewriter.replaceOp(deallocOp, {stream});
+ } else {
+ // Sync dealloc: no results to replace, just remove the op.
+ rewriter.eraseOp(deallocOp);
+ }
return success();
}
diff --git a/mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir b/mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir
index ae8b7aaac7fd9..c2fda0ac90031 100644
--- a/mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir
+++ b/mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir
@@ -20,6 +20,22 @@ module attributes {gpu.container_module} {
return
}
+ // CHECK-LABEL: llvm.func @alloc_dealloc_sync
+ // CHECK-SAME: %[[size:.*]]: i64
+ func.func @alloc_dealloc_sync(%size : index) {
+ // CHECK: %[[gep:.*]] = llvm.getelementptr {{.*}}[%[[size]]]
+ // CHECK: %[[size_bytes:.*]] = llvm.ptrtoint %[[gep]]
+ // CHECK: %[[nullptr:.*]] = llvm.mlir.zero
+ // CHECK: %[[isHostShared:.*]] = llvm.mlir.constant
+ // CHECK: llvm.call @mgpuMemAlloc(%[[size_bytes]], %[[nullptr]], %[[isHostShared]])
+ %0 = gpu.alloc (%size) : memref<?xf32>
+ // CHECK: %[[float_ptr:.*]] = llvm.extractvalue {{.*}}[0]
+ // CHECK: %[[nullptr2:.*]] = llvm.mlir.zero
+ // CHECK: llvm.call @mgpuMemFree(%[[float_ptr]], %[[nullptr2]])
+ gpu.dealloc %0 : memref<?xf32>
+ return
+ }
+
// CHECK-LABEL: llvm.func @alloc_sync
// CHECK-SAME: %[[size:.*]]: i64
func.func @alloc_sync(%size : index) {
``````````
</details>
https://github.com/llvm/llvm-project/pull/191661