[Mlir-commits] [mlir] [MLIR][GPU] Support synchronous gpu.alloc and gpu.dealloc in gpu-to-llvm (PR #191661)

Sat Apr 11 13:11:32 PDT 2026

llvmbot wrote:




@llvm/pr-subscribers-mlir

Author: Jared Hoberock (jaredhoberock)

<details>
<summary>Changes</summary>

The gpu-to-llvm conversion patterns for gpu.alloc and gpu.dealloc previously required async tokens for non-host-shared operations. As a result, synchronous allocation and deallocation could not be lowered: the dialect operations allow the synchronous form, but the conversion patterns rejected it.

Remove the async requirement:
- gpu.alloc: remove the isAsyncWithOneDependency guard for non-shared allocs. The existing code already handles the sync case correctly (null stream when no async dependencies).
- gpu.dealloc: remove the async requirement entirely. Use a null stream when no async dependencies are present. Use eraseOp instead of replaceOp for sync deallocs (which have no results).

Also add a new test covering the synchronous alloc/dealloc lowering.
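
For readers skimming the description, the dealloc decision logic can be modeled outside MLIR as follows. This is only a sketch: `Stream`, `Rewrite`, `DeallocLowering`, and `lowerDealloc` are hypothetical names introduced for illustration and are not part of the patch or of MLIR. The real pattern builds an `LLVM::ZeroOp` for the null stream and calls `replaceOp` or `eraseOp` on the rewriter.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the LLVM-dialect pointer value used as a stream.
using Stream = const void *;

// How the original gpu.dealloc is removed once the runtime call is emitted.
enum class Rewrite { ReplaceWithStream, Erase };

struct DeallocLowering {
  Stream stream;   // stream operand passed to the runtime free call
  Rewrite rewrite; // replaceOp (async) or eraseOp (sync)
};

// Models only the decision logic described above:
// - no async dependencies: pass a null stream;
// - otherwise: pass the first dependency as the stream;
// - an op that produces an async token is replaced by the stream,
//   while an op without results is simply erased.
DeallocLowering lowerDealloc(const std::vector<Stream> &asyncDeps,
                             bool hasAsyncToken) {
  Stream stream = asyncDeps.empty() ? nullptr : asyncDeps.front();
  Rewrite rewrite =
      hasAsyncToken ? Rewrite::ReplaceWithStream : Rewrite::Erase;
  return {stream, rewrite};
}
```

In this model a synchronous `gpu.dealloc` (no dependencies, no token) yields a null stream and `Rewrite::Erase`, which corresponds to the `@mgpuMemFree` call with an `llvm.mlir.zero` operand checked by the new test.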

Assisted by Claude (Anthropic)

---
Full diff: https://github.com/llvm/llvm-project/pull/191661.diff


2 Files Affected:

- (modified) mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp (+12-6) 
- (modified) mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir (+16) 


``````````diff

diff --git a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
index 3e99c537d0e02..7f6079c6c1da5 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
@@ -770,8 +770,6 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
   if (isShared && allocOp.getAsyncToken())
     return rewriter.notifyMatchFailure(
         allocOp, "Host Shared allocation cannot be done async");
-  if (!isShared && failed(isAsyncWithOneDependency(rewriter, allocOp)))
-    return failure();
 
   // Get shape of the memref as values: static sizes are constant
   // values and dynamic sizes are passed to 'alloc' as operands.
@@ -815,18 +813,26 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
 LogicalResult ConvertDeallocOpToGpuRuntimeCallPattern::matchAndRewrite(
     gpu::DeallocOp deallocOp, OpAdaptor adaptor,
     ConversionPatternRewriter &rewriter) const {
-  if (failed(areAllLLVMTypes(deallocOp, adaptor.getOperands(), rewriter)) ||
-      failed(isAsyncWithOneDependency(rewriter, deallocOp)))
+  if (failed(areAllLLVMTypes(deallocOp, adaptor.getOperands(), rewriter)))
     return failure();
 
   Location loc = deallocOp.getLoc();
 
   Value pointer =
       MemRefDescriptor(adaptor.getMemref()).allocatedPtr(rewriter, loc);
-  Value stream = adaptor.getAsyncDependencies().front();
+  auto nullPtr = mlir::LLVM::ZeroOp::create(rewriter, loc, llvmPointerType);
+  Value stream = adaptor.getAsyncDependencies().empty()
+                     ? nullPtr
+                     : adaptor.getAsyncDependencies().front();
   deallocCallBuilder.create(loc, rewriter, {pointer, stream});
 
-  rewriter.replaceOp(deallocOp, {stream});
+  if (deallocOp.getAsyncToken()) {
+    // Async dealloc: propagate the stream as the async token replacement.
+    rewriter.replaceOp(deallocOp, {stream});
+  } else {
+    // Sync dealloc: no results to replace, just remove the op.
+    rewriter.eraseOp(deallocOp);
+  }
   return success();
 }
 
diff --git a/mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir b/mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir
index ae8b7aaac7fd9..c2fda0ac90031 100644
--- a/mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir
+++ b/mlir/test/Conversion/GPUCommon/lower-alloc-to-gpu-runtime-calls.mlir
@@ -20,6 +20,22 @@ module attributes {gpu.container_module} {
     return
   }
 
+  // CHECK-LABEL: llvm.func @alloc_dealloc_sync
+  // CHECK-SAME: %[[size:.*]]: i64
+  func.func @alloc_dealloc_sync(%size : index) {
+    // CHECK: %[[gep:.*]] = llvm.getelementptr {{.*}}[%[[size]]]
+    // CHECK: %[[size_bytes:.*]] = llvm.ptrtoint %[[gep]]
+    // CHECK: %[[nullptr:.*]] = llvm.mlir.zero
+    // CHECK: %[[isHostShared:.*]] = llvm.mlir.constant
+    // CHECK: llvm.call @mgpuMemAlloc(%[[size_bytes]], %[[nullptr]], %[[isHostShared]])
+    %0 = gpu.alloc (%size) : memref<?xf32>
+    // CHECK: %[[float_ptr:.*]] = llvm.extractvalue {{.*}}[0]
+    // CHECK: %[[nullptr2:.*]] = llvm.mlir.zero
+    // CHECK: llvm.call @mgpuMemFree(%[[float_ptr]], %[[nullptr2]])
+    gpu.dealloc %0 : memref<?xf32>
+    return
+  }
+
   // CHECK-LABEL: llvm.func @alloc_sync
   // CHECK-SAME: %[[size:.*]]: i64
   func.func @alloc_sync(%size : index) {

``````````

</details>


https://github.com/llvm/llvm-project/pull/191661