[Mlir-commits] [mlir] [MLIR] Modify lowering of gpu.alloc op to llvm (PR #69969)

Guray Ozen llvmlistbot at llvm.org
Tue Oct 24 06:35:37 PDT 2023


================
@@ -836,7 +836,11 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
   // Allocate the underlying buffer and store a pointer to it in the MemRef
   // descriptor.
   Type elementPtrType = this->getElementPtrType(memRefType);
-  auto stream = adaptor.getAsyncDependencies().front();
+
+  Value stream =
+      adaptor.getAsyncDependencies().empty()
----------------
grypp wrote:

I'm a bit unclear about why we create a stream here. If a `stream` isn't available, my understanding is that we would prefer synchronous execution, and in that scenario we wouldn't need a `stream` at all. At least in the CUDA model you don't have to create one; the CUDA driver/runtime can implicitly use the existing default stream.

Can we pass `nullptr` for the stream when one isn't available? Alternatively, we might add another API to the runtimes.

Afaik, creating and synchronizing a stream is not free, so we had better avoid it when it isn't necessary.
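To make the suggestion concrete, here is a minimal sketch of the dispatch convention I have in mind. This is a toy illustration, not the actual runtime wrapper: `Stream` stands in for a real handle type such as `CUstream` (which is a pointer type, so `nullptr` naturally means "no stream"), and the allocation calls are plain `malloc` placeholders for the real driver calls.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a driver stream handle; in the real runtime
// this would be CUstream, a pointer type, so nullptr means "no stream".
using Stream = void *;

// Sketch of the proposed convention: when the lowering has no async
// dependencies it passes nullptr, and the runtime wrapper takes the
// synchronous path instead of creating (and later synchronizing) a stream.
void *memAlloc(uint64_t sizeBytes, Stream stream) {
  if (stream == nullptr) {
    // Synchronous path: no stream to create or synchronize.
    return std::malloc(sizeBytes);
  }
  // Asynchronous path: would enqueue the allocation on `stream`
  // (placeholder; the real wrapper would call the driver API here).
  return std::malloc(sizeBytes);
}
```

The point is that the nullness check lives in the runtime wrapper, so the lowering never has to materialize a stream just to satisfy the call signature.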

For example, the sycl runtime (below) doesn't use the `stream` (aka `sycl::queue *queue`):

```cpp
static void *allocDeviceMemory(sycl::queue *queue, size_t size, bool isShared) {
  void *memPtr = nullptr;
  if (isShared) {
    memPtr = sycl::aligned_alloc_shared(64, size, getDefaultDevice(), getDefaultContext());
  } else {
    memPtr = sycl::aligned_alloc_device(64, size, getDefaultDevice(), getDefaultContext());
  }
  if (memPtr == nullptr) {
    throw std::runtime_error("mem allocation failed!");
  }
  return memPtr;
}

void *mgpuMemAlloc(uint64_t size, sycl::queue *queue, bool isShared) {
  return catchAll([&]() {
    return allocDeviceMemory(queue, static_cast<size_t>(size), true);
  });
}
```

https://github.com/llvm/llvm-project/pull/69969

