[Mlir-commits] [mlir] [mlir][nvgpu] Mark TMA descriptor as MemWriteAt in `tma.async.store` (PR #79427)

Tue Jan 30 10:35:43 PST 2024

grypp wrote:

> Also: the model of having "nvgpu.tma.create.descriptor" do both the creation of the descriptor **and** the memcpy to the device will prevent us from adopting the grid-constant method, so we won't be able to take advantage of the perf gain.
> 
> We likely should revamp this to align more closely with how it works in CUDA?

I totally agree with that. I implemented it this way because we don't have grid_constant support. Let me come up with a follow-up PR.

Also, the implicit memcpy could leak device memory anyway if you never call cudaFree.
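
For readers less familiar with the two approaches, here is a rough CUDA sketch of the difference, not the code from this PR. `FakeDescriptor` is a hypothetical stand-in sized and aligned like `CUtensorMap`, and the kernel and function names are made up for illustration. The first path copies the descriptor to device memory and has to free it explicitly; the second passes it by value as a `__grid_constant__` kernel parameter, which assumes CUDA 11.7+ and compute capability 7.0+.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for a TMA descriptor. The real type is CUtensorMap,
// an opaque 128-byte, 64-byte-aligned struct from the CUDA driver API.
struct alignas(64) FakeDescriptor {
  unsigned long long opaque[16];
};

// Approach 1: the descriptor lives in global memory; the kernel gets a pointer.
__global__ void viaDevicePointer(const FakeDescriptor *desc,
                                 unsigned long long *out) {
  *out = desc->opaque[0];
}

// Approach 2: the descriptor is passed by value as a grid-constant parameter,
// so no device allocation or copy of the descriptor is needed.
__global__ void viaGridConstant(const __grid_constant__ FakeDescriptor desc,
                                unsigned long long *out) {
  *out = desc.opaque[0];
}

// Returns 0 if both approaches read back the same value, nonzero otherwise.
int runDemo() {
  FakeDescriptor host = {};
  host.opaque[0] = 42;

  unsigned long long *dOut = nullptr;
  if (cudaMalloc(&dOut, sizeof(*dOut)) != cudaSuccess) return -1;

  // Approach 1: explicit allocation and host-to-device copy.
  FakeDescriptor *dDesc = nullptr;
  if (cudaMalloc(&dDesc, sizeof(FakeDescriptor)) != cudaSuccess) {
    cudaFree(dOut);
    return -1;
  }
  cudaMemcpy(dDesc, &host, sizeof(host), cudaMemcpyHostToDevice);
  viaDevicePointer<<<1, 1>>>(dDesc, dOut);
  unsigned long long first = 0;
  cudaMemcpy(&first, dOut, sizeof(first), cudaMemcpyDeviceToHost);
  cudaFree(dDesc);  // Skipping this call leaks the descriptor's device copy.

  // Approach 2: the descriptor travels with the kernel launch parameters.
  viaGridConstant<<<1, 1>>>(host, dOut);
  unsigned long long second = 0;
  cudaMemcpy(&second, dOut, sizeof(second), cudaMemcpyDeviceToHost);

  cudaFree(dOut);
  if (cudaGetLastError() != cudaSuccess) return -1;
  return (first == 42 && second == 42) ? 0 : 1;
}
```

With the second approach there is nothing for the caller to free, which is why the leak concern goes away along with the copy.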

Thanks for bringing this to my attention.

https://github.com/llvm/llvm-project/pull/79427