[Mlir-commits] [mlir] [mlir][nvgpu] Mark TMA descriptor as MemWriteAt in `tma.async.store` (PR #79427)
Guray Ozen
llvmlistbot at llvm.org
Tue Jan 30 10:35:43 PST 2024
grypp wrote:
> Also: the model of having "nvgpu.tma.create.descriptor" doing both the creation of the descriptor **and** the memcpy to the device will prevent from adopting the grid-constant method and so we won't be able to take advantage of the perf gain.
>
> We likely should revamp this to align more with how it works in Cuda?
I totally agree with that. I implemented it this way because we don't have grid_constant. Let me come up with a follow-up PR.
Also, the implicit memcpy could leak device memory anyway if you don't call cudaFree.
Thanks for bringing this to my attention.
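
For context, here is a rough CUDA sketch of the two ways of getting a TMA descriptor to a kernel that are being compared above. It is only an illustration, not the NVGPU lowering itself: the kernel names and the `launch_both` helper are hypothetical, the kernel bodies are placeholders, and the descriptor is assumed to have been encoded on the host already.

```cuda
#include <cuda.h>          // CUtensorMap (driver API type, CUDA 12+)
#include <cuda_runtime.h>

// Approach A (roughly the current model): the descriptor lives in global
// memory, so the host has to allocate, copy, and eventually free it.
__global__ void kernel_global_desc(const CUtensorMap *desc) {
  (void)desc;  // placeholder: TMA operations would read the descriptor here
}

// Approach B (grid-constant): the descriptor is passed by value as a
// const __grid_constant__ kernel parameter, with no device allocation.
__global__ void kernel_grid_constant(const __grid_constant__ CUtensorMap desc) {
  (void)desc;  // placeholder: TMA operations would use &desc here
}

// Hypothetical helper; host_desc is assumed to be a valid, already-encoded
// descriptor.
void launch_both(const CUtensorMap &host_desc) {
  // Approach A: explicit device copy of the descriptor.
  CUtensorMap *dev_desc = nullptr;
  cudaMalloc(&dev_desc, sizeof(CUtensorMap));
  cudaMemcpy(dev_desc, &host_desc, sizeof(CUtensorMap),
             cudaMemcpyHostToDevice);
  kernel_global_desc<<<1, 1>>>(dev_desc);
  cudaDeviceSynchronize();
  cudaFree(dev_desc);  // without this, the device copy is leaked

  // Approach B: no cudaMalloc/cudaMemcpy/cudaFree for the descriptor.
  kernel_grid_constant<<<1, 1>>>(host_desc);
  cudaDeviceSynchronize();
}
```

In approach B there is nothing to free afterwards, which is why the leak concern only applies to the implicit-memcpy model.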
https://github.com/llvm/llvm-project/pull/79427