[Mlir-commits] [mlir] [mlir][nvgpu] Mark TMA descriptor as MemWriteAt in `tma.async.store` (PR #79427)

Tue Jan 30 09:53:09 PST 2024

joker-eph wrote:

Also: the model where `nvgpu.tma.create.descriptor` does both the creation of the descriptor **and** the memcpy to the device will prevent us from adopting the grid-constant method (passing the descriptor by value as a `__grid_constant__` kernel parameter), so we won't be able to take advantage of its performance gain.
We should probably revamp this to align more closely with how it works in CUDA?
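For context, here is a minimal CUDA sketch of the two ways of handing a TMA descriptor to a kernel. It assumes CUDA 12+, an sm_90 (Hopper) GPU, and linking against the driver library. `kernel_desc_in_global`, `kernel_desc_grid_constant` and `run_demo` are hypothetical names, the 64x64 float tensor and 32x32 box are arbitrary, and the TMA instructions that would actually consume the descriptor are omitted. Variant 1 mirrors the model described above: encode on the host, then allocate and memcpy to device memory. Variant 2 passes the descriptor by value as a `const __grid_constant__` parameter, so no separate allocation or copy is needed.

```cuda
#include <cuda.h>          // CUtensorMap, cuTensorMapEncodeTiled (driver API, CUDA 12+)
#include <cuda_runtime.h>  // cudaMalloc, cudaMemcpy, kernel launches

// Variant 1 (hypothetical name): the descriptor was copied into device
// global memory and the kernel receives a pointer to it.
__global__ void kernel_desc_in_global(const CUtensorMap* desc) {
  (void)desc;  // TMA instructions would take `desc` as their tensor-map operand.
}

// Variant 2 (hypothetical name): the descriptor is passed by value as a
// __grid_constant__ parameter; taking its address does not create a copy.
__global__ void kernel_desc_grid_constant(const __grid_constant__ CUtensorMap desc) {
  const CUtensorMap* p = &desc;  // address in kernel parameter space
  (void)p;  // TMA instructions would take `p` as their tensor-map operand.
}

// Returns 0 if both launch styles succeed, non-zero otherwise.
int run_demo() {
  const cuuint64_t rows = 64, cols = 64;
  float* d_tensor = nullptr;
  if (cudaMalloc(&d_tensor, rows * cols * sizeof(float)) != cudaSuccess) return 1;

  // Encode a 2-D tiled descriptor on the host (dimensions listed innermost-first).
  CUtensorMap desc;
  cuuint64_t global_dim[2] = {cols, rows};
  cuuint64_t global_stride[1] = {cols * sizeof(float)};  // bytes; rank - 1 entries
  cuuint32_t box_dim[2] = {32, 32};                       // tile moved per TMA op
  cuuint32_t elem_stride[2] = {1, 1};
  CUresult r = cuTensorMapEncodeTiled(
      &desc, CU_TENSOR_MAP_DATA_TYPE_FLOAT32, 2, d_tensor, global_dim,
      global_stride, box_dim, elem_stride, CU_TENSOR_MAP_INTERLEAVE_NONE,
      CU_TENSOR_MAP_SWIZZLE_NONE, CU_TENSOR_MAP_L2_PROMOTION_NONE,
      CU_TENSOR_MAP_FLOAT_OOB_FILL_NONE);
  if (r != CUDA_SUCCESS) { cudaFree(d_tensor); return 2; }

  // Variant 1: an extra device allocation plus a host-to-device memcpy.
  CUtensorMap* d_desc = nullptr;
  if (cudaMalloc(&d_desc, sizeof(CUtensorMap)) != cudaSuccess) { cudaFree(d_tensor); return 3; }
  cudaMemcpy(d_desc, &desc, sizeof(CUtensorMap), cudaMemcpyHostToDevice);
  kernel_desc_in_global<<<1, 1>>>(d_desc);

  // Variant 2: no allocation or memcpy; the descriptor travels with the launch.
  kernel_desc_grid_constant<<<1, 1>>>(desc);

  cudaError_t err = cudaDeviceSynchronize();
  cudaFree(d_desc);
  cudaFree(d_tensor);
  return err == cudaSuccess ? 0 : 4;
}
```

Build with something like `nvcc -arch=sm_90 demo.cu -lcuda` and call `run_demo()` from `main`. Besides skipping the memcpy, the grid-constant variant also avoids keeping a device allocation alive for the descriptor's lifetime.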

https://github.com/llvm/llvm-project/pull/79427