[Mlir-commits] [mlir] [mlir][nvgpu] Mark TMA descriptor as MemWriteAt in `tma.async.store` (PR #79427)
Mehdi Amini
llvmlistbot at llvm.org
Tue Jan 30 09:53:09 PST 2024
joker-eph wrote:
Also: the model of having "nvgpu.tma.create.descriptor" do both the creation of the descriptor **and** the memcpy to the device will prevent us from adopting the grid-constant method, so we won't be able to take advantage of the perf gain.
We should probably revamp this to align more closely with how it works in CUDA?
https://github.com/llvm/llvm-project/pull/79427