[Mlir-commits] [mlir] [MLIR][NVVM][NVGPU] Combine prefetch and prefetch.tensormap (PR #153134)
Guray Ozen
llvmlistbot at llvm.org
Thu Aug 21 02:19:56 PDT 2025
================
@@ -2427,15 +2428,26 @@ def PrefetchCacheLevelAttr : EnumAttr<NVVM_Dialect, PrefetchCacheLevel, "prefetc
let assemblyFormat = "$value";
}
-def NVVM_PrefetchOp : NVVM_Op<"prefetch"> {
+def NVVM_PrefetchOp : NVVM_Op<"prefetch",
+ [DeclareOpInterfaceMethods<BasicPtxBuilderOpInterface>]> {
let summary = "Brings the cache line containing an address into the specified cache level";
let description = [{
- Operand `addr` can be a global, local or generic address pointer. No
- operation is performed if `addr` maps to a `shared` memory location.
+ Operand `addr` can be a global, local, or generic address pointer. If
+ `tensormap` is specified, `addr` can be a constant or generic address
+ pointer.
+ No operation is performed if `addr` maps to a `shared` memory location.
+
+ The `cacheLevel` attribute is optional and specifies the cache level to
+ which the cache line containing the specified address is brought.
+
+ `tensormap`can be specified instead of `cacheLevel` to bring the cache line
+ containing the specified address in the [const](https://docs.nvidia.com/cuda/parallel-thread-execution/#constant-state-space) or [param](https://docs.nvidia.com/cuda/parallel-thread-execution/#parameter-state-space) state spaces for
+ subsequent use by `the cp.async.bulk.tensor` instruction.
+
+ `in_param_space` can be specified with `tensormap` to indicate that the
+ given generic address maps to the `param` state space. If `in_param_space`
+ is specified, `addr` must be a generic address pointer.
----------------
grypp wrote:
```suggestion
Prefetches the cache line containing the address given by `addr`. The operand may be a global, local, or generic pointer. When `tensormap` is specified, the operand may instead be a constant or generic pointer. If the address maps to shared memory, the operation has no effect.
At most one of `cacheLevel` or `tensormap` may be present. The `cacheLevel` attribute selects the target cache level. When combined with `uniform`, the prefetch is performed to the uniform cache, in which case `addr` must be a generic pointer.
When `tensormap` is used, the line containing `addr` is brought into the constant or parameter state space for later use by `cp.async.bulk.tensor`. If `in_param_space` is specified, the generic pointer is interpreted as referring to the parameter state space.
```
https://github.com/llvm/llvm-project/pull/153134
More information about the Mlir-commits
mailing list