[Mlir-commits] [mlir] [MLIR][NVVM][NVGPU] Combine prefetch and prefetch.tensormap (PR #153134)

Thu Aug 21 02:19:56 PDT 2025

================
@@ -2427,15 +2428,26 @@ def PrefetchCacheLevelAttr : EnumAttr<NVVM_Dialect, PrefetchCacheLevel, "prefetc
   let assemblyFormat = "$value";
 }
 
-def NVVM_PrefetchOp : NVVM_Op<"prefetch"> {
+def NVVM_PrefetchOp : NVVM_Op<"prefetch",
+    [DeclareOpInterfaceMethods<BasicPtxBuilderOpInterface>]> {
   let summary = "Brings the cache line containing an address into the specified cache level";
   let description = [{
-    Operand `addr` can be a global, local or generic address pointer. No 
-    operation is performed if `addr` maps to a `shared` memory location.
+    Operand `addr` can be a global, local, or generic address pointer. If 
+    `tensormap` is specified, `addr` can be a constant or generic address 
+    pointer.
+    No operation is performed if `addr` maps to a `shared` memory location.
+
+    The `cacheLevel` attribute is optional and specifies the cache level to 
+    which the cache line containing the specified address is brought.
+
+    `tensormap`can be specified instead of `cacheLevel` to bring the cache line 
+    containing the specified address in the [const](https://docs.nvidia.com/cuda/parallel-thread-execution/#constant-state-space) or [param](https://docs.nvidia.com/cuda/parallel-thread-execution/#parameter-state-space) state spaces for 
+    subsequent use by `the cp.async.bulk.tensor` instruction.
+
+    `in_param_space` can be specified with `tensormap` to indicate that the 
+    given generic address maps to the `param` state space. If `in_param_space` 
+    is specified, `addr` must be a generic address pointer.
----------------
grypp wrote:

```suggestion
Prefetches the cache line containing the address given by `addr`. The operand may be a global, local, or generic pointer. When `tensormap` is specified, the operand may instead be a constant or generic pointer. If the address maps to shared memory, the operation has no effect.

At most one of `cacheLevel` or `tensormap` may be present. The `cacheLevel` attribute selects the target cache level. When combined with `uniform`, the prefetch is performed to the uniform cache, in which case `addr` must be a generic pointer.

When `tensormap` is used, the line containing `addr` is brought into the constant or parameter state space for later use by `cp.async.bulk.tensor`. If `in_param_space` is specified, the generic pointer is interpreted as referring to the parameter state space.
```

https://github.com/llvm/llvm-project/pull/153134