[llvm] [NVPTX] Add Bulk Copy Prefetch Intrinsics (PR #123226)

Tue Jan 21 14:50:25 PST 2025

================
@@ -553,6 +553,34 @@ it must be a multiple of 16.
 For more information, refer PTX ISA
 `<https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk>`_.
 
+'``llvm.nvvm.cp.async.bulk.prefetch.L2``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.cp.async.bulk.prefetch.L2(ptr addrspace(1) %src, i32 %size, i64 %ch, i1 %flag_ch)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.cp.async.bulk.prefetch.L2``' intrinsic
+corresponds to the ``cp.async.bulk.prefetch.L2.*`` family
+of PTX instructions. These instructions initiate an asynchronous
+prefetch of bulk data from global memory to the L2 cache.
+The 32-bit operand ``%size`` specifies the amount of memory to be
+prefetched in terms of bytes and it must be a multiple of 16.
+
+* The last argument to these intrinsics is boolean flag indicating
+  support for cache_hint. These flag argument must be compile-time
+  constant. When set, it indicates a valid cache_hint (``i64 %ch``)
+  and generates the ``.L2::cache_hint`` variant of the PTX instruction.
----------------
Artem-B wrote:

Hmm.. After looking at the PTX spec, I have questions.

Spec says:
```
When the optional argument cache-policy is specified, the qualifier .level::cache_hint is required. The 64-bit operand cache-policy specifies the cache eviction policy that may be used during the memory access.
```

It appears that `%flag_ch` is, effectively, not a user-specified parameter and is completely dependent on whether `%ch` is specified. In other words, it's the cache policy (AKA `%ch`) that's the input. `%flag_ch` is only there to tell LLVM to add appropriate instruction suffix. IMO it's a rather awkward way to do so. To me it looks like we have two instruction variants with different number of parameters, and it should be handled either by making `%ch` argument optional and automatically deriving `.L2::cache_hint` from whether it's present, or have two explicit intrinsics:

```
declare void @llvm.nvvm.cp.async.bulk.prefetch.L2(ptr addrspace(1) %src, i32 %size)
declare void @llvm.nvvm.cp.async.bulk.prefetch.L2.cache_hint(ptr addrspace(1) %src, i32 %size, i64 %ch)
```

Considering that we already have this cache hint flag present in other intrinsics, it should probably be handled in a separate patch.

This patch is OK to proceed as is.

https://github.com/llvm/llvm-project/pull/123226