[llvm] [NVPTX] Add Intrinsics for discard.* (PR #128404)

via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 28 02:58:31 PST 2025


================
@@ -671,6 +671,43 @@ level on which the priority is to be applied. The only supported value for the s
 For more information, refer to the PTX ISA
 `<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-applypriority>`_.
 
+``llvm.nvvm.discard.*``'
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.discard.global.L2(ptr addrspace(1) %global_ptr, i64 immarg)
+  declare void @llvm.nvvm.discard.L2(ptr %ptr, i64 immarg)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.discard.*``' intrinsics semantically behave like a weak write of an *unstable indeterminate value*:
+reads of memory locations with *unstable indeterminate values* may return different
+bit patterns each time until the memory is overwritten.
+This operation *hints* to the implementation that data in the specified cache ``.level``
+can be destructively discarded without writing it back to memory. The operand ``size`` is an
+integer constant that specifies the length in bytes of the address range ``[a, a + size)`` to write
+*unstable indeterminate values* into. The only supported value for the ``size`` operand is ``128``.
+If no state space is specified, generic addressing is used. If the specified address does
+not fall within the address window of the ``.global`` state space, the behavior is undefined.
+
+LLVM does not define anywhere what an *unstable indeterminate value* is, and the closest concept
+LLVM has breaks the example below:
+
+.. code-block:: text
+  
+  discard.global.L2 [ptr], 128;
+  ld.weak.u32 r0, [ptr];
+  ld.weak.u32 r1, [ptr];
+  // The values in r0 and r1 may differ!
----------------
gonzalobg wrote:

```suggestion
The *effects* of the ``@llvm.nvvm.discard.*`` intrinsics are those of a non-atomic non-volatile ``llvm.memset`` that writes ``undef`` to the destination address range ``[%ptr, %ptr + immarg)``. 
Subsequent reads from the address range may read ``undef`` until the memory is overwritten with a different value.
These operations *hint* to the implementation that data in the L2 cache can be destructively discarded without writing it back to memory.
The operand ``immarg`` is an integer constant that specifies the length in bytes of the address range ``[%ptr, %ptr + immarg)`` to write ``undef`` into. 
The only supported value for the ``immarg`` operand is ``128``. 
If generic addressing is used and the specified address does not fall within the address window of ``addrspace(1)``, the behavior is undefined.

.. code-block:: llvm
 
   call void @llvm.nvvm.discard.L2(ptr %p, i64 128) ;; writes `undef` to [p, p+128)
   %a = load i64, ptr %p ;; loads undef
   %b = load i64, ptr %p ;; loads undef
   ;; comparing %a and %b compares undef values!
   %fa = freeze i64 %a ;; freezes undef to an arbitrary but fixed bit pattern
   %fb = freeze i64 %b ;; freezes undef to an arbitrary but fixed bit pattern
   ;; %fa and %fb may still compare unequal!
   
For more information, refer to the `CUDA C++ discard documentation <https://nvidia.github.io/cccl/libcudacxx/extended_api/memory_access_properties/discard_memory.html>`__ and the `PTX ISA discard documentation <https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-discard>`__.
```
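Here is a possible extra example for the docs, sketched under some assumptions. The names `@release_scratch`, `%scratch` and `%generic` are hypothetical. Both pointers are assumed to refer to valid 128-byte ranges in global memory, and the sketch does not address any alignment requirement PTX may impose. It shows the explicit `addrspace(1)` variant next to the generic one. It also shows that storing to a discarded range before reading it gives well-defined contents again:

```llvm
declare void @llvm.nvvm.discard.global.L2(ptr addrspace(1), i64 immarg)
declare void @llvm.nvvm.discard.L2(ptr, i64 immarg)

; Hypothetical device function: the 128 bytes at %scratch and at %generic
; are assumed to be dead (never read again before being rewritten).
define void @release_scratch(ptr addrspace(1) %scratch, ptr %generic) {
  ; Pointer is already in the global address space (addrspace(1)).
  call void @llvm.nvvm.discard.global.L2(ptr addrspace(1) %scratch, i64 128)
  ; Generic pointer: UB unless it falls inside the global address window.
  call void @llvm.nvvm.discard.L2(ptr %generic, i64 128)
  ; Loading from the discarded range here would yield undef, so store first.
  store i64 0, ptr addrspace(1) %scratch
  %v = load i64, ptr addrspace(1) %scratch ; %v is 0, not undef
  ret void
}
```

The store-before-load pattern matters because, under the proposed wording, any read of the discarded range before it is overwritten may observe `undef`.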

https://github.com/llvm/llvm-project/pull/128404

