[llvm] [NVPTX] Add tcgen05 alloc/dealloc intrinsics (PR #124961)

Durgadoss R via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 30 02:31:31 PST 2025


================
@@ -962,6 +962,109 @@ The ``griddepcontrol`` intrinsics allows the dependent grids and prerequisite gr
 For more information, refer 
 `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol>`__.
 
+TCGEN05 family of Intrinsics
+----------------------------
+
+The llvm.nvvm.tcgen05.* intrinsics model the TCGEN05 family of instructions
+exposed by PTX. These intrinsics use 'Tensor Memory' (henceforth ``tmem``).
+NVPTX represents this memory using ``addrspace(6)`` and is always 32-bits.
+
+For more information, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory>`_.
+
+The tensor-memory pointers may only be used with the tcgen05 intrinsics.
+There are specialized load/store instructions provided (tcgen05.ld/st) to
+work with tensor-memory.
+
+For more information on tensor-memory load/store instructions, refer
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory-and-register-load-store-instructions>`_.
+
+All tcgen05 intrinsics use a ``null`` pointer in tmem address
+space as their last operand. This helps to preserve ordering among the tcgen05
+operations especially when the intrinsic lacks any tmem operands. This
+last operand is dropped during Codegen.
----------------
durga4github wrote:

ah no. I will re-phrase the docs a bit.

There are two things here:
1)
The tcgen05 intrinsics work on Tensor Memory operands. These are represented by the tmem pointer, with AS(6).
This is the general regular use case. The 'tmem_addr' operand in the dealloc intrinsic is an example.

2)
A few tcgen05 intrinsics "do not take any tmem operand" but interact only with other tcgen05 intrinsics. For them, to model this interaction, we use the null-ptr in tmem as their last operand, with the ArgMemOnly property. The 'relinq_alloc_permit' intrinsic is an example of this case. I believe we have a few choices here:
A)
Attach HasSideEffects to these few intrinsics. (Most conservative).
B)
Attach null-ptr(6) as a token, but only to these intrinsics (but not to all tcgen05 intrinsics). 
C)
Attach null-ptr(6) as a token to all tcgen05 intrinsics. We chose this since it makes all tcgen05 intrinsics look consistent.

In both B) and C), the token operand is added apart from the regular operands and is ignored/dropped at Codegen.






https://github.com/llvm/llvm-project/pull/124961


More information about the llvm-commits mailing list