[PATCH] D151997: [AMDGPU] Document amdgpu_cs_chain[_preserve] CCs. NFC

Diana Picus via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 2 06:45:59 PDT 2023


rovka created this revision.
rovka added reviewers: AMDGPU, nhaehnle, ruiling.
Herald added subscribers: kerbowa, tpr, dstuttard, yaxunl, jvesely, kzhuravl.
Herald added a project: All.
rovka requested review of this revision.
Herald added a subscriber: wdng.
Herald added a project: LLVM.

Co-authored-by: Nicolai Hähnle <nicolai.haehnle at amd.com>


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D151997

Files:
  llvm/docs/AMDGPUUsage.rst


Index: llvm/docs/AMDGPUUsage.rst
===================================================================
--- llvm/docs/AMDGPUUsage.rst
+++ llvm/docs/AMDGPUUsage.rst
@@ -1085,6 +1085,49 @@
                                      ..TODO::
                                      Describe.
 
+     ``amdgpu_cs_chain``             Similar to ``amdgpu_cs``, with differences described below.
+
+                                     Arguments are passed in SGPRs if they have the ``inreg`` attribute and
+                                     in VGPRs otherwise, starting at v8. Using more SGPRs or VGPRs than
+                                     available in the subtarget is not allowed.  On subtargets that use
+                                     a scratch buffer descriptor (as opposed to ``scratch_{load,store}_*`` instructions),
+                                     the scratch buffer descriptor is passed in s[48:51]. This limits the
+                                     SGPR / ``inreg`` arguments to the equivalent of 48 dwords; using more
+                                     than that is not allowed.
+
+                                     The return type must be void.
+                                     Varargs, sret, byval, byref, inalloca, and preallocated are not supported.
+
+                                     Values in scalar registers as well as v0-v7 are not preserved. Values in
+                                     VGPRs starting at v8 are not preserved for the active lanes, but must be
+                                     saved by the callee for inactive lanes when using WWM.
+
+                                     Wave scratch is "empty" at function boundaries. There is no stack pointer input
+                                     or output value, but functions are free to use scratch starting from an initial
+                                     stack pointer. Calls to ``amdgpu_gfx`` functions are allowed and behave like they
+                                     do in ``amdgpu_cs`` functions.
+
+                                     All counters (``lgkmcnt``, ``vmcnt``, ``storecnt``, etc.) are presumed in an
+                                     unknown state at function entry. Waits for regular memory counters are not
+                                     inserted as part of an ``llvm.amdgcn.cs.chain`` sequence in the function epilog.
+                                     However, we add waits for errata / hardware workarounds in the epilog:
+
+                                     * On gfx11+, the function epilog waits for any scratch stores to be confirmed. This
+                                       works around the issue that we must wait for scratch stores before sending a
+                                       ``MSG_DEALLOC_VGPRS`` message.
+                                     * Additional waits may be required (e.g. ``s_waitcnt_depctr``).
+
+                                     Functions with this calling convention cannot be called directly. They must
+                                     instead be launched via the ``llvm.amdgcn.cs.chain`` intrinsic.
+
+                                     A function may have multiple exits (e.g. one chain exit and one plain ``ret void``
+                                     for when the wave ends), but all ``llvm.amdgcn.cs.chain`` exits must be in
+                                     uniform control flow.
+
+                                     Functions must be aligned to at least 64 bytes.
+
+     ``amdgpu_cs_chain_preserve``    Same as ``amdgpu_cs_chain``, but active lanes for VGPRs starting at v8 are preserved.
+
      =============================== ==========================================================
 
 
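The argument-passing rule in the patch (``inreg`` arguments go to SGPRs, everything else to VGPRs starting at v8, with the scratch buffer descriptor in s[48:51] capping SGPR arguments at 48 dwords) can be sketched as a small Python model. This is only an illustration, not the backend's actual lowering code. The names Arg and assign_registers are hypothetical, and the sketch assumes that SGPR arguments start at s0 and that arguments are packed contiguously in declaration order; the patch text states neither explicitly. Overall subtarget register limits are not modeled.

```python
from dataclasses import dataclass

FIRST_ARG_VGPR = 8      # from the patch text: VGPR arguments start at v8
SCRATCH_DESC_SGPR = 48  # from the patch text: descriptor is passed in s[48:51]


@dataclass
class Arg:
    """Hypothetical argument description: a name, a size in dwords, an inreg flag."""
    name: str
    dwords: int
    inreg: bool = False


def assign_registers(args, uses_scratch_descriptor=True, first_arg_sgpr=0):
    """Map each argument to (bank, first_reg, last_reg).

    Assumption (not stated in the patch): SGPR arguments start at
    `first_arg_sgpr` and both banks are filled contiguously in order.
    Raises ValueError when inreg arguments would overlap s[48:51].
    """
    next_sgpr = first_arg_sgpr
    next_vgpr = FIRST_ARG_VGPR
    assignment = {}
    for arg in args:
        if arg.inreg:
            first = next_sgpr
            next_sgpr += arg.dwords
            if uses_scratch_descriptor and next_sgpr > SCRATCH_DESC_SGPR:
                raise ValueError(
                    f"inreg argument {arg.name!r} would overlap the scratch "
                    f"buffer descriptor in s[48:51]"
                )
            assignment[arg.name] = ("s", first, next_sgpr - 1)
        else:
            first = next_vgpr
            next_vgpr += arg.dwords
            assignment[arg.name] = ("v", first, next_vgpr - 1)
    return assignment


if __name__ == "__main__":
    layout = assign_registers(
        [Arg("desc", 4, inreg=True), Arg("idx", 1), Arg("payload", 2)]
    )
    for name, (bank, lo, hi) in layout.items():
        print(f"{name}: {bank}{lo}" if lo == hi else f"{name}: {bank}[{lo}:{hi}]")
```

Passing uses_scratch_descriptor=False models subtargets that use ``scratch_{load,store}_*`` instructions, where the 48-dword cap from the descriptor does not apply.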

