[PATCH] D151997: [AMDGPU] Document amdgpu_cs_chain[_preserve] CCs. NFC

Thu Jun 8 01:02:58 PDT 2023

rovka marked an inline comment as done.
rovka added inline comments.

================
Comment at: llvm/docs/AMDGPUUsage.rst:1113-1118
+                                     However, we add waits for errata / hardware workarounds in the epilog:
+
+                                     * On gfx11+, the function epilog waits for any scratch stores to be confirmed. This
+                                       works around the issue that we must wait for scratch stores before sending a
+                                       ``MSG_DEALLOC_VGPRS`` message.
+                                     * Additional waits may be required (e.g. ``s_waitcnt_depctr``).
----------------
nhaehnle wrote:
> foad wrote:
> > I'm not sure any of this belongs in a calling convention description. The dealloc_vgprs thing applies to all kernels regardless of calling convention, but probably doesn't need to be documented to the end user anyway.
> It is relevant if anybody wanted to try writing compatible code via some non-LLVM mechanism. @t-tye may have opinions on this as well.
Ok, I'll wait for more feedback on this.

================
Comment at: llvm/docs/AMDGPUUsage.rst:1120-1121
+
+                                     Functions with this calling convention cannot be called directly. They must
+                                     instead be launched via the ``llvm.amdgcn.cs.chain`` intrinsic.
+
----------------
foad wrote:
> So they're not launched by hardware? That feels like the biggest difference from amdgpu_cs, and should probably be mentioned first.
Good point, fixed.

================
Comment at: llvm/docs/AMDGPUUsage.rst:1127
+
+                                     Functions must be aligned to at least 64 bytes.
+
----------------
nhaehnle wrote:
> arsenm wrote:
> > nhaehnle wrote:
> > > This shouldn't be a property of the calling convention. We should just set the alignment attribute in the frontend to ensure this.
> > Why? Entry points require 256 byte align but regular code is fine with 4
> Entry points require 256 for HW reasons. The 64 bytes we discussed is a SW choice that is orthogonal to the base functionality of amdgpu_cs_chain (making LSBs of function pointers available to stuff metadata in -- which we also could have considered doing with amdgpu_gfx functions).
I agree this might not be the most pertinent place to put it, but it feels like relevant information, like you said above. I don't feel very strongly about this, we could just omit it (or mention it elsewhere if you think there's a better place for it).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151997/new/

https://reviews.llvm.org/D151997