[PATCH] D75138: [WIP][AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Tony Tye via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 10 20:14:15 PDT 2020


t-tye added inline comments.


================
Comment at: llvm/docs/AMDGPUUsage.rst:8557-8559
+        and the register count is determined ignoring it. This can be done
+        without having to perform register allocation again, which is necessary
+        as register allocation may introduce spills.
----------------
"This can be done without having to perform register allocation again, which is necessary as register allocation may introduce spills." I suggest moving this to a separate bullet and rewording it to make clear why this approach is taken:

"- Note: this approach of using a tentative scratch SRD, and shifting the register numbers if it is used, avoids having to perform register allocation a second time if the tentative SRD is eliminated. This is more efficient and avoids the problem that the second register allocation may perform spilling, which will fail as there is no longer a scratch SRD."

For consistency, should SRD be changed to V# to match the usage in the next section?


================
Comment at: llvm/docs/AMDGPUUsage.rst:8572
+      in :ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions`.
 
+.. _amdgpu-amdhsa-function-call-convention-non-kernel-functions:
----------------
Should the manner in which the kernel prolog sets the scratch V# be specified? The compiler requests that the scratch V# and wave scratch offset be passed in using the kernel descriptor (reference the section). The wave scratch offset is added to the queue base address in the scratch V# and moved to SGPR0-3.
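
To make the question concrete, here is a minimal sketch (in Python, purely as a model) of the addition described above. It assumes the V# keeps the low 32 bits of its base address in the first dword (SGPR0) and the upper base bits in the low bits of the second dword (SGPR1), so the update is a 32-bit add followed by an add-with-carry. The function name `add_wave_offset_to_vsharp` and the sample values are hypothetical and are not taken from the patch.

```python
MASK32 = 0xFFFFFFFF


def add_wave_offset_to_vsharp(vsharp, wave_offset):
    """Return a copy of a 4-dword V# with wave_offset added to its base.

    vsharp is [s0, s1, s2, s3]. Assumption: s0 holds base[31:0] and the
    low bits of s1 hold the upper base bits, so a carry out of s0 is
    added into s1. The remaining dwords are passed through unchanged.
    """
    s0, s1, s2, s3 = vsharp
    total = s0 + (wave_offset & MASK32)   # models a 32-bit unsigned add
    new_s0 = total & MASK32
    carry = total >> 32                   # 0 or 1
    new_s1 = (s1 + carry) & MASK32        # models an add-with-carry of 0
    return [new_s0, new_s1, s2, s3]


# Hypothetical values chosen so that the low dword overflows.
result = add_wave_offset_to_vsharp(
    [0xFFFFFF00, 0x00000001, 0xFFFFFFFF, 0x00E80000], 0x200)
print([hex(d) for d in result])
```

The sketch ignores the case where the carry would spill out of the base field into the other fields packed in the second dword. Whether that can happen in practice is something the documentation could also state.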

Also specify how the kernel must set FLAT_SCRATCH. The compiler requests that the flat scratch and wave scratch offset be passed in using the kernel descriptor (reference the section). The wave scratch offset is added to the flat scratch base and moved to FLAT_SCRATCH.

Should the setup of M0 also be defined here? For GFX6-??? it is set to the LDS size, otherwise it is set to ???.

Is there any other setup that has to be done in the kernel prolog?


================
Comment at: llvm/docs/AMDGPUUsage.rst:8614
+
+    This may be used to obtain the private address of stack objects and to
+    convert this address to a flat address by adding the flat scratch aperture
----------------
"private address" -> "private address space address"


================
Comment at: llvm/docs/AMDGPUUsage.rst:8653
     * GFX6-8: M0
+    * SGPR0-3: V#
     * All SGPR and VGPR registers except the clobbered registers of SGPR4-31 and
----------------
Is this necessary to say, since the following bullet states all SGPRs except 4-31, which means SGPR0-3 are preserved?
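
The reasoning behind the question can be checked with a small sketch. It models only what the quoted bullets say (SGPR4-31 are clobbered, the V# lives in SGPR0-3). The names `CLOBBERED_SGPRS`, `VSHARP_SGPRS` and `is_preserved` are hypothetical.

```python
# Clobbered range taken from the quoted bullet: SGPR4-31.
CLOBBERED_SGPRS = set(range(4, 32))
# Registers holding the V#, per the added bullet: SGPR0-3.
VSHARP_SGPRS = set(range(0, 4))


def is_preserved(sgpr):
    """An SGPR is preserved if it is not in the clobbered set."""
    return sgpr not in CLOBBERED_SGPRS


# If every V# register is already preserved by the general rule,
# the separate "SGPR0-3: V#" bullet adds no new constraint.
bullet_is_redundant = all(is_preserved(r) for r in VSHARP_SGPRS)
print(bullet_is_redundant)
```

If the separate bullet is kept anyway for emphasis, it may be worth saying so explicitly.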


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75138/new/

https://reviews.llvm.org/D75138




