[llvm] [AMDGPU][NFC] AMDGPUUsage.rst: document corefile format (PR #104419)

via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 15 01:45:35 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-amdgpu

Author: None (lancesix)

<details>
<summary>Changes</summary>

This patch adds a description of the core file format used for AMDGPU.

Reference implementation for creating and loading AMDGPU core dump is available in [ROCgdb-6.2](https://github.com/ROCm/ROCgdb/tree/rocm-6.2.x/gdb)

---
Full diff: https://github.com/llvm/llvm-project/pull/104419.diff


1 Files Affected:

- (modified) llvm/docs/AMDGPUUsage.rst (+113) 


``````````diff
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 117fc2cf6bbbc..97fca32d4ece6 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -2378,6 +2378,9 @@ are deprecated and should not be used.
      ======== ============================== ======================================
      "AMDGPU" ``NT_AMDGPU_METADATA``         Metadata in Message Pack [MsgPack]_
                                              binary format.
+     "AMDGPU" ``NT_AMDGPU_KFD_CORE_STATE``   Snapshot of runtime, agent and queues
+                                             state for use in core dump.  See
+                                             :ref:`amdgpu_corefile_note`.
      ======== ============================== ======================================
 
 ..
@@ -2390,6 +2393,7 @@ are deprecated and should not be used.
      ============================== =====
      *reserved*                     0-31
      ``NT_AMDGPU_METADATA``         32
+     ``NT_AMDGPU_KFD_CORE_STATE``   33
      ============================== =====
 
 ``NT_AMDGPU_METADATA``
@@ -15024,6 +15028,115 @@ instructions are handled as follows:
                                      trap handler installed.
      =============== =============== ===========================================
 
+Core file format
+================
+
+This section describes the format of core files supporting AMDGPU. Core dumps
+for an AMDGPU program can come in 2 flavors: split or unified core files.
+
+The split layout consists of one host core file containing the information to
+rebuild the image of the host process and one AMDGPU core file that contains
+the information for the AMDGPU agents used in the process.  The AMDGPU core
+file consists of:
+
+* A note describing the state of the AMDGPU agents, AMDGPU queues, and AMDGPU
+  runtime for the process (see :ref:`amdgpu_corefile_note`).
+* A list of load segments containing an image of the AMDGPU agents' memory (see
+  :ref:`amdgpu_corefile_memory`).
+
+The unified core file is the union of all the information contained in
+the two files of the split layout (all notes and load segments).  It contains
+all the information required to reconstruct the image of the process across all
+the agents.
+
+Core file header
+----------------
+
+An AMDGPU core file is an ``ELF64`` core file.  The content of the header
+differs in unified core file layout and AMDGPU core file layout.
+
+Split files
+~~~~~~~~~~~
+
+In the split files layout, the AMDGPU core file is an ``ELF64`` file with the
+header configured as described in :ref:`amdgpu-corefile-headers-table`:
+
+  .. table:: AMDGPU corefile headers
+     :name: amdgpu-corefile-headers-table
+
+     ========================== ===================================
+     Field                      Value
+     ========================== ===================================
+     ``e_ident[EI_CLASS]``      ``ELFCLASS64`` (``0x2``)
+     ``e_ident[EI_DATA]``       ``ELFDATA2LSB`` (``0x1``)
+     ``e_ident[EI_OSABI]``      ``ELFOSABI_AMDGPU_HSA`` (``0x40``)
+     ``e_type``                 ``ET_CORE``(``0x4``)
+     ``e_ident[EI_ABIVERSION]`` ``ELFABIVERSION_AMDGPU_HSA_5``
+     ``e_machine``              ``EM_AMDGPU`` (``0xe0``)
+     ========================== ===================================
+
+Unified file
+~~~~~~~~~~~~
+
+In the unified core file mode, the ``ELF64`` headers are set to describe
+the host architecture and process.
+
+.. _amdgpu_corefile_note:
+
+Core file notes
+---------------
+
+An AMDGPU core file must contain one snapshot note in a ``PT_NOTE`` segment.
+When using a split core file layout, this note is in the AMDGPU file.
+
+The note record vendor field is "``AMDGPU``" and the record type is
+"``NT_AMDGPU_KFD_CORE_STATE``" (see :ref:`amdgpu-note-records-v3-onwards`)
+
+The content of the note is defined in table
+:ref:`amdgpu-core-snapshot-note-layout-table-v1`:
+
+  .. table:: AMDGPU snapshot note format V1
+     :name: amdgpu-core-snapshot-note-layout-table-v1
+
+     ================================ ======================================= ======================= ============== ===========================
+     Field                            Type                                    Size (bytes)            Byte alignment Comment
+     ================================ ======================================= ======================= ============== ===========================
+     ``version_major``                ``uint32``                              4                       4              ``KFD_IOCTL_MAJOR_VERSION``
+     ``version_minor``                ``uint32``                              4                       4              ``KFD_IOCTL_MINOR_VERSION``
+     ``runtime_info_size``            ``uint64``                              8                       8              Must be a multiple of 8
+     ``n_agents``                     ``uint32``                              4                       8
+     ``agent_info_entry_size``        ``uint32``                              4                       4              Must be a multiple of 8
+     ``n_queues``                     ``uint32``                              4                       8
+     ``queue_info_entry_size``        ``uint32``                              4                       4              Must be a multiple of 8
+     ``runtime_info``                 ``kfd_runtime_info``                    ``runtime_info_size``   8
+     ``agents_info``                  ``kfd_dbg_device_info_entry[n_agents]`` ``n_agents *            8
+                                                                              agent_info_entry_size``
+     ``queues_info``                  ``kfd_queue_snapshot_entry[n_queues]``  ``n_queues *
+                                                                              queue_info_entry_size`` 8
+     ================================ ======================================= ======================= ============== ===========================
+
+The definition of all the ``kfd_*`` types comes from the
+``include/uapi/linux/kfd_ioctl.h`` header file from the KFD repository. It is
+usually installed in ``/usr/include/linux/kfd_ioctl.h``. The version of the
+``kfd_ioctl.h`` file used must define values for
+``KFD_IOCTL_MAJOR_VERSION`` and ``KFD_IOCTL_MINOR_VERSION`` matching
+the values of ``kfd_version_major`` and ``kfd_version_major`` from the
+note.
+
+.. _amdgpu_corefile_memory:
+
+Memory segments
+---------------
+
+An AMDGPU core file must contain an image of the AMDGPU agents' memory in load
+segments (of type ``PT_LOAD``).  Those segments must correspond to the memory
+regions where the content of the agent memory is mapped into the host process
+by the ROCr runtime (note that those memory mappings are usually not readable
+by the process itself).
+
+When using the split core file layout, those segments must be included in the
+AMDGPU core file.
+
 Source Languages
 ================
 

``````````

</details>


https://github.com/llvm/llvm-project/pull/104419


More information about the llvm-commits mailing list