[llvm] [AMDGPU][NFC] AMDGPUUsage.rst: document corefile format (PR #104419)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 15 01:45:35 PDT 2024
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: None (lancesix)
<details>
<summary>Changes</summary>
This patch adds a description of the core file format used for AMDGPU.
A reference implementation for creating and loading AMDGPU core dumps is available in [ROCgdb-6.2](https://github.com/ROCm/ROCgdb/tree/rocm-6.2.x/gdb).
---
Full diff: https://github.com/llvm/llvm-project/pull/104419.diff
1 Files Affected:
- (modified) llvm/docs/AMDGPUUsage.rst (+113)
``````````diff
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 117fc2cf6bbbc..97fca32d4ece6 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -2378,6 +2378,9 @@ are deprecated and should not be used.
======== ============================== ======================================
"AMDGPU" ``NT_AMDGPU_METADATA`` Metadata in Message Pack [MsgPack]_
binary format.
+ "AMDGPU" ``NT_AMDGPU_KFD_CORE_STATE`` Snapshot of runtime, agent and queues
+ state for use in core dump. See
+ :ref:`amdgpu_corefile_note`.
======== ============================== ======================================
..
@@ -2390,6 +2393,7 @@ are deprecated and should not be used.
============================== =====
*reserved* 0-31
``NT_AMDGPU_METADATA`` 32
+ ``NT_AMDGPU_KFD_CORE_STATE`` 33
============================== =====
``NT_AMDGPU_METADATA``
@@ -15024,6 +15028,115 @@ instructions are handled as follows:
trap handler installed.
=============== =============== ===========================================
+Core file format
+================
+
+This section describes the format of core files supporting AMDGPU. Core dumps
+for an AMDGPU program come in two flavors: split or unified core files.
+
+The split layout consists of one host core file containing the information to
+rebuild the image of the host process and one AMDGPU core file that contains
+the information for the AMDGPU agents used in the process. The AMDGPU core
+file consists of:
+
+* A note describing the state of the AMDGPU agents, AMDGPU queues, and AMDGPU
+ runtime for the process (see :ref:`amdgpu_corefile_note`).
+* A list of load segments containing an image of the AMDGPU agents' memory (see
+ :ref:`amdgpu_corefile_memory`).
+
+The unified core file is the union of all the information contained in
+the two files of the split layout (all notes and load segments). It contains
+all the information required to reconstruct the image of the process across all
+the agents.
+
+Core file header
+----------------
+
+An AMDGPU core file is an ``ELF64`` core file. The content of the header
+differs between the unified core file layout and the split core file layout.
+
+Split files
+~~~~~~~~~~~
+
+In the split files layout, the AMDGPU core file is an ``ELF64`` file with the
+header configured as described in :ref:`amdgpu-corefile-headers-table`:
+
+  .. table:: AMDGPU corefile headers
+     :name: amdgpu-corefile-headers-table
+
+     ========================== ===================================
+     Field                      Value
+     ========================== ===================================
+     ``e_ident[EI_CLASS]``      ``ELFCLASS64`` (``0x2``)
+     ``e_ident[EI_DATA]``       ``ELFDATA2LSB`` (``0x1``)
+     ``e_ident[EI_OSABI]``      ``ELFOSABI_AMDGPU_HSA`` (``0x40``)
+     ``e_type``                 ``ET_CORE`` (``0x4``)
+     ``e_ident[EI_ABIVERSION]`` ``ELFABIVERSION_AMDGPU_HSA_5``
+     ``e_machine``              ``EM_AMDGPU`` (``0xe0``)
+     ========================== ===================================
+
+Unified file
+~~~~~~~~~~~~
+
+In the unified core file layout, the ``ELF64`` headers are set to describe
+the host architecture and process.
+
+.. _amdgpu_corefile_note:
+
+Core file notes
+---------------
+
+An AMDGPU core file must contain one snapshot note in a ``PT_NOTE`` segment.
+When using a split core file layout, this note is in the AMDGPU file.
+
+The note record vendor field is "``AMDGPU``" and the record type is
+"``NT_AMDGPU_KFD_CORE_STATE``" (see :ref:`amdgpu-note-records-v3-onwards`).
+
+The content of the note is defined in table
+:ref:`amdgpu-core-snapshot-note-layout-table-v1`:
+
+  .. table:: AMDGPU snapshot note format V1
+     :name: amdgpu-core-snapshot-note-layout-table-v1
+
+     ================================ ======================================= ======================= ============== ===========================
+     Field                            Type                                    Size (bytes)            Byte alignment Comment
+     ================================ ======================================= ======================= ============== ===========================
+     ``version_major``                ``uint32``                              4                       4              ``KFD_IOCTL_MAJOR_VERSION``
+     ``version_minor``                ``uint32``                              4                       4              ``KFD_IOCTL_MINOR_VERSION``
+     ``runtime_info_size``            ``uint64``                              8                       8              Must be a multiple of 8
+     ``n_agents``                     ``uint32``                              4                       8
+     ``agent_info_entry_size``        ``uint32``                              4                       4              Must be a multiple of 8
+     ``n_queues``                     ``uint32``                              4                       8
+     ``queue_info_entry_size``        ``uint32``                              4                       4              Must be a multiple of 8
+     ``runtime_info``                 ``kfd_runtime_info``                    ``runtime_info_size``   8
+     ``agents_info``                  ``kfd_dbg_device_info_entry[n_agents]`` ``n_agents *            8
+                                                                              agent_info_entry_size``
+     ``queues_info``                  ``kfd_queue_snapshot_entry[n_queues]``  ``n_queues *            8
+                                                                              queue_info_entry_size``
+     ================================ ======================================= ======================= ============== ===========================
+
+The definition of all the ``kfd_*`` types comes from the
+``include/uapi/linux/kfd_ioctl.h`` header file from the KFD repository. It is
+usually installed in ``/usr/include/linux/kfd_ioctl.h``. The version of the
+``kfd_ioctl.h`` file used must define values for
+``KFD_IOCTL_MAJOR_VERSION`` and ``KFD_IOCTL_MINOR_VERSION`` matching
+the values of ``version_major`` and ``version_minor`` from the
+note.
+
+.. _amdgpu_corefile_memory:
+
+Memory segments
+---------------
+
+An AMDGPU core file must contain an image of the AMDGPU agents' memory in load
+segments (of type ``PT_LOAD``). Those segments must correspond to the memory
+regions where the content of the agent memory is mapped into the host process
+by the ROCr runtime (note that those memory mappings are usually not readable
+by the process itself).
+
+When using the split core file layout, those segments must be included in the
+AMDGPU core file.
+
Source Languages
================
``````````
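The header values listed for the split layout can be checked with a few lines of Python. This is only a minimal sketch, not part of the patch: the function name `is_split_amdgpu_core` is hypothetical, the byte offsets are the standard `ELF64` header offsets, and the numeric constants are the ones given in the "AMDGPU corefile headers" table.

```python
import struct

# Standard indices into e_ident (from the generic ELF specification).
EI_CLASS, EI_DATA, EI_OSABI = 4, 5, 7

# Values taken from the "AMDGPU corefile headers" table in the patch.
ELFCLASS64 = 0x2
ELFDATA2LSB = 0x1
ELFOSABI_AMDGPU_HSA = 0x40
ET_CORE = 0x4
EM_AMDGPU = 0xE0


def is_split_amdgpu_core(header: bytes) -> bool:
    """Return True if `header` looks like a split-layout AMDGPU core file.

    Hypothetical helper: only the fields whose numeric values appear in
    the table are checked.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    # e_type and e_machine are two little-endian uint16 fields that
    # immediately follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from("<HH", header, 16)
    return (
        header[EI_CLASS] == ELFCLASS64
        and header[EI_DATA] == ELFDATA2LSB
        and header[EI_OSABI] == ELFOSABI_AMDGPU_HSA
        and e_type == ET_CORE
        and e_machine == EM_AMDGPU
    )
```

`e_ident[EI_ABIVERSION]` is deliberately left unchecked because the table names the constant without giving its numeric value. A unified core file would fail this check by design, since its headers describe the host architecture.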
</details>
https://github.com/llvm/llvm-project/pull/104419