[llvm] f5678d4 - [AMDGPU] Update AMDGPUUsage with DWARF proposal
via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 19 12:36:02 PST 2020
Author: Tony
Date: 2020-02-19T15:30:53-05:00
New Revision: f5678d4a6a602bd966570b6f9fdd9aa0de5855b8
URL: https://github.com/llvm/llvm-project/commit/f5678d4a6a602bd966570b6f9fdd9aa0de5855b8
DIFF: https://github.com/llvm/llvm-project/commit/f5678d4a6a602bd966570b6f9fdd9aa0de5855b8.diff
LOG: [AMDGPU] Update AMDGPUUsage with DWARF proposal
Summary:
- Add AMDGPU DWARF proposal.
- Add references for gfx10 ISA and SemVer.
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, aprantl, dstuttard, tpr, jfb, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70523
Added:
Modified:
llvm/docs/AMDGPUUsage.rst
Removed:
################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 84761dc567d4..863a907f6731 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1078,58 +1078,2986 @@ There is no current OS loader support for 32-bit programs and so
DWARF
-----
-Standard DWARF [DWARF]_ Version 5 sections can be generated. These contain
-information that maps the code object executable code and data to the source
-language constructs. It can be used by tools such as debuggers and profilers.
+.. warning::
+ This section describes a **provisional proposal** that is not currently
+ fully implemented and is subject to change.
+
+Standard DWARF [DWARF]_ sections can be generated. These contain information
+that maps the code object executable code and data to the source language
+constructs. It can be used by tools such as debuggers and profilers.
+
+This section defines the AMDGPU target specific DWARF. It applies to DWARF
+Version 4 and 5.
+
+.. _amdgpu-dwarf-overview:
+
+Overview
+~~~~~~~~
+
+The AMDGPU has several features that require additional DWARF functionality in
+order to support optimized code.
+
+A single code object can contain code for kernels that have different wave
+sizes. The vector registers and some scalar registers are based on the wave
+size. AMDGPU defines distinct DWARF registers for each wave size. This
+simplifies the consumer of the DWARF so that each register has a fixed size,
+rather than being dynamic according to the wave mode. Similarly, distinct DWARF
+registers are defined for those registers that vary in size according to the
+process address size. This allows a consumer to treat a specific AMDGPU target
+as a single architecture regardless of how it is configured. The compiler
+explicitly specifies the registers that match the mode of the code it is
+generating.
+
+AMDGPU optimized code may spill vector registers to non-global address space
+memory, and this spilling may be done only for lanes that are active on entry to
+the subprogram. To support this, a location description that can be created as a
+masked select is required.
+
+Since the active lane mask may be held in a register, a way to get the value of
+a register on entry to a subprogram is required. To support this an operation
+that returns the caller value of a register as specified by the Call Frame
+Information (see :ref:`amdgpu-call-frame-information`) is required.
+
+Current DWARF uses an empty expression to indicate an undefined location
+description. Since the masked select composite location description operation
+takes more than one location description, it is necessary to have an explicit
+way to specify an undefined location description. Otherwise it is not possible
+to specify that a particular one of the input location descriptions is
+undefined.
+
+CFI describes restoring callee saved registers that are spilled. Currently CFI
+only allows a location description that is a register, memory address, or
+implicit location description. AMDGPU optimized code may spill scalar registers
+into portions of vector registers. This requires extending CFI to allow any
+location description.
+
+The vector registers of the AMDGPU are represented as their full wave size,
+meaning the wave size times the dword size. This reflects the actual hardware,
+and allows the compiler to generate DWARF for languages that map a thread to the
+complete wave. It also allows more efficient DWARF to be generated to describe
+the CFI as only a single expression is required for the whole vector register,
+rather than a separate expression for each lane's dword of the vector register.
+It also allows the compiler to produce DWARF that indexes the vector register if
+it spills scalar registers into portions of a vector register.
+
+Since DWARF stack value entries have a base type and AMDGPU registers are a
+vector of dwords, the ability to specify that a base type is a vector is
+required.
+
+If the source language is mapped onto the AMDGPU wavefronts in a SIMT manner,
+then the variable DWARF location expressions must compute the location for a
+single lane of the wavefront. Therefore, a DWARF operator is required to denote
+the current lane, much like ``DW_OP_push_object_address`` denotes the current
+object. The ``DW_OP_*piece`` operators only allow literal indices. Therefore, a
+composite location description is required that can take a computed index of a
+location description (such as a vector register).
+
+If the source language is mapped onto the AMDGPU wavefronts in a SIMT manner the
+compiler can use the AMDGPU execution mask register to control which lanes are
+active. To describe the conceptual location of non-active lanes a DWARF
+expression is needed that can compute a per lane PC. For efficiency, this is
+done for the wave as a whole. This expression benefits by having a masked select
+composite location description operation. This requires an attribute for source
+location of each lane. The AMDGPU may update the execution mask for whole wave
+operations and so needs an attribute that computes the current active lane mask.
+
+AMDGPU needs to be able to describe addresses that are in different kinds of
+memory. Optimized code may need to describe a variable that resides in pieces
+that are in different kinds of storage which may include parts of registers,
+memory that is in a mixture of memory kinds, implicit values, or be undefined.
+DWARF has the concept of segment addresses. However, the segment cannot be
+specified within a DWARF expression, which is only able to specify the offset
+portion of a segment address. The segment index is only provided by the entity
+that specifies the DWARF expression. Therefore, the segment index is a property
+that can only be put on complete objects, such as a variable. That makes it only
+suitable for describing an entity (such as variable or subprogram code) that is
+in a single kind of memory. Therefore, AMDGPU uses the DWARF concept of address
+spaces. For example, a variable may be allocated in a register that is partially
+spilled to the call stack which is in the private address space, and partially
+spilled to the local address space.
+
+DWARF uses the concept of an address in many expression operators but does not
+define how it relates to address spaces. For example,
+``DW_OP_push_object_address`` pushes the address of an object. Other contexts
+implicitly push an address on the stack before evaluating an expression. For
+example, the ``DW_AT_use_location`` attribute of the
+``DW_TAG_ptr_to_member_type``. The expression that uses the address needs to do
+so in a general way and not be dependent on the address space of the
+address. For example, a pointer to member value may want to be applied to an
+object that may reside in any address space.
+
+The number of registers and the cost of memory operations are much higher for
+AMDGPU than a typical CPU. The compiler attempts to optimize whole variables and
+arrays into registers. Currently DWARF only allows ``DW_OP_push_object_address``
+and related operations to work with a global memory location. To support AMDGPU
+optimized code it is required to generalize DWARF to allow any location
+description to be used. This allows registers, or composite location
+descriptions that may be a mixture of memory, registers, or even implicit
+values.
+
+Allowing a location description to be an entry on the DWARF stack allows them to
+compose naturally. It allows objects to be located in any kind of memory address
+space, in registers, be implicit values, be undefined, or a composite of any of
+these.
+
+By extending DWARF carefully, all existing DWARF expressions can retain their
+current semantic meaning. DWARF has implicit conversions that convert from a
+value that is treated as an address in the default address space to a memory
+location description. This can be extended to allow a default address space
+memory location description to be implicitly converted back to its address
+value. To allow composition of composite location descriptions, an explicit
+operator that indicates the end is required. This can be implied if the end of a
+DWARF expression is reached, allowing current DWARF expressions to remain legal.
+
+The ``DW_OP_plus`` and ``DW_OP_minus`` can be defined to operate on a memory
+location description in the default target architecture address space and a
+generic type, and produce a memory location description. This allows them to
+continue to be used to offset an address. To generalize offsetting to any
+location description, including location descriptions that describe when bytes
+are in registers, are implicit, or a composite of these, the
+``DW_OP_LLVM_offset`` and ``DW_OP_LLVM_bit_offset`` operations are added. These
+do not perform wrapping which would be hard to define for location descriptions
+of non-memory kinds. This allows ``DW_OP_push_object_address`` to push a
+location description that may be in a register, or be an implicit value, and the
+DWARF expression of ``DW_TAG_ptr_to_member_type`` can contain
+``DW_OP_LLVM_offset`` to offset within it. ``DW_OP_LLVM_bit_offset`` generalizes
+DWARF to work with bit fields.
+
+The DWARF ``DW_OP_xderef*`` operation allows a value to be converted into an
+address of a specified address space which is then read, but provides no way to
+create a memory location description for an address in the non-default address
+space. For example, AMDGPU variables can be allocated in the local address space
+at a fixed address. It is required to have an operation to create an address in
+a specific address space that can be used to define the location description of
+the variable. Defining this operation to produce a location description allows
+the size of addresses in an address space to be larger than the generic type.
+
+If an operation had to produce a value that can be implicitly converted to a
+memory location description, then it would be limited to the size of the generic
+type which matches the size of the default address space. Its value would be
+unspecified and likely not match any value in the actual program. By making the
+result a location description, it allows a consumer great freedom in how it
+implements it. The implicit conversion back to a value can be limited only to
+the default address space to maintain compatibility.
+
+Similarly ``DW_OP_breg*`` treats the register as containing an address in the
+default address space. It is required to be able to specify the address space of
+the register value.
+
+Almost all uses of addresses in DWARF are limited to defining location
+descriptions, or to be dereferenced to read memory. The exception is
+``DW_CFA_val_offset`` which uses the address to set the value of a register. By
+defining the CFA DWARF expression as being a memory location description, it can
+maintain what address space it is, and that can be used to convert the offset
+address back to an address in that address space. (An alternative is to define
+``DW_CFA_val_offset`` to implicitly use the default address space, and add
+another operation that specifies the address space.)
+
+This approach allows all existing DWARF to have the identical semantics. It
+allows the compiler to explicitly specify the address space it is using. For
+example, a compiler could choose to access private memory in a swizzled manner
+when mapping a source language to a wave in a SIMT manner, or to access it in an
+unswizzled manner if mapping the same language with the wave being the thread.
+It also allows the compiler to mix the address space it uses to access private
+memory. For example, for SIMT it can still spill entire vector registers in an
+unswizzled manner, while using swizzled for SIMT variable access. This approach
+allows memory location descriptions for different address spaces to be combined
+using the regular ``DW_OP_*piece`` operators.
+
+Location descriptions are an abstraction of storage; they give freedom to the
+consumer on how to implement them. They allow the address space to encode lane
+information so they can be used to read memory with only the memory description
+and no extra arguments. The same set of operations can operate on locations
+independent of their kind of storage. The ``DW_OP_deref*`` therefore can be used
+on any storage kind. ``DW_OP_xderef*`` is unnecessary except to become a more
+compact way to convert a segment address followed by dereferencing it.
+
+Several approaches were considered, and the one proposed appears to be the
+cleanest and offers the greatest improvement of DWARF's ability to support
+optimized code. Examining the gdb debugger and LLVM compiler, it appears only to
+require modest changes as they both already have to support general use of
+location descriptions. It is anticipated that will be the case for other
+debuggers and compilers.
+
+The following provides the definitions for the additional operators, as well as
+clarifying how existing expression operators, CFI operators, and attributes
+behave with respect to generalized location descriptions that support address
+spaces. It has been defined such that it is backwards compatible with DWARF 5.
+The definitions are intended to fully define well-formed DWARF in a consistent
+style. Some sections are organized to mirror the DWARF 5 specification
+structure, with non-normative text shown in *italics*.
+
+.. _amdgpu-dwarf-language-names:
+
+Language Names
+~~~~~~~~~~~~~~
+
+Language codes defined for use with the ``DW_AT_language`` attribute are
+defined in :ref:`amdgpu-dwarf-language-names-table`.
+
+.. table:: AMDGPU DWARF Language Names
+ :name: amdgpu-dwarf-language-names-table
+
+ ==================== ====== =================== =============================
+ Language Name Code Default Lower Bound Description
+ ==================== ====== =================== =============================
+ ``DW_LANG_LLVM_HIP`` 0x8100 0 AMD HIP Language. See [HIP]_.
+ ==================== ====== =================== =============================
+
+The ``DW_LANG_LLVM_HIP`` language can be supported by extending the C++
+language.
+
+.. _amdgpu-dwarf-register-mapping:
+
+Register Mapping
+~~~~~~~~~~~~~~~~
+
+DWARF registers are encoded as numbers, which are mapped to architecture
+registers. The mapping for AMDGPU is defined in
+:ref:`amdgpu-dwarf-register-mapping-table`.
+
+.. table:: AMDGPU DWARF Register Mapping
+ :name: amdgpu-dwarf-register-mapping-table
+
+ ============== ================= ======== ==================================
+ DWARF Register AMDGPU Register Bit Size Description
+ ============== ================= ======== ==================================
+ 0 PC_32 32 Program Counter (PC) when
+ executing in a 32-bit process
+ address space. Used in the CFI to
+ describe the PC of the calling
+ frame.
+ 1 EXEC_MASK_32 32 Execution Mask Register when
+ executing in wave 32 mode.
+ 2-15 *Reserved*
+ 16 PC_64 64 Program Counter (PC) when
+ executing in a 64-bit process
+ address space. Used in the CFI to
+ describe the PC of the calling
+ frame.
+ 17 EXEC_MASK_64 64 Execution Mask Register when
+ executing in wave 64 mode.
+ 18-31 *Reserved*
+ 32-95 SGPR0-SGPR63 32 Scalar General Purpose
+ Registers.
+ 96-127 *Reserved*
+ 128-511 *Reserved*
+ 512-1023 *Reserved*
+ 1024-1087 *Reserved*
+ 1088-1129 SGPR64-SGPR105 32 Scalar General Purpose Registers
+ 1130-1535 *Reserved*
+ 1536-1791 VGPR0-VGPR255 32*32 Vector General Purpose Registers
+ when executing in wave 32 mode.
+ 1792-2047 *Reserved*
+ 2048-2303 AGPR0-AGPR255 32*32 Vector Accumulation Registers
+ when executing in wave 32 mode.
+ 2304-2559 *Reserved*
+ 2560-2815 VGPR0-VGPR255 64*32 Vector General Purpose Registers
+ when executing in wave 64 mode.
+ 2816-3071 *Reserved*
+ 3072-3327 AGPR0-AGPR255 64*32 Vector Accumulation Registers
+ when executing in wave 64 mode.
+ 3328-3583 *Reserved*
+ ============== ================= ======== ==================================
+
+The vector registers are represented as the full size for the wavefront. They
+are organized as consecutive dwords (32-bits), one per lane, with the dword at
+the least significant bit position corresponding to lane 0 and so forth. DWARF
+location expressions involving the ``DW_OP_LLVM_offset`` and
+``DW_OP_LLVM_push_lane`` operations are used to select the part of the vector
+register corresponding to the lane that is executing the current thread of
+execution in languages that are implemented using a SIMD or SIMT execution
+model.
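
The lane layout described above can be sketched in Python. This is only an illustration under stated assumptions: the register contents are modeled as one arbitrary-precision integer, and the helper names ``lane_byte_offset`` and ``lane_dword`` are hypothetical and are not part of the proposal or of LLVM.

```python
DWORD_BITS = 32  # each lane owns one dword of the vector register


def lane_byte_offset(lane: int) -> int:
    """Byte offset of a lane's dword, counted from the least significant end."""
    return lane * (DWORD_BITS // 8)


def lane_dword(register_value: int, lane: int, wave_size: int) -> int:
    """Extract the dword belonging to `lane` from a full-width vector register.

    `register_value` models the whole register as a single integer whose
    least significant dword is lane 0 (an assumption made for illustration).
    """
    if wave_size not in (32, 64):
        raise ValueError("wave size must be 32 or 64")
    if not 0 <= lane < wave_size:
        raise ValueError("lane out of range for this wave size")
    return (register_value >> (lane * DWORD_BITS)) & 0xFFFFFFFF


# Build a wave 32 register in which lane i holds the value i + 100.
vgpr = sum((i + 100) << (i * DWORD_BITS) for i in range(32))
```

A location expression would express the same selection by offsetting into the register location by the byte offset of the current lane.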
+
+If the wavefront size is 32 lanes then the wave 32 mode register definitions
+are used. If the wavefront size is 64 lanes then the wave 64 mode register
+definitions are used. Some AMDGPU targets support executing in both wave 32
+and wave 64 mode. The register definitions corresponding to the wave mode
+of the generated code will be used.
+
+If code is generated to execute in a 32-bit process address space then the
+32-bit process address space register definitions are used. If code is
+generated to execute in a 64-bit process address space then the 64-bit process
+address space register definitions are used. The ``amdgcn`` target only
+supports the 64-bit process address space.
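
As a rough sketch of how a producer might pick register numbers from the mapping table, the following Python function encodes the ranges listed there. The function name ``dwarf_register`` and its parameters are hypothetical and chosen only for this example; the numeric ranges are taken from the table.

```python
def dwarf_register(kind: str, index: int = 0, wave_size: int = 64,
                   address_bits: int = 64) -> int:
    """Return the DWARF register number for an AMDGPU register.

    kind is one of "pc", "exec", "sgpr", "vgpr", "agpr" (names assumed
    for this sketch). The wave size and process address size select
    between the distinct register definitions.
    """
    if wave_size not in (32, 64) or address_bits not in (32, 64):
        raise ValueError("wave size and address size must be 32 or 64")
    if kind == "pc":
        return 0 if address_bits == 32 else 16
    if kind == "exec":
        return 1 if wave_size == 32 else 17
    if kind == "sgpr":
        if 0 <= index <= 63:
            return 32 + index            # SGPR0-SGPR63
        if 64 <= index <= 105:
            return 1088 + (index - 64)   # SGPR64-SGPR105
        raise ValueError("SGPR index out of range")
    if kind in ("vgpr", "agpr"):
        if not 0 <= index <= 255:
            raise ValueError("vector register index out of range")
        bases = {
            ("vgpr", 32): 1536, ("agpr", 32): 2048,
            ("vgpr", 64): 2560, ("agpr", 64): 3072,
        }
        return bases[(kind, wave_size)] + index
    raise ValueError("unknown register kind")
```

Because every register number implies a fixed size, a consumer never needs to know the wave mode to size a register; the producer encodes it by choosing the matching range.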
+
+Address Class Mapping
+~~~~~~~~~~~~~~~~~~~~~
+
+DWARF address classes are used for languages with the concept of memory address
+spaces. They are used in the ``DW_AT_address_class`` attribute for pointer type,
+reference type, subroutine, and subroutine type debugger information entries
+(DIEs).
+
+The address class mapping for AMDGPU is defined in
+:ref:`amdgpu-dwarf-address-class-mapping-table`.
+
+.. table:: AMDGPU DWARF Address Class Mapping
+ :name: amdgpu-dwarf-address-class-mapping-table
+
+ =========================== ===== =================
+ DWARF AMDGPU
+ --------------------------------- -----------------
+ Address Class Name Value Address Space
+ =========================== ===== =================
+ ``DW_ADDR_none`` 0x00 Generic (Flat)
+ ``DW_ADDR_AMDGPU_global`` 0x01 Global
+ ``DW_ADDR_AMDGPU_region`` 0x02 Region (GDS)
+ ``DW_ADDR_AMDGPU_local`` 0x03 Local (group/LDS)
+ ``DW_ADDR_AMDGPU_constant`` 0x04 Global
+ ``DW_ADDR_AMDGPU_private`` 0x05 Private (Scratch)
+ =========================== ===== =================
+
+See :ref:`amdgpu-address-spaces` for information on the AMDGPU address spaces
+including address size and NULL value.
+
+For AMDGPU the address class encodes the address class as declared in the
+source language type.
+
+For AMDGPU if no ``DW_AT_address_class`` attribute is present, then the
+``DW_ADDR_none`` address class is used.
+
+.. note::
+
+ The ``DW_ADDR_none`` default was defined as ``Generic`` and not ``Global``
+ to match the LLVM address space ordering. This ordering was chosen to better
+ support CUDA-like languages such as HIP that do not have address spaces in
+ the language type system, but do allow variables to be allocated in
+ different address spaces. So effectively all CUDA and HIP source language
+ addresses are generic.
+
+.. note::
+
+ Currently DWARF defines address class values as architecture specific. It
+ is unclear how language specific address spaces are intended to be
+ represented in DWARF.
+
+ For example, OpenCL defines address spaces for ``global``, ``local``,
+ ``constant``, and ``private``. These are part of the type system and are
+ modifiers to pointer types. In addition, OpenCL defines ``generic`` pointers
+ that can reference either the ``global``, ``local``, or ``private`` address
+ spaces. To support the OpenCL language the debugger would want to support
+ casting pointers between the ``generic`` and other address spaces, and
+ possibly using pointer casting to form an address for a specific address
+ space out of an integral value.
+
+ The method to use to dereference a pointer type or reference type value is
+ defined in DWARF expressions using ``DW_OP_xderef*`` which uses an
+ architecture specific address space.
+
+ DWARF defines the ``DW_AT_address_class`` attribute on pointer types and
+ reference types. It specifies the method to use to dereference them. Why
+ is the value of this not the same as the address space value used in
+ ``DW_OP_xderef*`` since in both cases it is architecture specific and the
+ architecture presumably will use the same set of methods to dereference
+ pointers in both cases?
+
+ Since ``DW_AT_address_class`` uses an architecture specific value it cannot
+ in general capture the source language address space type modifier concept.
+ On some architectures all source language address space modifiers may
+ actually use the same method for dereferencing pointers.
+
+ One possibility is for DWARF to add a ``DW_TAG_LLVM_address_class_type``
+ type modifier that can be applied to a pointer type and reference type. The
+ ``DW_AT_address_class`` attribute could be re-defined to not be architecture
+ specific and instead define generalized language values that will support
+ OpenCL and other languages using address spaces. The ``DW_AT_address_class``
+ could be defined to not be applied to pointer or reference types, but
+ instead only to the ``DW_TAG_LLVM_address_class_type`` type modifier entry.
+
+ If a pointer type or reference type is not modified by
+ ``DW_TAG_LLVM_address_class_type`` or if ``DW_TAG_LLVM_address_class_type``
+ has no ``DW_AT_address_class`` attribute, then the pointer type or reference
+ type would be defined to use the ``DW_ADDR_none`` address class as
+ currently. Since modifiers can be chained, it would need to be defined if
+ multiple ``DW_TAG_LLVM_address_class_type`` modifiers are legal, and if so if
+ the outermost one is the one that takes precedence.
+
+ A target implementation that supports multiple address spaces would need to
+ map ``DW_ADDR_none`` appropriately to support CUDA-like languages
+ that have no address classes in the type system, but do support variable
+ allocation in address spaces. See the above note that describes why AMDGPU
+ chose to make ``DW_ADDR_none`` map to the ``Generic`` AMDGPU address space
+ and not the ``Global`` address space.
+
+ An alternative would be to define ``DW_ADDR_none`` as being the global
+ address class and then change ``DW_ADDR_global`` to ``DW_ADDR_generic``.
+ Compilers generating DWARF for CUDA-like languages would then have to define
+ every CUDA-like language pointer type or reference type using
+ ``DW_TAG_LLVM_address_class_type`` with a ``DW_AT_address_class`` attribute
+ of ``DW_ADDR_generic`` to match the language semantics. The AMDGPU
+ alternative avoids needing to do this and seems to fit better into how CLANG
+ and LLVM have added support for the CUDA-like languages on top of existing
+ C++ language support.
+
+ A new ``DW_AT_address_space`` attribute could be defined that can be applied
+ to pointer type, reference type, subroutine, and subroutine type to describe
+ how objects having the given type are dereferenced or called (the role that
+ ``DW_AT_address_class`` currently provides). The values of
+ ``DW_AT_address_space`` would be architecture specific and the same as used
+ in ``DW_OP_xderef*``.
+
+.. _amdgpu-dwarf-address-space-mapping:
Address Space Mapping
~~~~~~~~~~~~~~~~~~~~~
-The following address space mapping is used:
+DWARF address spaces are used in location expressions to describe the memory
+space where data resides. Address spaces correspond to a target specific memory
+space and are not tied to any source language concept.
+
+The AMDGPU address space mapping is defined in
+:ref:`amdgpu-dwarf-address-space-mapping-table`.
+
+.. table:: AMDGPU DWARF Address Space Mapping
+ :name: amdgpu-dwarf-address-space-mapping-table
+
+ ======================================= ===== ======= ======== ================= =======================
+ DWARF AMDGPU Notes
+ --------------------------------------- ----- ---------------- ----------------- -----------------------
+ Address Space Name Value Address Bit Size Address Space
+ --------------------------------------- ----- ------- -------- ----------------- -----------------------
+ .. 64-bit 32-bit
+ process process
+ address address
+ space space
+ ======================================= ===== ======= ======== ================= =======================
+ ``DW_ASPACE_none`` 0x00 8 4 Global *default address space*
+ ``DW_ASPACE_AMDGPU_generic`` 0x01 8 4 Generic (Flat)
+ ``DW_ASPACE_AMDGPU_region`` 0x02 4 4 Region (GDS)
+ ``DW_ASPACE_AMDGPU_local`` 0x03 4 4 Local (group/LDS)
+ *Reserved* 0x04
+ ``DW_ASPACE_AMDGPU_private_lane`` 0x05 4 4 Private (Scratch) *focused lane*
+ ``DW_ASPACE_AMDGPU_private_wave`` 0x06 4 4 Private (Scratch) *unswizzled wave*
+ *Reserved* 0x07-
+ 0x1F
+ ``DW_ASPACE_AMDGPU_private_lane<0-63>`` 0x20- 4 4 Private (Scratch) *specific lane*
+ 0x5F
+ ======================================= ===== ======= ======== ================= =======================
+
+See :ref:`amdgpu-address-spaces` for information on the AMDGPU address spaces
+including address size and NULL value.
+
+The ``DW_ASPACE_none`` address space is the default address space used in DWARF
+operations that do not specify an address space. It therefore has to map to the
+global address space so that the ``DW_OP_addr*`` and related operations can
+refer to addresses in the program code.
+
+The ``DW_ASPACE_AMDGPU_generic`` address space allows location expressions to
+specify the flat address space. If the address corresponds to an address in the
+local address space then it corresponds to the wave that is executing the
+focused thread of execution. If the address corresponds to an address in the
+private address space then it corresponds to the lane that is executing the
+focused thread of execution for languages that are implemented using a SIMD or
+SIMT execution model.
+
+.. note::
+
+ CUDA-like languages such as HIP that do not have address spaces in the
+ language type system, but do allow variables to be allocated in different
+ address spaces, will need to explicitly specify the
+ ``DW_ASPACE_AMDGPU_generic`` address space in the DWARF operations as the
+ default address space is the global address space.
+
+The ``DW_ASPACE_AMDGPU_local`` address space allows location expressions to
+specify the local address space corresponding to the wave that is executing the
+focused thread of execution.
+
+The ``DW_ASPACE_AMDGPU_private_lane`` address space allows location expressions
+to specify the private address space corresponding to the lane that is
+executing the focused thread of execution for languages that are implemented
+using a SIMD or SIMT execution model.
+
+The ``DW_ASPACE_AMDGPU_private_wave`` address space allows location expressions
+to specify the unswizzled private address space corresponding to the wave that
+is executing the focused thread of execution. The wave view of private memory
+is the per wave unswizzled backing memory layout defined in
+:ref:`amdgpu-address-spaces`, such that address 0 corresponds to the first
+location for the backing memory of the wave (namely the address is not offset
+by ``wavefront-scratch-base``). So to convert from a
+``DW_ASPACE_AMDGPU_private_lane`` to a ``DW_ASPACE_AMDGPU_private_wave``
+segment address perform the following:
+
+::
+
+ private-address-wave =
+ ((private-address-lane / 4) * wavefront-size * 4) +
+ (wavefront-lane-id * 4) + (private-address-lane % 4)
+
+If the ``DW_ASPACE_AMDGPU_private_lane`` segment address is dword aligned and
+the start of the dwords for each lane starting with lane 0 is required, then
+this simplifies to:
+
+::
+
+ private-address-wave =
+ private-address-lane * wavefront-size
+
+A compiler can use this address space to read a complete spilled vector
+register back into a complete vector register in the CFI. The frame pointer can
+be a private lane segment address which is dword aligned, which can be shifted
+to multiply by the wave size, and then used to form a private wave segment
+address that gives a location for a contiguous set of dwords, one per lane,
+where the vector register dwords are spilled. The compiler knows the wave size
+since it generates the code. Note that the type of the address may have to be
+converted as the size of a private lane segment address may be smaller than the
+size of a private wave segment address.
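
The two conversion formulas above can be checked with a short Python sketch. The function names are hypothetical; the arithmetic follows the formulas as written, with integer division standing in for ``/``.

```python
def private_lane_to_wave(private_address_lane: int, wavefront_lane_id: int,
                         wavefront_size: int) -> int:
    """General conversion from a private lane address to the unswizzled
    private wave address, for one specific lane."""
    return ((private_address_lane // 4) * wavefront_size * 4
            + wavefront_lane_id * 4
            + private_address_lane % 4)


def private_lane_to_wave_base(private_address_lane: int,
                              wavefront_size: int) -> int:
    """Simplified form: the start of the per-lane dwords (lane 0) for a
    dword-aligned private lane address."""
    if private_address_lane % 4 != 0:
        raise ValueError("address must be dword aligned")
    return private_address_lane * wavefront_size
```

For a dword-aligned address and lane 0 the two functions agree, which is why a frame pointer held as a private lane address can simply be multiplied by the wave size to locate a spilled vector register.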
+
+The ``DW_ASPACE_AMDGPU_private_lane<n>`` address space allows location
+expressions to specify the private address space corresponding to a specific
+lane. For example, this can be used when the compiler spills scalar registers
+to scratch memory, with each scalar register being saved to a different lane's
+scratch memory.
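
Going by the value range in the address space table (0x20 to 0x5F for lanes 0 to 63), the address space number for a specific lane could be computed as follows. The constant and function names are hypothetical and used only for illustration.

```python
DW_ASPACE_AMDGPU_PRIVATE_LANE_BASE = 0x20  # value for lane 0, from the table


def private_lane_aspace(lane: int) -> int:
    """Address space value that selects the private memory of lane `lane`."""
    if not 0 <= lane <= 63:
        raise ValueError("lane must be in the range 0-63")
    return DW_ASPACE_AMDGPU_PRIVATE_LANE_BASE + lane
```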
+
+.. _amdgpu-dwarf-expressions:
+
+Expressions
+~~~~~~~~~~~
- .. table:: AMDGPU DWARF Address Space Mapping
- :name: amdgpu-dwarf-address-space-mapping-table
+The following sections define the new DWARF expression operators used by AMDGPU,
+as well as clarifying the extensions to already existing DWARF 5 operations.
- =================== =================
- DWARF Address Space Memory Space
- =================== =================
- 1 Private (Scratch)
- 2 Local (group/LDS)
- *omitted* Global
- *omitted* Constant
- *omitted* Generic (Flat)
- *not supported* Region (GDS)
- =================== =================
+DWARF expressions describe how to compute a value or specify a location
+description. An expression is encoded as a stream of operations, each consisting
+of an opcode followed by zero or more literal operands. The number of operands
+is implied by the opcode.
-See :ref:`amdgpu-address-spaces` for information on the address space
-terminology used in the table.
+Operations represent a postfix operation on a simple stack machine. They can act
+on entries on the stack, including adding entries and removing entries. If the
+kind of a stack entry does not match the kind required by the operation, and is
+not implicitly convertible to the required kind, then the DWARF expression is
+ill-formed.
-An ``address_class`` attribute is generated on pointer type DIEs to specify the
-DWARF address space of the value of the pointer when it is in the *private* or
-*local* address space. Otherwise the attribute is omitted.
+Each stack entry can be one of two kinds: a value or a location description.
+Value stack entries are described in :ref:`amdgpu-value-operations` and
+location description stack entries are described in
+:ref:`amdgpu-location-description-operations`.
-An ``DW_OP_xderef`` operation is generated in location list expressions for
-variables that are allocated in the *private* and *local* address space.
-Otherwise, ``DW_OP_xderef`` is omitted.
+*The evaluation of a DWARF expression can provide the location description of an
+object, the value of an array bound, the length of a dynamic string, the desired
+value itself, and so on.*
-Register Mapping
-~~~~~~~~~~~~~~~~
+The result of the evaluation of a DWARF expression is defined as:
-*This section is WIP.*
+* If evaluation of the DWARF expression is on behalf of a ``DW_OP_call*``
+ operation for a ``DW_AT_location`` attribute that belongs to a
+ ``DW_TAG_dwarf_procedure`` debugging information entry, then all the entries
+ on the stack are left, and execution of the DWARF expression containing the
+ ``DW_OP_call*`` operation continues.
-.. TODO::
- Define DWARF register enumeration.
-
- If want to present a wavefront state then should expose vector registers as
- 64 dword wide (rather than per work-item view that LLVM uses). Either as
- separate registers, or a 64x4 byte single register. In either case use a new
- ``DW_OP_lane`` op (akin to ``DW_OP_xderef``) to select the current lane usage
- in a location expression. This would also allow scalar register spilling to
- vector register lanes to be expressed (currently no debug information is
- being generated for spilling). If choose a wide single register approach then
- use ``DW_OP_lane`` in conjunction with ``DW_OP_piece`` operation to select
- the dword part of the register for the current lane. If the separate register
- approach then use ``DW_OP_lane`` to select the register.
+* If evaluation of the DWARF expression requires a location description, then:
+
+ * If the stack is empty, an undefined location description is returned.
+
+ * If the top stack entry is a location description, or can be converted to
+ one, then the, possibly converted, location description is returned. Any
+ other entries on the stack are discarded.
+
+ * Otherwise the DWARF expression is ill-formed.
+
+ .. note::
+
+ Could define this case as returning an implicit location description as
+ if the ``DW_OP_implicit`` operation is performed.
+
+* If evaluation of the DWARF expression requires a value, then:
+
+ * If the top stack entry is a value, or can be converted to one, then the,
+ possibly converted, value is returned. Any other entries on the stack are
+ discarded.
+
+ * Otherwise the DWARF expression is ill-formed.
+
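The result rules above can be sketched in a few lines. This is only an illustration under stated assumptions: `Value`, `Location`, `expression_result`, and the storage name `"memory:default"` are hypothetical names, every value is assumed to have the generic type, and the implicit conversions follow the rules given later under memory location descriptions. The ``DW_OP_call*`` case and incomplete composite location descriptions are not modeled.

```python
from dataclasses import dataclass


class IllFormedExpression(Exception):
    """Raised where the text declares the DWARF expression ill-formed."""


@dataclass(frozen=True)
class Value:
    number: int  # assumption: always has the generic type


@dataclass(frozen=True)
class Location:
    storage: str  # hypothetical storage name
    bit_offset: int = 0


DEFAULT_MEMORY = "memory:default"  # hypothetical name for the default address space
UNDEFINED = Location("undefined")


def as_location(entry):
    """Return entry as a location description, converting a generic value."""
    if isinstance(entry, Location):
        return entry
    # A generic value converts to default address space memory, address * 8 bits.
    return Location(DEFAULT_MEMORY, entry.number * 8)


def as_value(entry):
    """Return entry as a value, converting a byte-aligned default memory location."""
    if isinstance(entry, Value):
        return entry
    if entry.storage == DEFAULT_MEMORY and entry.bit_offset % 8 == 0:
        return Value(entry.bit_offset // 8)
    raise IllFormedExpression("location description is not convertible to a value")


def expression_result(stack, want):
    """Apply the result rules; the top of the stack is the last list element."""
    if want == "location":
        if not stack:
            return UNDEFINED
        return as_location(stack[-1])  # other entries are discarded
    if want == "value":
        if not stack:
            raise IllFormedExpression("a value is required but the stack is empty")
        return as_value(stack[-1])  # other entries are discarded
    raise ValueError(f"unknown result kind: {want}")
```

For example, `expression_result([Value(0x10)], "location")` yields a default address space memory location with bit offset 128, while an empty stack yields the undefined location description only when a location description is required.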
+.. _amdgpu-stack-operations:
+
+Stack Operations
+++++++++++++++++
+
+The following operations manipulate the DWARF stack. Operations that index
+the stack assume that the top of the stack (most recently added entry) has index
+0. They allow the stack entries to be either a value or location description.
+
+If any stack entry accessed by a stack operation is an incomplete composite
+location description, then the DWARF expression is ill-formed.
+
+.. note::
+
+ These operations now support stack entries that are values and location
+ descriptions.
+
+.. note::
+
+ If it is desired to also make them work with incomplete composite location
+ descriptions then would need to define that the composite location storage
+ specified by the incomplete composite location description is also replicated
+ when a copy is pushed. This ensures that each copy of the incomplete composite
+ location description can update the composite location storage they specify
+ independently.
+
+1. ``DW_OP_dup``
+
+ ``DW_OP_dup`` duplicates the stack entry at the top of the stack.
+
+2. ``DW_OP_drop``
+
+ ``DW_OP_drop`` pops the stack entry at the top of the stack and discards it.
+
+3. ``DW_OP_pick``
+
+ ``DW_OP_pick`` has a single unsigned 1-byte operand that is treated as an
+ index I. A copy of the stack entry with index I is pushed onto the stack.
+
+4. ``DW_OP_over``
+
+ ``DW_OP_over`` pushes a copy of the entry with index 1.
+
+ *This is equivalent to a* ``DW_OP_pick 1`` *operation.*
+
+5. ``DW_OP_swap``
+
+ ``DW_OP_swap`` swaps the top two stack entries. The entry at the top of the
+ stack becomes the second stack entry, and the second stack entry becomes the
+ top of the stack.
+
+6. ``DW_OP_rot``
+
+ ``DW_OP_rot`` rotates the first three stack entries. The entry at the top of
+ the stack becomes the third stack entry, the second entry becomes the top of
+ the stack, and the third entry becomes the second entry.
+
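As a minimal sketch of the six stack operations, assume the DWARF stack is a Python list whose last element is the top of the stack, so that index I in the text maps to `stack[-1 - I]`. The function names are illustrative only, and entries are treated as opaque immutable objects (values or complete location descriptions).

```python
def dw_op_dup(stack):
    """Duplicate the entry at the top of the stack."""
    stack.append(stack[-1])


def dw_op_drop(stack):
    """Pop the top entry and discard it."""
    stack.pop()


def dw_op_pick(stack, index):
    """Push a copy of the entry with the given index (index 0 is the top)."""
    stack.append(stack[-1 - index])


def dw_op_over(stack):
    """Push a copy of the entry with index 1."""
    dw_op_pick(stack, 1)


def dw_op_swap(stack):
    """Exchange the top two entries."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def dw_op_rot(stack):
    """Top becomes third, second becomes top, third becomes second."""
    top, second, third = stack[-1], stack[-2], stack[-3]
    stack[-3:] = [top, third, second]
```

With `stack = ["a", "b", "c"]` (where `"c"` is the top), `dw_op_rot(stack)` leaves `["c", "a", "b"]`: the old second entry `"b"` is now on top.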
+.. _amdgpu-value-operations:
+
+Value Operations
+++++++++++++++++
+
+Each value stack entry has a type and a value, and can represent a value of
+any supported base type of the target machine. The base type specifies the size
+and encoding of the value.
+
+.. note::
+
+ It may be better to add an implicit pointer value kind that is produced when
+ ``DW_OP_deref*`` retrieves the full contents of an implicit pointer location
+ storage created by the ``DW_OP_implicit_pointer`` or
+ ``DW_OP_LLVM_aspace_implicit_pointer`` operations.
+
+Instead of a base type, value stack entries can have a distinguished generic
+type, which is an integral type that has the size of an address in the target
+architecture default address space on the target machine and unspecified
+signedness.
+
+*The generic type is the same as the unspecified type used for stack operations
+defined in DWARF Version 4 and before.*
+
+An integral type is a base type that has an encoding of ``DW_ATE_signed``,
+``DW_ATE_signed_char``, ``DW_ATE_unsigned``, ``DW_ATE_unsigned_char``,
+``DW_ATE_boolean``, or any target architecture defined integral encoding in the
+inclusive range ``DW_ATE_lo_user`` to ``DW_ATE_hi_user``.
+
+.. note::
+
+ Unclear if ``DW_ATE_address`` is an integral type. gdb does not seem to
+ consider it integral.
+
+1. ``DW_OP_LLVM_push_lane`` *New*
+
+ ``DW_OP_LLVM_push_lane`` pushes a value with the generic type that is the
+ target architecture lane identifier of the thread of execution for which a
+ user presented expression is currently being evaluated. For languages that
+ are implemented using a SIMD or SIMT execution model this is the lane number
+ that corresponds to the source language thread of execution upon which the
+ user is focused. Otherwise this is the value 0.
+
+ For AMDGPU, the lane identifier returned by ``DW_OP_LLVM_push_lane``
+ corresponds to the hardware lane number which is numbered from 0 to the
+ wavefront size minus 1.
+
+2. ``DW_OP_entry_value``
+
+ ``DW_OP_entry_value`` pushes the value that the described location held upon
+ entering the current subprogram.
+
+ It has two operands. The first is an unsigned LEB128 integer. The second is
+ a block of bytes, with a length equal to the first operand, treated as a
+ DWARF expression E.
+
+ E is evaluated as if it had been evaluated upon entering the current
+ subprogram. E assumes no values are present on the DWARF stack initially and
+ results in exactly one value being pushed on the DWARF stack when completed.
+
+ ``DW_OP_push_object_address`` is not meaningful inside of this DWARF
+ operation.
+
+ If the result of E is a register location description (see
+ :ref:`amdgpu-register-location-descriptions`), ``DW_OP_entry_value`` pushes
+ the value that register had upon entering the current subprogram. The value
+ entry type is the target machine register base type. If the register value
+ is undefined or the register location description bit offset is not 0, then
+ the DWARF expression is ill-formed.
+
+ *The register location description provides a more compact form for the case
+ where the value was in a register on entry to the subprogram.*
+
+ Otherwise, the expression result is required to be a value, and
+ ``DW_OP_entry_value`` pushes that value.
+
+ *The values needed to evaluate* ``DW_OP_entry_value`` *could be obtained in
+ several ways. The consumer could suspend execution on entry to the
+ subprogram, record values needed by* ``DW_OP_entry_value`` *expressions
+ within the subprogram, and then continue; when evaluating*
+ ``DW_OP_entry_value``\ *, the consumer would use these recorded values
+ rather than the current values. Or, when evaluating* ``DW_OP_entry_value``\
+ *, the consumer could virtually unwind using the Call Frame Information
+ (see* :ref:`amdgpu-call-frame-information`\ *) to recover register values
+ that might have been clobbered since the subprogram entry point.*
+
+ .. note::
+
+ Unclear why this operation is defined this way. If the expression is
+ simply using existing variables then it is just a regular expression. It
+ is unclear how the compiler instructs the consumer how to create the saved
+ copies of the variables on entry. Seems only the compiler knows how to do
+ this. If the main purpose is only to read the entry value of a register
+ using CFI then would be better to have an operation that explicitly does
+ just that such as ``DW_OP_LLVM_call_frame_entry_reg``.
+
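The operand layout of ``DW_OP_entry_value`` (an unsigned LEB128 length followed by that many bytes of nested expression) can be parsed roughly as follows. This is a sketch only: `decode_uleb128` implements the standard unsigned LEB128 decoding, while `split_entry_value_operands` is a hypothetical helper name, and the input is assumed to start immediately after the opcode byte.

```python
def decode_uleb128(data, pos=0):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if byte & 0x80 == 0:              # high bit clear marks the last byte
            return result, pos
        shift += 7


def split_entry_value_operands(operands):
    """Return (nested expression bytes, number of operand bytes consumed)."""
    length, pos = decode_uleb128(operands)
    block = operands[pos:pos + length]
    if len(block) != length:
        raise ValueError("expression block is shorter than its encoded length")
    return block, pos + length
```

The nested block would then be evaluated on an initially empty stack, as the text requires for E.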
+.. _amdgpu-location-description-operations:
+
+Location Description Operations
++++++++++++++++++++++++++++++++
+
+Information about the location of program objects is provided by location
+descriptions. Location descriptions specify the storage that holds the program
+objects, and a position within the storage.
+
+A location storage is a linear stream of bits that can hold values. Each
+location storage has a size in bits and can be accessed using a zero-based bit
+offset. The ordering of bits within location storage uses the bit numbering and
+direction conventions that are appropriate to the current language on the target
+architecture.
+
+.. note::
+
+ For AMDGPU bytes are ordered with least significant bytes first, and bits are
+ ordered within bytes with least significant bits first.
+
+There are five kinds of location storage: undefined, memory, register, implicit,
+and composite. Memory and register location storage corresponds to the target
+architecture memory address spaces and registers. Implicit location storage
+corresponds to fixed values that can only be read. Undefined location storage
+indicates no value is available and therefore cannot be read or written.
+Composite location storage allows a mixture of these where some bits come from
+one kind of location storage and some from another kind of location storage.
+
+.. note::
+
+ It may be better to add an implicit pointer location storage kind for
+ ``DW_OP_implicit_pointer`` or ``DW_OP_LLVM_aspace_implicit_pointer``.
+
+Location description stack entries specify a location storage to which they
+refer, and a bit offset relative to the start of the location storage.
+
+General Operations
+##################
+
+1. ``DW_OP_LLVM_offset`` *New*
+
+ ``DW_OP_LLVM_offset`` pops two stack entries. The first must be an integral
+ type value that is treated as a byte displacement D. The second must be a
+ location description L.
+
+ It adds the value of D scaled by 8 (the byte size) to the bit offset of L,
+ and pushes the updated L.
+
+ If the updated bit offset of L is less than 0 or greater than or equal to
+ the size of the location storage specified by L, then the DWARF expression
+ is ill-formed.
+
+2. ``DW_OP_LLVM_offset_uconst`` *New*
+
+ ``DW_OP_LLVM_offset_uconst`` has a single unsigned LEB128 integer operand
+ that is treated as a displacement D.
+
+ It pops one stack entry that must be a location description L. It adds the
+ value of D scaled by 8 (the byte size) to the bit offset of L, and pushes
+ the updated L.
+
+ If the updated bit offset of L is less than 0 or greater than or equal to
+ the size of the location storage specified by L, then the DWARF expression
+ is ill-formed.
+
+ *This operation is supplied specifically to be able to encode more field
+ displacements in two bytes than can be done with* ``DW_OP_lit<n>
+ DW_OP_LLVM_offset``\ *.*
+
+3. ``DW_OP_LLVM_bit_offset`` *New*
+
+ ``DW_OP_LLVM_bit_offset`` pops two stack entries. The first must be an
+ integral type value that is treated as a bit displacement D. The second must
+ be a location description L.
+
+ It adds the value of D to the bit offset of L, and pushes the updated L.
+
+ If the updated bit offset of L is less than 0 or greater than or equal to
+ the size of the location storage specified by L, then the DWARF expression
+ is ill-formed.
+
+4. ``DW_OP_deref``
+
+ The ``DW_OP_deref`` operation pops one stack entry that must be a location
+ description L.
+
+ A value of the bit size of the generic type is retrieved from the location
+ storage specified by L starting at the bit offset specified by L. The
+ retrieved generic type value V is pushed on the stack.
+
+ If any bit of the value is retrieved from the undefined location storage, or
+ the offset of any bit exceeds the size of the location storage specified by
+ L, then the DWARF expression is ill-formed.
+
+ See :ref:`amdgpu-implicit-location-descriptions` for special rules
+ concerning implicit location descriptions created by the
+ ``DW_OP_implicit_pointer`` and ``DW_OP_LLVM_aspace_implicit_pointer``
+ operations.
+
+5. ``DW_OP_deref_size``
+
+ ``DW_OP_deref_size`` has a single 1-byte unsigned integral constant treated
+ as a byte result size S.
+
+ It pops one stack entry that must be a location description L.
+
+ A value of S scaled by 8 (the byte size) bits is retrieved from the location
+ storage specified by L starting at the bit offset specified by L. The value
+ V retrieved is zero-extended to the bit size of the generic type before
+ being pushed onto the stack with the generic type.
+
+ If S is larger than the byte size of the generic type, if any bit of the
+ value is retrieved from the undefined location storage, or if the offset of
+ any bit exceeds the size of the location storage specified by L, then the
+ DWARF expression is ill-formed.
+
+ See :ref:`amdgpu-implicit-location-descriptions` for special rules
+ concerning implicit location descriptions created by the
+ ``DW_OP_implicit_pointer`` and ``DW_OP_LLVM_aspace_implicit_pointer``
+ operations.
+
+6. ``DW_OP_deref_type``
+
+ ``DW_OP_deref_type`` has two operands. The first is a 1-byte unsigned
+ integral constant whose value S is the same as the size of the base type
+ referenced by the second operand. The second operand is an unsigned LEB128
+ integer that represents the offset of a debugging information entry E in the
+ current compilation unit, which must be a ``DW_TAG_base_type`` entry that
+ provides the type of the result value.
+
+ It pops one stack entry that must be a location description L. A value of
+ the bit size S is retrieved from the location storage specified by L
+ starting at the bit offset specified by L. The retrieved result type
+ value V is pushed on the stack.
+
+ If any bit of the value is retrieved from the undefined location storage, or
+ if the offset of any bit exceeds the size of the specified location storage,
+ then the DWARF expression is ill-formed.
+
+ See :ref:`amdgpu-implicit-location-descriptions` for special rules
+ concerning implicit location descriptions created by the
+ ``DW_OP_implicit_pointer`` and ``DW_OP_LLVM_aspace_implicit_pointer``
+ operations.
+
+ *While the size of the pushed value could be inferred from the base type
+ definition, it is encoded explicitly into the operation so that the
+ operation can be parsed easily without reference to the* ``.debug_info``
+ *section.*
+
+7. ``DW_OP_xderef`` *Deprecated*
+
+ ``DW_OP_xderef`` pops two stack entries. The first must be an integral type
+ value that is treated as an address A. The second must be an integral type
+ value that is treated as an address space identifier AS for those
+ architectures that support multiple address spaces.
+
+ The operation is equivalent to performing ``DW_OP_swap;
+ DW_OP_LLVM_form_aspace_address; DW_OP_deref``. The retrieved generic type
+ value V is left on the stack.
+
+8. ``DW_OP_xderef_size`` *Deprecated*
+
+ ``DW_OP_xderef_size`` has a single 1-byte unsigned integral constant treated
+ as a byte result size S.
+
+ It pops two stack entries. The first must be an integral type value that is
+ treated as an address A. The second must be an integral type value that is
+ treated as an address space identifier AS for those architectures that
+ support multiple address spaces.
+
+ The operation is equivalent to performing ``DW_OP_swap;
+ DW_OP_LLVM_form_aspace_address; DW_OP_deref_size S``. The zero-extended
+ retrieved generic type value V is left on the stack.
+
+9. ``DW_OP_xderef_type`` *Deprecated*
+
+ ``DW_OP_xderef_type`` has two operands. The first is a 1-byte unsigned
+ integral constant S whose value is the same as the size of the base type
+ referenced by the second operand. The second operand is an unsigned LEB128
+ integer R that represents the offset of a debugging information entry E in
+ the current compilation unit, which must be a ``DW_TAG_base_type`` entry
+ that provides the type of the result value.
+
+ It pops two stack entries. The first must be an integral type value that is
+ treated as an address A. The second must be an integral type value that is
+ treated as an address space identifier AS for those architectures that
+ support multiple address spaces.
+
+ The operation is equivalent to performing ``DW_OP_swap;
+ DW_OP_LLVM_form_aspace_address; DW_OP_deref_type S R``. The retrieved result
+ type value V is left on the stack.
+
+10. ``DW_OP_push_object_address``
+
+ ``DW_OP_push_object_address`` pushes the location description L of the
+ object currently being evaluated as part of evaluation of a user presented
+ expression.
+
+ This object may correspond to an independent variable described by its own
+ debugging information entry or it may be a component of an array, structure,
+ or class whose address has been dynamically determined by an earlier step
+ during user expression evaluation.
+
+ *This operator provides explicit functionality (especially for arrays
+ involving descriptions) that is analogous to the implicit push of the base
+ address of a structure prior to evaluation of a*
+ ``DW_AT_data_member_location`` *to access a data member of a structure.*
+
+11. ``DW_OP_call2, DW_OP_call4, DW_OP_call_ref``
+
+ ``DW_OP_call2``, ``DW_OP_call4``, and ``DW_OP_call_ref`` perform DWARF
+ procedure calls during evaluation of a DWARF expression or location
+ description.
+
+ ``DW_OP_call2`` and ``DW_OP_call4`` have one operand that is a 2- or 4-byte
+ unsigned offset, respectively, of a debugging information entry D in the
+ current compilation unit.
+
+ ``DW_OP_call_ref`` has one operand that is a 4-byte unsigned value in
+ the 32-bit DWARF format, or an 8-byte unsigned value in the 64-bit DWARF
+ format, that is treated as an offset of a debugging information entry D in a
+ ``.debug_info`` section, which may be contained in an executable or shared
+ object file other than that containing the operator. For references from one
+ executable or shared object file to another, the relocation must be
+ performed by the consumer.
+
+ *Operand interpretation of* ``DW_OP_call2``\ *,* ``DW_OP_call4``\ *, and*
+ ``DW_OP_call_ref`` *is exactly like that for* ``DW_FORM_ref2``\ *,*
+ ``DW_FORM_ref4``\ *, and* ``DW_FORM_ref_addr``\ *, respectively.*
+
+ If D has a ``DW_AT_location`` attribute, then the DWARF expression E
+ corresponding to the current program location is selected.
+
+ .. note::
+
+ To allow ``DW_OP_call*`` to compute the location description for any
+ variable or formal parameter regardless of whether the producer has
+ optimized it to a constant, the following rule could be added:
+
+ .. note::
+
+ If D has a ``DW_AT_const_value`` attribute, then a DWARF expression E
+ consisting of a ``DW_OP_implicit_value`` operation with the value of the
+ ``DW_AT_const_value`` attribute is selected.
+
+ This would be consistent with ``DW_OP_implicit_pointer``.
+
+ Alternatively, could deprecate using ``DW_AT_const_value`` for
+ ``DW_TAG_variable`` and ``DW_TAG_formal_parameter`` debugging information
+ entries that are constants and use ``DW_AT_location`` with an implicit
+ location description instead. This rule would then not be required.
+
+ Otherwise, an empty expression E is selected.
+
+ If D is a ``DW_TAG_dwarf_procedure`` debugging information entry, then E is
+ evaluated using the same DWARF expression stack. Any existing stack entries
+ may be accessed and/or removed in the evaluation of E, and the evaluation of
+ E may add any new stack entries.
+
+ *Values on the stack at the time of the call may be used as parameters by
+ the called expression and values left on the stack by the called expression
+ may be used as return values by prior agreement between the calling and
+ called expressions.*
+
+ Otherwise, E is evaluated on a separate DWARF stack and the resulting
+ location description L is pushed on the ``DW_OP_call*`` operation's stack.
+
+ .. note::
+
+ In DWARF 5, if D does not have a ``DW_AT_location`` then ``DW_OP_call*``
+ is defined to have no effect. It is unclear that this is the right
+ definition as a producer should be able to rely on using ``DW_OP_call*``
+ to get a location description for any non-\ ``DW_TAG_dwarf_procedure``
+ debugging information entries, and should not be creating DWARF with
+ ``DW_OP_call*`` to a ``DW_TAG_dwarf_procedure`` that does not have a
+ ``DW_AT_location`` attribute.
+
+12. ``DW_OP_LLVM_call_frame_entry_reg`` *New*
+
+ ``DW_OP_LLVM_call_frame_entry_reg`` has a single unsigned LEB128 integer
+ operand that is treated as a target architecture register number R.
+
+ It pushes a location description L that holds the value of register R on
+ entry to the current subprogram as defined by the Call Frame Information
+ (see :ref:`amdgpu-call-frame-information`).
+
+ *If there is no Call Frame Information defined, then the default rules for
+ the target architecture are used. If the register rule is* undefined\ *,
+ then the undefined location description is pushed. If the register rule is*
+ same value\ *, then a register location description for R is pushed.*
+
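The offset and dereference rules in this list can be illustrated with a small model. It is a sketch under stated assumptions, not the proposal's implementation: `Storage` and `Location` are hypothetical types, a location storage is modeled as a Python integer whose bit i is bit i of the storage (matching the least-significant-first ordering noted for AMDGPU), the generic type is assumed to be 8 bytes, and undefined location storage is not modeled.

```python
from dataclasses import dataclass


class IllFormedExpression(Exception):
    """Raised where the text declares the DWARF expression ill-formed."""


@dataclass(frozen=True)
class Storage:
    size_bits: int
    contents: int  # bit i of the storage is bit i of this integer


@dataclass(frozen=True)
class Location:
    storage: Storage
    bit_offset: int = 0


def llvm_bit_offset(loc, displacement_bits):
    """Model DW_OP_LLVM_bit_offset: add a bit displacement, then bounds check."""
    updated = loc.bit_offset + displacement_bits
    if updated < 0 or updated >= loc.storage.size_bits:
        raise IllFormedExpression("bit offset is outside the location storage")
    return Location(loc.storage, updated)


def llvm_offset(loc, displacement_bytes):
    """Model DW_OP_LLVM_offset: the byte displacement is scaled by 8."""
    return llvm_bit_offset(loc, displacement_bytes * 8)


def deref_size(loc, size_bytes, generic_bytes=8):
    """Model DW_OP_deref_size: read size_bytes * 8 bits, zero-extended."""
    nbits = size_bytes * 8
    if size_bytes > generic_bytes:
        raise IllFormedExpression("size is larger than the generic type")
    if loc.bit_offset + nbits > loc.storage.size_bits:
        raise IllFormedExpression("read extends past the location storage")
    # A non-negative Python int is already zero-extended.
    return (loc.storage.contents >> loc.bit_offset) & ((1 << nbits) - 1)
```

For a 32-bit storage holding `0xAABBCCDD`, applying `llvm_offset(..., 1)` and then `deref_size(..., 2)` reads the 16 bits starting at bit 8, which is `0xBBCC`.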
+Undefined Location Descriptions
+###############################
+
+The undefined location storage represents a piece or all of an object that is
+present in the source but not in the object code (perhaps due to optimization).
+Neither reading nor writing to the undefined location storage is meaningful.
+
+An undefined location description specifies the undefined location storage.
+There is no concept of the size of the undefined location storage, nor of a bit
+offset for an undefined location description. The ``DW_OP_LLVM_*offset``
+operations leave an undefined location description unchanged. The
+``DW_OP_*piece`` operations can explicitly or implicitly specify an undefined
+location description, allowing any size and offset to be specified, and result
+in a part with all undefined bits.
+
+1. ``DW_OP_LLVM_undefined`` *New*
+
+ ``DW_OP_LLVM_undefined`` pushes an undefined location description L.
+
+Memory Location Descriptions
+############################
+
+There is a memory location storage that corresponds to each of the target
+architecture linear memory address spaces. The size of each memory location
+storage corresponds to the range of the addresses in the address space.
+
+*It is target architecture defined how address space location storage maps to
+target architecture physical memory. For example, they may be independent memory
+or more than one location storage may alias the same physical memory possibly at
+different offsets and with different interleaving. The mapping may also be
+dictated by the source language address classes.*
+
+A memory location description specifies a memory location storage. The bit
+offset corresponds to an address in the address space scaled by 8 (the byte
+size). Bits accessed using a memory location description access the
+corresponding target architecture memory starting at the bit offset.
+
+``DW_ASPACE_none`` is defined as the target architecture default address space.
+
+*The target architecture default address space for AMDGPU is the global address
+space.*
+
+If a stack entry is required to be a location description, but it is a value
+with the generic type, then it is implicitly converted to a memory location
+description that specifies memory in the target architecture default address
+space with a bit offset equal to the value scaled by 8 (the byte size).
+
+ .. note::
+
+ If it is desired to allow any integral type value to be implicitly converted to a
+ memory location description in the target architecture default address
+ space:
+
+ .. note::
+
+ If a stack entry is required to be a location description, but it is a
+ value with an integral type, then it is implicitly converted to a memory
+ location description. The stack entry value is zero extended to the size
+ of the generic type and the least significant generic type size bits are
+ treated as a twos-complement unsigned value to be used as an address. The
+ converted memory location description specifies memory location storage
+ corresponding to the target architecture default address space with a bit
+ offset equal to the address scaled by 8 (the byte size).
+
+ The implicit conversion could also be defined as target specific. For
+ example, gdb checks if the value is an integral type. If it is not it gives
+ an error. Otherwise, gdb zero-extends the value to 64 bits. If the gdb
+ target defines a hook function then it is called and it can modify the 64
+ bit value, possibly sign extending the original value. Finally, gdb treats
+ the 64 bit value as a memory location address.
+
+If a stack entry is required to be a location description, but it is an implicit
+pointer value IPV with the target architecture default address space, then it is
+implicitly converted to the location description specified by IPV. See
+:ref:`amdgpu-implicit-location-descriptions`.
+
+If a stack entry is required to be a value with a generic type, but it is a
+memory location description in the target architecture default address space
+with a bit offset that is a multiple of 8, then it is implicitly converted to a
+value with a generic type that is equal to the bit offset divided by 8 (the byte
+size).
+
+1. ``DW_OP_addr``
+
+ ``DW_OP_addr`` has a single byte constant value operand, which has the size
+ of the generic type, treated as an address A.
+
+ It pushes a memory location description L on the stack that specifies the
+ memory location storage for the target architecture default address space
+ with a bit offset equal to A scaled by 8 (the byte size).
+
+ *If the DWARF is part of a code object, then A may need to be relocated. For
+ example, in the ELF code object format, A must be adjusted by the difference
+ between the ELF segment virtual address and the virtual address at which the
+ segment is loaded.*
+
+2. ``DW_OP_addrx``
+
+ ``DW_OP_addrx`` has a single unsigned LEB128 integer operand that is treated
+ as a zero-based index into the ``.debug_addr`` section relative to the value
+ of the ``DW_AT_addr_base`` attribute of the associated compilation unit. The
+ address value A in the ``.debug_addr`` section has the size of generic type.
+
+ It pushes a memory location description L on the stack that specifies the
+ memory location storage for the target architecture default address space
+ with a bit offset equal to A scaled by 8 (the byte size).
+
+ *If the DWARF is part of a code object, then A may need to be relocated. For
+ example, in the ELF code object format, A must be adjusted by the difference
+ between the ELF segment virtual address and the virtual address at which the
+ segment is loaded.*
+
+3. ``DW_OP_LLVM_form_aspace_address`` *New*
+
+ ``DW_OP_LLVM_form_aspace_address`` pops the top two stack entries. The first
+ must be an integral type value that is treated as an address space
+ identifier AS for those architectures that support multiple address spaces.
+ The second must be an integral type value that is treated as an address A.
+
+ The address size S is defined as the address bit size of the target
+ architecture's address space that corresponds to AS.
+
+ A is adjusted by zero extending it to S bits and the least significant S
+ bits are treated as a twos-complement unsigned value.
+
+ ``DW_OP_LLVM_form_aspace_address`` pushes a memory location description L
+ that specifies the memory location storage that corresponds to AS, with a
+ bit offset equal to the adjusted A scaled by 8 (the byte size).
+
+ If AS is not one of the values defined by the target architecture's
+ ``DW_ASPACE_*`` values, then the DWARF expression is ill-formed.
+
+ See :ref:`amdgpu-implicit-location-descriptions` for special rules
+ concerning implicit pointer values produced by dereferencing implicit
+ location descriptions created by the ``DW_OP_implicit_pointer`` and
+ ``DW_OP_LLVM_aspace_implicit_pointer`` operations.
+
+ The AMDGPU address spaces are defined in
+ :ref:`amdgpu-dwarf-address-space-mapping-table`.
+
+4. ``DW_OP_form_tls_address``
+
+ ``DW_OP_form_tls_address`` pops one stack entry that must be an integral
+ type value, and treats it as a thread-local storage address.
+
+ ``DW_OP_form_tls_address`` pushes a memory location description L for the
+ target architecture default address space that corresponds to the
+ thread-local storage address.
+
+ The meaning of the thread-local storage address is defined by the run-time
+ environment. If the run-time environment supports multiple thread-local
+ storage blocks for a single thread, then the block corresponding to the
+ executable or shared library containing this DWARF expression is used.
+
+ *Some implementations of C, C++, Fortran, and other languages support a
+ thread-local storage class. Variables with this storage class have distinct
+ values and addresses in distinct threads, much as automatic variables have
+ distinct values and addresses in each function invocation. Typically, there
+ is a single block of storage containing all thread-local variables declared
+ in the main executable, and a separate block for the variables declared in
+ each shared library. Each thread-local variable can then be accessed in its
+ block using an identifier. This identifier is typically an offset into the
+ block and pushed onto the DWARF stack by one of the* ``DW_OP_const<n><x>``
+ *operations prior to the* ``DW_OP_form_tls_address`` *operation. Computing
+ the address of the appropriate block can be complex (in some cases, the
+ compiler emits a function call to do it), and difficult to describe using
+ ordinary DWARF location descriptions. Instead of forcing complex
+ thread-local storage calculations into the DWARF expressions, the*
+ ``DW_OP_form_tls_address`` *allows the consumer to perform the computation
+ based on the run-time environment.*
+
+5. ``DW_OP_call_frame_cfa``
+
+ ``DW_OP_call_frame_cfa`` pushes the memory location description L of the
+ Canonical Frame Address (CFA) of the current function, obtained from the
+ Call Frame Information (see :ref:`amdgpu-call-frame-information`).
+
+ *Although the value of* ``DW_AT_frame_base`` *can be computed using other
+ DWARF expression operators, in some cases this would require an extensive
+ location list because the values of the registers used in computing the CFA
+ change during a subroutine. If the Call Frame Information is present, then
+ it already encodes such changes, and it is space efficient to reference
+ that.*
+
+6. ``DW_OP_fbreg``
+
+ ``DW_OP_fbreg`` has a single signed LEB128 integer operand that is treated
+ as a byte displacement D.
+
+ The DWARF expression E corresponding to the current program location is
+ selected from the ``DW_AT_frame_base`` attribute of the current function and
+ evaluated. The resulting memory location description L's bit offset is
+ updated as if the ``DW_OP_LLVM_offset D`` operation were applied. The
+ updated L is pushed.
+
+ *This is typically a stack pointer register plus or minus some offset.*
+
+7. ``DW_OP_breg0, DW_OP_breg1, ..., DW_OP_breg31``
+
+ The ``DW_OP_breg<n>`` operations encode the numbers of up to 32 registers,
+ numbered from 0 through 31, inclusive. The register number R corresponds to
+ the ``n`` in the operation name.
+
+ They have a single signed LEB128 integer operand that is treated as a byte
+ displacement D.
+
+ The address space identifier AS is defined as the one corresponding to the
+ target architecture's default address space.
+
+ The address size S is defined as the address bit size of the target
+ architecture's address space corresponding to AS.
+
+ The contents of the register specified by R are retrieved as a
+ twos-complement unsigned value and zero extended to S bits. D is added and
+ the least significant S bits are treated as a twos-complement unsigned value
+ to be used as an address A.
+
+ They push a memory location description L that specifies the memory location
+ storage that corresponds to AS, with a bit offset equal to A scaled by 8
+ (the byte size).
+
+8. ``DW_OP_bregx``
+
+ ``DW_OP_bregx`` has two operands. The first is an unsigned LEB128 integer
+ that is treated as a register number R. The second is a signed LEB128
+ integer that is treated as a byte displacement D.
+
+ The action is the same as for ``DW_OP_breg<n>`` except that R is used as the
+ register number and D is used as the byte displacement.
+
+9. ``DW_OP_LLVM_aspace_bregx`` *New*
+
+ ``DW_OP_LLVM_aspace_bregx`` has two operands. The first is an unsigned
+ LEB128 integer that is treated as a register number R. The second is a
+ signed LEB128 integer that is treated as a byte displacement D. It pops one
+ stack entry that is required to be an integral type value that is treated as
+ an address space identifier AS for those architectures that support multiple
+ address spaces.
+
+ The action is the same as for ``DW_OP_breg<n>`` except that R is used as the
+ register number, D is used as the byte displacement, and AS is used as the
+ address space identifier.
+
+ If AS is not one of the values defined by the target architecture's
+ ``DW_ASPACE_*`` values, then the DWARF expression is ill-formed.
+
+ .. note::
+
+ Could also consider adding ``DW_OP_aspace_breg0, DW_OP_aspace_breg1, ...,
+ DW_OP_aspace_breg31`` which would save encoding size.
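The address arithmetic described for ``DW_OP_breg<n>`` (and shared by ``DW_OP_bregx`` and ``DW_OP_LLVM_aspace_bregx``) can be illustrated with a minimal Python sketch. The function name and parameters are hypothetical and are not part of the proposal; the sketch assumes the register contents are already available as a non-negative integer and simply masks them to S bits.

```python
def breg_bit_offset(reg_value: int, displacement: int, addr_bits: int) -> int:
    """Bit offset into the memory location storage for a breg-style operation.

    reg_value    -- contents of register R, as an unsigned integer (assumed)
    displacement -- signed byte displacement D
    addr_bits    -- address size S, in bits, of the address space AS
    """
    mask = (1 << addr_bits) - 1
    # Treat the register contents as unsigned, limited to S bits.
    base = reg_value & mask
    # Add D and keep only the least significant S bits to form the address A.
    address = (base + displacement) & mask
    # The location description's bit offset is A scaled by 8 (the byte size).
    return address * 8
```

For example, a register holding ``0x1000`` with a displacement of ``-16`` in a 64-bit address space gives the byte address ``0xFF0`` and therefore a bit offset of ``0x7F80``.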
+
+.. _amdgpu-register-location-descriptions:
+
+Register Location Descriptions
+##############################
+
+There is a register location storage that corresponds to each of the target
+architecture registers. The size of each register location storage corresponds
+to the size of the corresponding target architecture register.
+
+A register location description specifies a register location storage. The bit
+offset corresponds to a bit position within the register. Bits accessed using a
+register location description access the corresponding target architecture
+register starting at the bit offset.
+
+1. ``DW_OP_reg0, DW_OP_reg1, ..., DW_OP_reg31``
+
+ ``DW_OP_reg<n>`` operations encode the numbers of up to 32 registers,
+ numbered from 0 through 31, inclusive. The target architecture register
+ number R corresponds to the ``n`` in the operation name.
+
+ ``DW_OP_reg<n>`` pushes a register location description L that specifies the
+ register location storage that corresponds to R, with a bit offset of 0.
+
+2. ``DW_OP_regx``
+
+ ``DW_OP_regx`` has a single unsigned LEB128 integer operand that is treated
+ as a target architecture register number R.
+
+ ``DW_OP_regx`` pushes a register location description L that specifies the
+ register location storage that corresponds to R, with a bit offset of 0.
+
+*These operations name a register location. To fetch the contents of a register,
+it is necessary to use* ``DW_OP_regval_type``\ *, or one of the register based
+addressing operations such as* ``DW_OP_bregx``\ *, or using* ``DW_OP_deref*``
+*on a register location description.*
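Several of the operations above take LEB128 operands. As an illustration only, the following Python sketch decodes the two operands of ``DW_OP_bregx`` (an unsigned LEB128 register number followed by a signed LEB128 byte displacement). The helper names are hypothetical; the opcode value 0x92 is the standard DWARF encoding of ``DW_OP_bregx``. No bounds or well-formedness checking is attempted.

```python
DW_OP_BREGX = 0x92  # standard DWARF opcode for DW_OP_bregx


def decode_uleb128(data: bytes, pos: int = 0):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, pos


def decode_sleb128(data: bytes, pos: int = 0):
    """Decode a signed LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            # Bit 6 of the final byte is the sign bit.
            if byte & 0x40:
                result -= 1 << shift
            return result, pos


def decode_bregx(expr: bytes, pos: int = 0):
    """Return (register number R, byte displacement D, next position)."""
    if expr[pos] != DW_OP_BREGX:
        raise ValueError("not a DW_OP_bregx operation")
    reg, pos = decode_uleb128(expr, pos + 1)
    disp, pos = decode_sleb128(expr, pos)
    return reg, disp, pos
```

A consumer would call ``decode_bregx`` at the position of the opcode and continue decoding the expression from the returned position.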
+
+.. _amdgpu-implicit-location-descriptions:
+
+Implicit Location Descriptions
+##############################
+
+Implicit location storage represents a piece or all of an object which has no
+actual location in the program but whose contents are nonetheless known, either
+as a constant or can be computed from other locations and values in the program.
+
+An implicit location description specifies an implicit location storage. The bit
+offset corresponds to a bit position within the implicit location storage. Bits
+accessed using an implicit location description access the corresponding
+implicit storage value starting at the bit offset.
+
+1. ``DW_OP_implicit_value``
+
+ ``DW_OP_implicit_value`` has two operands. The first is an unsigned LEB128
+ integer treated as a byte size S. The second is a block of bytes with a
+ length equal to S treated as a literal value V.
+
+ An implicit location storage LS is created with the literal value V and a
+ size of S. An implicit location description L is pushed that specifies LS
+ with a bit offset of 0.
+
+2. ``DW_OP_stack_value``
+
+ ``DW_OP_stack_value`` pops one stack entry that must be a value treated as a
+ literal value V.
+
+ An implicit location storage LS is created with the literal value V and a
+ size equal to V's base type size. An implicit location description L is
+ pushed that specifies LS with a bit offset of 0.
+
+ The ``DW_OP_stack_value`` operation specifies that the object does not exist
+ in memory but its value is nonetheless known and is at the top of the DWARF
+ expression stack. In this form of location description, the DWARF expression
+ represents the actual value of the object, rather than its location.
+
+ See :ref:`amdgpu-implicit-location-descriptions` for special rules
+ concerning implicit pointer values produced by dereferencing implicit
+ location descriptions created by the ``DW_OP_implicit_pointer`` and
+ ``DW_OP_LLVM_aspace_implicit_pointer`` operations.
+
+ .. note::
+
+ Since location descriptions are allowed on the stack, the
+ ``DW_OP_stack_value`` operation no longer terminates the DWARF expression.
+
+3. ``DW_OP_implicit_pointer``
+
+ *An optimizing compiler may eliminate a pointer, while still retaining the
+ value that the pointer addressed.* ``DW_OP_implicit_pointer`` *allows a
+ producer to describe this value.*
+
+ ``DW_OP_implicit_pointer`` specifies that the object is a pointer to the
+ target architecture default address space that cannot be represented as a
+ real pointer, even though the value it would point to can be described. In
+ this form of location description, the DWARF expression refers to a
+ debugging information entry that represents the actual location description
+ of the object to which the pointer would point. Thus, a consumer of the
+ debug information would be able to access the dereferenced pointer, even
+ when it cannot access the pointer itself.
+
+ ``DW_OP_implicit_pointer`` has two operands. The first is a 4-byte unsigned
+ value in the 32-bit DWARF format, or an 8-byte unsigned value in the 64-bit
+ DWARF format, that is treated as a debugging information entry reference R.
+ The second is a signed LEB128 integer that is treated as a byte
+ displacement D.
+
+ R is used as the offset of a debugging information entry E in a
+ ``.debug_info`` section, which may be contained in an executable or shared
+ object file other than that containing the operator. For references from one
+ executable or shared object file to another, the relocation must be
+ performed by the consumer.
+
+ *The first operand interpretation is exactly like that for*
+ ``DW_FORM_ref_addr``\ *.*
+
+ The address space identifier AS is defined as the one corresponding to the
+ target architecture's default address space.
+
+ The address size S is defined as the address bit size of the target
+ architecture's address space corresponding to AS.
+
+ An implicit location storage LS is created that has the bit size of S. An
+ implicit location description L is pushed that specifies LS and has a bit
+ offset of 0.
+
+ If a ``DW_OP_deref*`` operation pops a location description L' and retrieves
+ S' bits where some retrieved bits come from LS such that either:
+
+ 1. L' is an implicit location description that specifies LS with bit offset
+ 0, and S' equals S.
+
+ 2. L' is a complete composite location description that specifies a
+ canonical form composite location storage LS'. The bits retrieved all
+ come from a single part P' of LS'. P' has a bit size of S and has
+ an implicit location description PL'. PL' specifies LS with a bit offset
+ of 0.
+
+ Then the value V pushed by the ``DW_OP_deref*`` operation is an implicit
+ pointer value IPV with an address space of AS, a debugging information entry
+ of E, and a base type of T. If AS is the target architecture default address
+ space, then T is the generic type. Otherwise, T is an architecture specific
+ integral type with a bit size equal to S.
+
+ Otherwise, if a ``DW_OP_deref*`` operation is applied to a location
+ description such that some retrieved bits come from LS, then the DWARF
+ expression is ill-formed.
+
+ If IPV is either implicitly converted to a location description (only done
+ if AS is the target architecture default address space) or used by
+ ``DW_OP_LLVM_form_aspace_address`` (only done if the address space specified
+ is AS), then the resulting location description is:
+
+ * If E has a ``DW_AT_location`` attribute, the DWARF expression
+ corresponding to the current program location is selected and evaluated
+ from the ``DW_AT_location`` attribute. The expression result is the
+ resulting location description RL.
+
+ * If E has a ``DW_AT_const_value`` attribute, then an implicit location
+ storage RLS is created from the ``DW_AT_const_value`` attribute's value,
+ with a size matching the size of the ``DW_AT_const_value`` attribute's
+ value. The resulting implicit location description RL specifies RLS with a
+ bit offset of 0.
+
+ .. note::
+
+ If using ``DW_AT_const_value`` for variables and formal parameters were
+ deprecated in favor of ``DW_AT_location`` with an implicit location
+ description, then this rule would not be required.
+
+ * Otherwise the DWARF expression is ill-formed.
+
+ The bit offset of RL is updated as if the ``DW_OP_LLVM_offset D`` operation
+ were applied.
+
+ If a ``DW_OP_stack_value`` operation pops a value that is the same as IPV,
+ then it pushes a location description that is the same as L.
+
+ The DWARF expression is ill-formed if it accesses LS or IPV in any other
+ manner.
+
+ *The restrictions on how an implicit pointer location description created by*
+ ``DW_OP_implicit_pointer`` *and* ``DW_OP_LLVM_aspace_implicit_pointer``\ *, or
+ an implicit pointer value created by* ``DW_OP_deref*``\ *, can be used are to
+ simplify the DWARF consumer.*
+
+4. ``DW_OP_LLVM_aspace_implicit_pointer`` *New*
+
+ ``DW_OP_LLVM_aspace_implicit_pointer`` has two operands that are the same as
+ for ``DW_OP_implicit_pointer``.
+
+ It pops one stack entry that must be an integral type value that is treated
+ as an address space identifier AS for those architectures that support
+ multiple address spaces.
+
+ The implicit location description L that is pushed is the same as for
+ ``DW_OP_implicit_pointer`` except that the address space identifier used is
+ AS.
+
+ If AS is not one of the values defined by the target architecture's
+ ``DW_ASPACE_*`` values, then the DWARF expression is ill-formed.
+
+*The debugging information entry referenced by a* ``DW_OP_implicit_pointer`` or
+``DW_OP_LLVM_aspace_implicit_pointer`` *operation is typically a*
+``DW_TAG_variable`` *or* ``DW_TAG_formal_parameter`` *entry whose*
+``DW_AT_location`` *attribute gives a second DWARF expression or a location list
+that describes the value of the object, but the referenced entry may be any
+entry that contains a* ``DW_AT_location`` *or* ``DW_AT_const_value`` *attribute
+(for example,* ``DW_TAG_dwarf_procedure``\ *). By using the second DWARF
+expression, a consumer can reconstruct the value of the object when asked to
+dereference the pointer described by the original DWARF expression containing
+the* ``DW_OP_implicit_pointer`` or ``DW_OP_LLVM_aspace_implicit_pointer``
+*operation.*
+
+Composite Location Descriptions
+###############################
+
+A composite location storage represents an object or value which may be
+contained in part of another location storage, or contained in parts of more
+than one location storage.
+
+Each part has a part location description L and a part bit size S. The bits of
+the part comprise S contiguous bits from the location storage specified by L,
+starting at the bit offset specified by L. All the bits must be within the size
+of the location storage specified by L or the DWARF expression is ill-formed.
+
+A composite location storage can have zero or more parts. The parts are
+contiguous such that the zero-based location storage bit index will range over
+each part with no gaps between them. Therefore, the size of a composite location
+storage is the sum of the sizes of its parts. The DWARF expression is ill-formed
+if the size of the contiguous location storage is larger than the size of the
+memory location storage corresponding to the target architecture's largest
+address space.
+
+The canonical form of a composite location storage is computed by applying the
+following steps to a composite location storage:
+
+1. If any part P has a composite location description L, it is replaced by a
+ copy of the parts of the composite location storage specified by L that are
+ selected by the bit size of P starting at the bit offset of L. The location
+ description of the first copied part has its bit offset updated as
+ necessary, and the last copied part has its bit size updated as necessary,
+ to reflect the bits selected by P. This rule is applied repeatedly until no
+ part has a composite location description.
+
+2. If the size of any part is zero, it is removed.
+
+3. If any adjacent parts P\ :sup:`1` to P\ :sup:`n` have location descriptions
+ that specify the same location storage LS such that the bits selected form a
+ contiguous portion of LS, then they are replaced by a single new part P'. P'
+ has a location description L that specifies LS with the same bit offset as
+ P\ :sup:`1`\ 's location description, and a bit size equal to the sum of the
+ bit sizes of P\ :sup:`1` to P\ :sup:`n` inclusive.
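Steps 2 and 3 of the canonicalization above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not a definitive implementation: the ``Part`` type and ``canonicalize`` function are hypothetical names, a location storage is identified by a plain string, and step 1 (expanding parts that are themselves composite) is assumed to have been applied already.

```python
from typing import List, NamedTuple


class Part(NamedTuple):
    storage: str     # hypothetical identifier of the location storage
    bit_offset: int  # bit offset of the part location description
    bit_size: int    # bit size of the part


def canonicalize(parts: List[Part]) -> List[Part]:
    """Apply steps 2 and 3 to parts that contain no nested composites."""
    result: List[Part] = []
    for part in parts:
        # Step 2: zero-sized parts are removed.
        if part.bit_size == 0:
            continue
        if result:
            prev = result[-1]
            # Step 3: adjacent parts selecting contiguous bits of the same
            # location storage are merged into a single part.
            if (prev.storage == part.storage
                    and prev.bit_offset + prev.bit_size == part.bit_offset):
                result[-1] = Part(prev.storage, prev.bit_offset,
                                  prev.bit_size + part.bit_size)
                continue
        result.append(part)
    return result
```

In this sketch, two 32-bit parts covering bits 0-31 and 32-63 of the same register collapse into one 64-bit part, even if a zero-sized part originally sat between them.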
+
+A composite location description specifies the canonical form of a composite
+location storage and a bit offset.
+
+There are operations that push a composite location description that specifies a
+composite location storage that is created by the operation.
+
+There are other operations that allow a composite location storage and a
+composite location description that specifies it to be created incrementally.
+Each part is described by a separate operation. There may be one or more
+operations to create the final composite location storage and associated
+description. A series of such operations describes the parts of the composite
+location storage that are in the order that the associated part operations are
+executed.
+
+To support incremental creation, a composite location description can be in an
+incomplete state. When an incremental operation operates on an incomplete
+composite location description, it adds a new part, otherwise it creates a new
+composite location description. The ``DW_OP_LLVM_piece_end`` operation
+explicitly makes an incomplete composite location description complete.
+
+If the top stack entry is an incomplete composite location description after the
+execution of a DWARF expression has completed, it is converted to a complete
+composite location description.
+
+If a stack entry is required to be a location description, but it is an
+incomplete composite location description, then the DWARF expression is
+ill-formed.
+
+*Note that a DWARF expression may arbitrarily compose composite location
+descriptions from any other location description, including other composite
+location descriptions.*
+
+*The incremental composite location description operations are defined to be
+compatible with the definitions in DWARF 5 and earlier.*
+
+1. ``DW_OP_piece``
+
+ ``DW_OP_piece`` has a single unsigned LEB128 integer that is treated as a
+ byte size S.
+
+ The action is based on the context:
+
+ * If the stack is empty, then an incomplete composite location description
+ L is pushed that specifies a new composite location storage LS and has a
+ bit offset of 0. LS has a single part P that specifies the undefined
+ location description, and has a bit size of S scaled by 8 (the byte size).
+
+ * If the top stack entry is an incomplete composite location description L,
+ then the composite location storage LS that it specifies is updated to
+ append a part that specifies an undefined location description, and has a
+ bit size S scaled by 8 (the byte size).
+
+ * If the top stack entry is a location description or can be converted to
+ one, then it is popped and treated as a part location description PL.
+ Then:
+
+ * If the stack is empty or the top stack entry is not an incomplete
+ composite location description, then an incomplete composite location
+ description L is pushed that specifies a new composite location storage
+ LS. LS has a single part that specifies PL, and has a bit size of S
+ scaled by 8 (the byte size).
+
+ * Otherwise, the composite location storage LS specified by the top stack
+ incomplete composite location description L is updated to append a part
+ that specifies PL, and has a bit size S scaled by 8 (the byte size).
+
+ * Otherwise, the DWARF expression is ill-formed.
+
+ If LS is not in canonical form it is updated to be in canonical form.
+
+ *Many compilers store a single variable in sets of registers, or store a
+ variable partially in memory and partially in registers.* ``DW_OP_piece``
+ *provides a way of describing how large a part of a variable a particular
+ DWARF location description refers to.*
+
+ *If a computed byte displacement is required, the* ``DW_OP_LLVM_offset``
+ *can be used to update the part location description.*
+
+2. ``DW_OP_bit_piece``
+
+ ``DW_OP_bit_piece`` has two operands. The first is an unsigned LEB128
+ integer that is treated as the part bit size S. The second is an unsigned
+ LEB128 integer that is treated as a bit displacement D.
+
+ The action is the same as for ``DW_OP_piece`` except that any part created
+ has the bit size S, and the location description of any created part has its
+ bit offset updated as if the ``DW_OP_LLVM_bit_offset D`` operation were
+ applied.
+
+ *If a computed bit displacement is required, the* ``DW_OP_LLVM_bit_offset``
+ *can be used to update the part location description.*
+
+ .. note::
+
+ The bit offset operand is not needed as ``DW_OP_LLVM_bit_offset`` can be
+ used on the part's location description.
+
+3. ``DW_OP_LLVM_piece_end`` *New*
+
+ If the top stack entry is an incomplete composite location description L,
+ then it is updated to be a complete composite location description with the
+ same parts. Otherwise, the DWARF expression is ill-formed.
+
+4. ``DW_OP_LLVM_extend`` *New*
+
+ ``DW_OP_LLVM_extend`` has two operands. The first is an unsigned LEB128
+ integer that is treated as the element bit size S. The second is an unsigned
+ LEB128 integer that is treated as a count C.
+
+ It pops one stack entry that must be a location description and is treated
+ as the part location description PL.
+
+ A complete composite location description L is pushed that comprises C parts
+ that each specify PL and have a bit size of S.
+
+ The DWARF expression is ill-formed if the element bit size or count are 0.
+
+5. ``DW_OP_LLVM_select_bit_piece`` *New*
+
+ ``DW_OP_LLVM_select_bit_piece`` has two operands. The first is an unsigned
+ LEB128 integer that is treated as the element bit size S. The second is an
+ unsigned LEB128 integer that is treated as a count C.
+
+ It pops three stack entries. The first must be an integral type value that
+ is treated as a bit mask value M. The second must be a location description
+ that is treated as the one-location description L1. The third must be a
+ location description that is treated as the zero-location description L0.
+
+ A complete composite location description L is pushed that specifies a new
+ composite location storage LS. LS comprises C parts that each specify a part
+ location description PL and have a bit size of S. The PL for part N is
+ defined as:
+
+ 1. If the Nth least significant bit of M is a zero then the PL for part N
+ is the same as L0, otherwise it is the same as L1.
+
+ 2. The PL for part N is updated as if the ``DW_OP_LLVM_bit_offset N*S``
+ operation were applied.
+
+ If LS is not in canonical form it is updated to be in canonical form.
+
+ The DWARF expression is ill-formed if S or C are 0, or if the bit size of M
+ is less than C.
+
+``DW_OP_bit_piece`` *is used instead of* ``DW_OP_piece`` *when the piece to be
+assembled into a value or assigned to is not byte-sized or is not at the start
+of the part location description.*
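The part selection performed by ``DW_OP_LLVM_select_bit_piece`` can be illustrated with a short Python sketch. The function name and the representation of a location description as a ``(storage, bit_offset)`` tuple are hypothetical. The sketch assumes N is zero-based (so part 0 is selected by the least significant bit of M), and it omits both the check that the bit size of M is at least C and the final conversion to canonical form.

```python
def select_bit_piece_parts(mask, one_loc, zero_loc, bit_size, count):
    """Return the parts as (storage, bit_offset, bit_size) tuples.

    mask     -- integral bit mask value M
    one_loc  -- one-location description L1 as (storage, bit_offset)
    zero_loc -- zero-location description L0 as (storage, bit_offset)
    bit_size -- element bit size S
    count    -- number of parts C
    """
    if bit_size == 0 or count == 0:
        raise ValueError("ill-formed: S and C must be non-zero")
    parts = []
    for n in range(count):
        # Bit n of M selects between L0 (bit is 0) and L1 (bit is 1).
        storage, bit_offset = one_loc if (mask >> n) & 1 else zero_loc
        # The selected description is advanced by n*S bits, as if
        # DW_OP_LLVM_bit_offset n*S had been applied.
        parts.append((storage, bit_offset + n * bit_size, bit_size))
    return parts
```

With a mask of ``0b0101``, four 32-bit elements, L1 in memory and L0 in a vector register, elements 0 and 2 come from memory and elements 1 and 3 come from the register, each at its own element offset.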
+
+.. note::
+
+ For AMDGPU:
+
+ * In CFI expressions ``DW_OP_LLVM_select_bit_piece`` is used to describe
+ unwinding vector registers that are spilled under the execution mask to
+ memory: the zero location description is the vector register, and the one
+ location description is the spilled memory location. The
+ ``DW_OP_LLVM_form_aspace_address`` is used to specify the address space of
+ the memory location description.
+
+ * ``DW_OP_LLVM_select_bit_piece`` is used by the ``lane_pc`` attribute
+ expression where divergent control flow is controlled by the execution mask.
+ An undefined location description together with ``DW_OP_LLVM_extend`` is
+ used to indicate the lane was not active on entry to the subprogram.
+
+Expression Operation Encodings
+++++++++++++++++++++++++++++++
+
+The following table gives the encoding of the DWARF expression operations added
+for AMDGPU.
+
+.. table:: AMDGPU DWARF Expression Operation Encodings
+ :name: amdgpu-dwarf-expression-operation-encodings-table
+
+ ================================== ===== ======== ===============================
+ Operation                          Code  Number   Notes
+                                          of
+                                          Operands
+ ================================== ===== ======== ===============================
+ DW_OP_LLVM_form_aspace_address     0xe7  0
+ DW_OP_LLVM_push_lane               0xea  0
+ DW_OP_LLVM_offset                  0xe9  0
+ DW_OP_LLVM_offset_uconst           *TBD* 1        ULEB128 byte displacement
+ DW_OP_LLVM_bit_offset              *TBD* 0
+ DW_OP_LLVM_call_frame_entry_reg    *TBD* 1        ULEB128 register number
+ DW_OP_LLVM_undefined               *TBD* 0
+ DW_OP_LLVM_aspace_bregx            *TBD* 2        ULEB128 register number,
+                                                   SLEB128 byte displacement
+ DW_OP_LLVM_aspace_implicit_pointer *TBD* 2        4- or 8-byte offset of DIE,
+                                                   SLEB128 byte displacement
+ DW_OP_LLVM_piece_end               *TBD* 0
+ DW_OP_LLVM_extend                  *TBD* 2        ULEB128 bit size,
+                                                   ULEB128 count
+ DW_OP_LLVM_select_bit_piece        *TBD* 2        ULEB128 bit size,
+                                                   ULEB128 count
+ ================================== ===== ======== ===============================
+
+.. _amdgpu-dwarf-debugging-information-entry-attributes:
+
+Debugging Information Entry Attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This section provides changes to existing debugger information attributes and
+defines attributes added by the AMDGPU target.
+
+1. ``DW_AT_location``
+
+ If the result of the ``DW_AT_location`` DWARF expression is required to be a
+ location description, then it may have any kind of location description (see
+ :ref:`amdgpu-location-description-operations`).
+
+2. ``DW_AT_const_value``
+
+ .. note::
+
+ Could deprecate using the ``DW_AT_const_value`` attribute for
+ ``DW_TAG_variable`` or ``DW_TAG_formal_parameter`` debugger information
+ entries that are constants. Instead, ``DW_AT_location`` could be used with
+ a DWARF expression that produces an implicit location description now that
+ any location description can be used within a DWARF expression. This
+ allows the ``DW_OP_call*`` operations to be used to push the location
+ description of any variable regardless of how it is optimized.
+
+3. ``DW_AT_frame_base``
+
+ A ``DW_TAG_subprogram`` or ``DW_TAG_entry_point`` debugger information entry
+ may have a ``DW_AT_frame_base`` attribute, whose value is a DWARF expression
+ or location list that describes the *frame base* for the subroutine or entry
+ point.
+
+ If the result of the DWARF expression is a register location description,
+ then the ``DW_OP_deref`` operation is applied to compute the frame base
+ memory location description in the target architecture default address
+ space.
+
+ .. note::
+
+ This rule could be removed and require the producer to create the
+ required location description directly using ``DW_OP_call_frame_cfa``,
+ ``DW_OP_fbreg``, ``DW_OP_breg*``, or ``DW_OP_LLVM_aspace_bregx``. This
+ would also then allow a target to implement the call frames within a
+ large register.
+
+ Otherwise, the result of the DWARF expression is required to be a memory
+ location description in any of the target architecture address spaces which
+ is the frame base.
+
+4. ``DW_AT_data_member_location``
+
+ For a ``DW_AT_data_member_location`` attribute there are two cases:
+
+ 1. If the value is an integer constant, it is the offset in bytes from the
+ beginning of the containing entity. If the beginning of the containing
+ entity has a non-zero bit offset then the beginning of the member entry
+ has that same bit offset as well.
+
+ 2. Otherwise, the value must be a DWARF expression or location list. The
+ DWARF expression E corresponding to the current program location is
+ selected. The location description of the beginning of the containing
+ entity is pushed on the DWARF stack before E is evaluated. The result of
+ the evaluation is the location description of the base of the member
+ entry.
+
+ .. note::
+
+ The beginning of the containing entity can now be any location
+ description and can be bit aligned.
+
+5. ``DW_AT_use_location``
+
+ The ``DW_TAG_ptr_to_member_type`` debugging information entry has a
+ ``DW_AT_use_location`` attribute whose value is a DWARF expression or
+ location list. The DWARF expression E corresponding to the current program
+ location is selected. It is used to compute the location description of the
+ member of the class to which the pointer to member entry points.
+
+ *The method used to find the location description of a given member of a
+ class or structure is common to any instance of that class or structure and
+ to any instance of the pointer to member type. The method is thus associated
+ with the type entry, rather than with each instance of the type.*
+
+ The ``DW_AT_use_location`` description is used in conjunction with the
+ location descriptions for a particular object of the given pointer to member
+ type and for a particular structure or class instance.
+
+ Two values are pushed onto the DWARF expression stack before E is evaluated.
+ The first value pushed is the value of the pointer to member object itself.
+ The second value pushed is the location description of the base of the
+ entire structure or union instance containing the member whose address is
+ being calculated.
+
+6. ``DW_AT_data_location``
+
+ The ``DW_AT_data_location`` attribute may be used with any type that
+ provides one or more levels of hidden indirection and/or run-time parameters
+ in its representation. Its value is a DWARF expression E which computes the
+ location description of the data for an object. When this attribute is
+ omitted, the location description of the data is the same as the location
+ description of the object.
+
+ *E will typically begin with* ``DW_OP_push_object_address`` *which loads the
+ location description of the object which can then serve as a descriptor in
+ subsequent calculation.*
+
+7. ``DW_AT_vtable_elem_location``
+
+ An entry for a virtual function also has a ``DW_AT_vtable_elem_location``
+ attribute whose value is a DWARF expression or location list. The DWARF
+ expression E corresponding to the current program location is selected. The
+ location description of the object of the enclosing type is pushed onto the
+ expression stack before E is evaluated. The resulting location description
+ is the slot for the function within the virtual function table for the
+ enclosing class.
+
+8. ``DW_AT_static_link``
+
+ If a ``DW_TAG_subprogram`` or ``DW_TAG_entry_point`` debugger information
+ entry is nested, it may have a ``DW_AT_static_link`` attribute, whose value
+ is a DWARF expression or location list. The DWARF expression E corresponding
+ to the current program location is selected. The result of evaluating E is
+ the frame base memory location description of the relevant instance of the
+ subroutine that immediately encloses the subroutine or entry point.
+
+9. ``DW_AT_return_addr``
+
+ A ``DW_TAG_subprogram``, ``DW_TAG_inlined_subroutine``, or
+ ``DW_TAG_entry_point`` debugger information entry may have a
+ ``DW_AT_return_addr`` attribute, whose value is a DWARF expression or
+ location list. The DWARF expression E corresponding to the current program
+ location is selected. The result of evaluating E is the location description
+ for the place where the return address for the subroutine or entry point is
+ stored.
+
+ .. note::
+
+ It is unclear why ``DW_TAG_inlined_subroutine`` has a
+ ``DW_AT_return_addr`` attribute but not a ``DW_AT_frame_base`` or
+ ``DW_AT_static_link`` attribute. Seems it would either have all of them or
+ none. Since inlined subprograms do not have a frame it seems they would
+ have none of these attributes.
+
+10. ``DW_AT_LLVM_lanes`` *New*
+
+ For languages that are implemented using a SIMD or SIMT execution model, a
+ ``DW_TAG_subprogram``, ``DW_TAG_inlined_subroutine``, or
+ ``DW_TAG_entry_point`` debugger information entry may have a
+ ``DW_AT_LLVM_lanes`` attribute whose value is an integer constant that is
+ the number of lanes per thread.
+
+ If not present, the default value of 1 is used.
+
+ The DWARF is ill-formed if the value is 0.
+
+11. ``DW_AT_LLVM_lane_pc`` *New*
+
+ For languages that are implemented using a SIMD or SIMT execution model, a
+ ``DW_TAG_subprogram``, ``DW_TAG_inlined_subroutine``, or
+ ``DW_TAG_entry_point`` debugging information entry may have a
+ ``DW_AT_LLVM_lane_pc`` attribute whose value is a DWARF expression or
+ location list. The DWARF expression E corresponding to the current program
+ location is selected. The result of evaluating E is a location description
+ that references a wave size vector of generic type elements. Each element
+ holds the conceptual program location of the corresponding lane, where the
+ least significant element corresponds to the first target architecture lane
+ identifier and so forth. If the lane was not active when the subprogram was
+ called, its element is an undefined location description.
+
+ ``DW_AT_LLVM_lane_pc`` *allows the compiler to indicate conceptually where
+ each lane of a SIMT thread is positioned even when it is in divergent
+ control flow that is not active.*
+
+ If not present, the thread is not being used in a SIMT manner, and the
+ thread's program location is used.
+
+ *See* :ref:`amdgpu-dwarf-amdgpu-dw-at-llvm-lane-pc` *for AMDGPU
+ information.*
+
+12. ``DW_AT_LLVM_active_lane`` *New*
+
+ For languages that are implemented using a SIMD or SIMT execution model, a
+ ``DW_TAG_subprogram``, ``DW_TAG_inlined_subroutine``, or
+ ``DW_TAG_entry_point`` debugger information entry may have a
+ ``DW_AT_LLVM_active_lane`` attribute whose value is a DWARF expression or
+ location list. The DWARF expression E corresponding to the current program
+ location is selected. The result of evaluating E is an integral value that is
+ the mask of active lanes for the current program location. The Nth least
+ significant bit of the mask corresponds to the Nth lane. If the bit is 1 the
+ lane is active, otherwise it is inactive.
+
+ *Some targets may update the target architecture execution mask for regions
+ of code that must execute with different sets of lanes than the current
+ active lanes. For example, some code must execute in whole wave mode.*
+ ``DW_AT_LLVM_active_lane`` *allows the compiler to provide the means to
+ determine the actual active lanes.*
+
+ If not present and ``DW_AT_LLVM_lanes`` is greater than 1, then the target
+ architecture execution mask is used.
+
+ *See* :ref:`amdgpu-dwarf-amdgpu-dw-at-llvm-active-lane` *for AMDGPU
+ information.*
+
+13. ``DW_AT_LLVM_vector_size`` *New*
+
+ A base type V may have the ``DW_AT_LLVM_vector_size`` attribute whose value
+ is an integer constant that is the vector size S.
+
+ The representation of a vector base type is as S contiguous elements, each
+ one having the representation of a base type E that is the same as V without
+ the ``DW_AT_LLVM_vector_size`` attribute.
+
+ If not present, the base type is not a vector.
+
+ The DWARF is ill-formed if S is not greater than 0.
+
+ .. note::
+
+ LLVM has mention of a non-upstreamed debugger information entry that is
+ intended to support vector types. However, that was not for a base type
+ so would not be suitable as the type of a stack value entry. But perhaps
+ that could be replaced by using this attribute.
+
+14. ``DW_AT_LLVM_augmentation`` *New*
+
+ A compilation unit may have a ``DW_AT_LLVM_augmentation`` attribute, whose
+ value is an augmentation string.
+
+ *The augmentation string allows users to indicate that there is additional
+ target-specific information in the debugging information entries. For
+ example, this might be information about the version of target-specific
+ extensions that are being used.*
+
+ If not present, or if the string is empty, then the compilation unit has no
+ augmentation string.
+
+ .. note::
+
+ For AMDGPU, the augmentation string contains:
+
+ ::
+
+ [amd:v0.0]
+
+ The "vX.Y" specifies the major X and minor Y version number of the AMDGPU
+ extensions used in the DWARF of the compilation unit. The version number
+ conforms to [SEMVER]_.
+
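+As an illustration of the ``DW_AT_LLVM_active_lane`` mask described above, the
+following Python sketch lists the active lanes encoded in a mask value. The
+function name ``active_lanes`` is hypothetical, and the sketch assumes that
+lanes are numbered from 0 so that bit 0 (the least significant bit) corresponds
+to lane 0.
+

```python
from typing import List


def active_lanes(mask: int, lane_count: int) -> List[int]:
    """Return the indices of the lanes whose bit is set in ``mask``.

    Assumption of this sketch: bit N, counting from the least significant
    bit and starting at 0, corresponds to lane N.
    """
    if mask < 0:
        raise ValueError("mask must be a non-negative integral value")
    # A bit value of 1 means the lane is active, 0 means it is inactive.
    return [lane for lane in range(lane_count) if (mask >> lane) & 1]


if __name__ == "__main__":
    # Lanes 0, 1 and 5 of an 8-lane thread are active.
    print(active_lanes(0b00100011, 8))
```

+
+A consumer would obtain ``mask`` by evaluating the DWARF expression selected
+for the current program location and ``lane_count`` from ``DW_AT_LLVM_lanes``.
+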
+Attribute Encodings
++++++++++++++++++++
+
+The following table gives the encoding of the debugging information entry
+attributes added for AMDGPU.
+
+.. table:: AMDGPU DWARF Attribute Encodings
+ :name: amdgpu-dwarf-attribute-encodings-table
+
+ ================================== ===== ====================================
+ Attribute Name                     Value Classes
+ ================================== ===== ====================================
+ DW_AT_LLVM_lanes                         constant
+ DW_AT_LLVM_lane_pc                       exprloc, loclist
+ DW_AT_LLVM_active_lane                   exprloc, loclist
+ DW_AT_LLVM_vector_size                   constant
+ DW_AT_LLVM_augmentation                  string
+ ================================== ===== ====================================
+
+.. _amdgpu-call-frame-information:
+
+Call Frame Information
+~~~~~~~~~~~~~~~~~~~~~~
+
+DWARF Call Frame Information describes how an agent can virtually *unwind*
+call frames in a running process or core dump.
+
+.. note::
+
+ AMDGPU conforms to the DWARF standard with additional support added for
+ address spaces. Register unwind DWARF expressions are generalized to allow any
+ location description, including composite and implicit location descriptions.
+
+Structure of Call Frame Information
++++++++++++++++++++++++++++++++++++
+
+The register rules are:
+
+*undefined*
+ A register that has this rule has no recoverable value in the previous frame.
+ (By convention, it is not preserved by a callee.)
+
+*same value*
+ This register has not been modified from the previous frame. (By convention,
+ it is preserved by the callee, but the callee has not modified it.)
+
+*offset(N)*
+ The previous value of this register is saved at the location description
+ computed as if the ``DW_OP_LLVM_offset N`` operation is applied to the current
+ CFA memory location description where N is a signed byte offset.
+
+*val_offset(N)*
+ The previous value of this register is the address in the address space of the
+ memory location description computed as if the ``DW_OP_LLVM_offset N``
+ operation is applied to the current CFA memory location description where N is
+ a signed byte displacement.
+
+ If the register size does not match the size of an address in the address
+ space of the current CFA memory location description, then the DWARF is
+ ill-formed.
+
+*register(R)*
+ The previous value of this register is stored in another register numbered R.
+
+ If the register sizes do not match, then the DWARF is ill-formed.
+
+*expression(E)*
+ The previous value of this register is located at the location description
+ produced by executing the DWARF expression E (see
+ :ref:`amdgpu-dwarf-expressions`).
+
+*val_expression(E)*
+ The previous value of this register is the value produced by executing the
+ DWARF expression E (see :ref:`amdgpu-dwarf-expressions`).
+
+ If the value type size does not match the register size, then the DWARF is
+ ill-formed.
+
+*architectural*
+ The rule is defined externally to this specification by the augmenter.
+
+A Common Information Entry holds information that is shared among many Frame
+Description Entries. There is at least one CIE in every non-empty
+``.debug_frame`` section. A CIE contains the following fields, in order:
+
+1. ``length`` (initial length)
+
+ A constant that gives the number of bytes of the CIE structure, not
+ including the length field itself. The size of the length field plus the
+ value of length must be an integral multiple of the address size specified
+ in the ``address_size`` field.
+
+2. ``CIE_id`` (4 or 8 bytes, see
+ :ref:`amdgpu-dwarf-32-bit-and-64-bit-dwarf-formats`)
+
+ A constant that is used to distinguish CIEs from FDEs.
+
+ In the 32-bit DWARF format, the value of the CIE id in the CIE header is
+ 0xffffffff; in the 64-bit DWARF format, the value is 0xffffffffffffffff.
+
+3. ``version`` (ubyte)
+
+ A version number. This number is specific to the call frame information and
+ is independent of the DWARF version number.
+
+ The value of the CIE version number is 4.
+
+4. ``augmentation`` (sequence of UTF-8 characters)
+
+ A null-terminated UTF-8 string that identifies the augmentation to this CIE
+ or to the FDEs that use it. If a reader encounters an augmentation string
+ that is unexpected, then only the following fields can be read:
+
+ * CIE: length, CIE_id, version, augmentation
+ * FDE: length, CIE_pointer, initial_location, address_range
+
+ If there is no augmentation, this value is a zero byte.
+
+ *The augmentation string allows users to indicate that there is additional
+ target-specific information in the CIE or FDE which is needed to virtually
+ unwind a stack frame. For example, this might be information about
+ dynamically allocated data which needs to be freed on exit from the
+ routine.*
+
+ *Because the* ``.debug_frame`` *section is useful independently of any*
+ ``.debug_info`` *section, the augmentation string always uses UTF-8
+ encoding.*
+
+ .. note::
+
+ For AMDGPU, the augmentation string contains:
+
+ ::
+
+ [amd:v0.0]
+
+ The "vX.Y" specifies the major X and minor Y version number of the AMDGPU
+ extensions used in the DWARF of the compilation unit. The version number
+ conforms to [SEMVER]_.
+
+5. ``address_size`` (ubyte)
+
+ The size of a target address in this CIE and any FDEs that use it, in bytes.
+ If a compilation unit exists for this frame, its address size must match the
+ address size here.
+
+ .. note::
+
+ For AMDGPU:
+
+ * The address size for the ``Global`` address space defined in
+ :ref:`amdgpu-dwarf-address-space-mapping-table`.
+
+6. ``segment_selector_size`` (ubyte)
+
+ The size of a segment selector in this CIE and any FDEs that use it, in
+ bytes.
+
+ .. note::
+
+ For AMDGPU:
+
+ * Does not use a segment selector so this is 0.
+
+7. ``code_alignment_factor`` (unsigned LEB128)
+
+ A constant that is factored out of all advance location instructions (see
+ :ref:`amdgpu-dwarf-row-creation-instructions`). The resulting value is
+ ``(operand * code_alignment_factor)``.
+
+ .. note::
+
+ For AMDGPU:
+
+ * 4 bytes.
+
+ .. TODO::
+
+ Add to :ref:`amdgpu-processor-table` table.
+
+8. ``data_alignment_factor`` (signed LEB128)
+
+ A constant that is factored out of certain offset instructions (see
+ :ref:`amdgpu-dwarf-cfa-definition-instructions` and
+ :ref:`amdgpu-dwarf-register-rule-instructions`). The resulting value is
+ ``(operand * data_alignment_factor)``.
+
+ .. note::
+
+ For AMDGPU:
+
+ * 4 bytes.
+
+ .. TODO::
+
+ Add to :ref:`amdgpu-processor-table` table.
+
+9. ``return_address_register`` (unsigned LEB128)
+
+ An unsigned LEB128 constant that indicates which column in the rule table
+ represents the return address of the function. Note that this column might
+ not correspond to an actual machine register.
+
+ .. note::
+
+ For AMDGPU:
+
+ * ``PC_32`` for 32-bit processes and ``PC_64`` for
+ 64-bit processes defined in :ref:`amdgpu-dwarf-register-mapping`.
+
+10. ``initial_instructions`` (array of ubyte)
+
+ A sequence of rules that are interpreted to create the initial setting of
+ each column in the table.
+
+ The default rule for all columns before interpretation of the initial
+ instructions is the undefined rule. However, an ABI authoring body or a
+ compilation system authoring body may specify an alternate default value for
+ any or all columns.
+
+ .. note::
+
+ For AMDGPU:
+
+ * Since a subprogram A with fewer registers can be called from subprogram
+ B that has more allocated, A will not change any of the extra registers
+ as it cannot access them. Therefore, the default rule for all columns is
+ ``same value``.
+
+11. ``padding`` (array of ubyte)
+
+ Enough ``DW_CFA_nop`` instructions to make the size of this entry match the
+ length value above.
+
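+The relationship between the ``length`` and ``padding`` fields can be sketched
+in Python as follows. The function name ``padding_size`` and its parameters are
+hypothetical. The sketch relies on the DWARF rule that an initial length field
+occupies 4 bytes in the 32-bit DWARF format and 12 bytes in the 64-bit DWARF
+format.
+

```python
def padding_size(unpadded_length: int, address_size: int,
                 dwarf64: bool = False) -> int:
    """Return the number of DW_CFA_nop bytes needed in ``padding``.

    ``unpadded_length`` is the byte count of every field that follows
    ``length``, up to but not including ``padding``.
    """
    # An initial length is 4 bytes (32-bit DWARF) or 12 bytes (64-bit DWARF).
    length_field_size = 12 if dwarf64 else 4
    total = length_field_size + unpadded_length
    # Round the total size up to the next multiple of the address size.
    return -total % address_size


if __name__ == "__main__":
    pad = padding_size(21, 8)
    print("padding bytes:", pad, "length value:", 21 + pad)
```

+
+The value stored in ``length`` is then ``unpadded_length`` plus the returned
+padding size.
+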
+An FDE contains the following fields, in order:
+
+1. ``length`` (initial length)
+
+ A constant that gives the number of bytes of the header and instruction
+ stream for this function, not including the length field itself. The size of
+ the length field plus the value of length must be an integral multiple of
+ the address size.
+
+2. ``CIE_pointer`` (4 or 8 bytes, see
+ :ref:`amdgpu-dwarf-32-bit-and-64-bit-dwarf-formats`)
+
+ A constant offset into the ``.debug_frame`` section that denotes the CIE
+ that is associated with this FDE.
+
+3. ``initial_location`` (segment selector and target address)
+
+ The address of the first location associated with this table entry. If the
+ segment_selector_size field of this FDE’s CIE is non-zero, the initial
+ location is preceded by a segment selector of the given length.
+
+4. ``address_range`` (target address)
+
+ The number of bytes of program instructions described by this entry.
+
+5. ``instructions`` (array of ubyte)
+
+ A sequence of table defining instructions that are described in
+ :ref:`amdgpu-dwarf-call-frame-instructions`.
+
+6. ``padding`` (array of ubyte)
+
+ Enough ``DW_CFA_nop`` instructions to make the size of this entry match the
+ length value above.
+
+.. _amdgpu-dwarf-call-frame-instructions:
+
+Call Frame Instructions
++++++++++++++++++++++++
+
+Some call frame instructions have operands that are encoded as DWARF expressions
+E (see :ref:`amdgpu-dwarf-expressions`). The DWARF operators that can be used in
+E have the following restrictions:
+
+* ``DW_OP_addrx``, ``DW_OP_call2``, ``DW_OP_call4``, ``DW_OP_call_ref``,
+ ``DW_OP_const_type``, ``DW_OP_constx``, ``DW_OP_convert``,
+ ``DW_OP_deref_type``, ``DW_OP_regval_type``, and ``DW_OP_reinterpret``
+ operators are not allowed because the call frame information must not depend
+ on other debug sections.
+
+* ``DW_OP_push_object_address`` is not allowed because there is no object
+ context to provide a value to push.
+
+* ``DW_OP_call_frame_cfa`` and ``DW_OP_entry_value`` are not allowed because
+ their use would be circular.
+
+* ``DW_OP_LLVM_call_frame_entry_reg`` is not allowed if evaluating E causes a
+ circular dependency between ``DW_OP_LLVM_call_frame_entry_reg`` operators.
+
+ *For example, if a register R1 has a* ``DW_CFA_def_cfa_expression``
+ *instruction that evaluates a* ``DW_OP_LLVM_call_frame_entry_reg`` *operator
+ that specifies register R2, and register R2 has a*
+ ``DW_CFA_def_cfa_expression`` *instruction that evaluates a*
+ ``DW_OP_LLVM_call_frame_entry_reg`` *operator that specifies register R1.*
+
+*Call frame instructions to which these restrictions apply include*
+``DW_CFA_def_cfa_expression``\ *,* ``DW_CFA_expression``\ *, and*
+``DW_CFA_val_expression``\ *.*
+
+.. _amdgpu-dwarf-row-creation-instructions:
+
+Row Creation Instructions
+#########################
+
+These instructions are the same as in DWARF 5.
+
+.. _amdgpu-dwarf-cfa-definition-instructions:
+
+CFA Definition Instructions
+###########################
+
+1. ``DW_CFA_def_cfa``
+
+ The ``DW_CFA_def_cfa`` instruction takes two unsigned LEB128 operands
+ representing a register number R and a (non-factored) byte displacement D.
+ The required action is to define the current CFA rule to be the memory
+ location description that is the result of evaluating the DWARF expression
+ ``DW_OP_bregx R, D``.
+
+ .. note::
+
+ Could also consider adding ``DW_CFA_def_aspace_cfa`` and
+ ``DW_CFA_def_aspace_cfa_sf`` which allow a register R, offset D, and
+ address space AS to be specified. For example, that would save a byte of
+ encoding over using ``DW_CFA_def_cfa R, D; DW_CFA_LLVM_def_cfa_aspace
+ AS;``.
+
+2. ``DW_CFA_def_cfa_sf``
+
+ The ``DW_CFA_def_cfa_sf`` instruction takes two operands: an unsigned LEB128
+ value representing a register number R and a signed LEB128 factored byte
+ displacement D. The required action is to define the current CFA rule to be
+ the memory location description that is the result of evaluating the DWARF
+ expression ``DW_OP_bregx R, D*data_alignment_factor``.
+
+ *The action is the same as* ``DW_CFA_def_cfa`` *except that the second operand
+ is signed and factored.*
+
+3. ``DW_CFA_def_cfa_register``
+
+ The ``DW_CFA_def_cfa_register`` instruction takes a single unsigned LEB128
+ operand representing a register number R. The required action is to define
+ the current CFA rule to be the memory location description that is the
+ result of evaluating the DWARF expression ``DW_OP_constu AS;
+ DW_OP_aspace_bregx R, D`` where D and AS are the old CFA byte displacement
+ and address space respectively.
+
+ If the subprogram has no current CFA rule, or the rule was defined by a
+ ``DW_CFA_def_cfa_expression`` instruction, then the DWARF is ill-formed.
+
+4. ``DW_CFA_def_cfa_offset``
+
+ The ``DW_CFA_def_cfa_offset`` instruction takes a single unsigned LEB128
+ operand representing a (non-factored) byte displacement D. The required
+ action is to define the current CFA rule to be the memory location
+ description that is the result of evaluating the DWARF expression
+ ``DW_OP_constu AS; DW_OP_aspace_bregx R, D`` where R and AS are the old CFA
+ register number and address space respectively.
+
+ If the subprogram has no current CFA rule, or the rule was defined by a
+ ``DW_CFA_def_cfa_expression`` instruction, then the DWARF is ill-formed.
+
+5. ``DW_CFA_def_cfa_offset_sf``
+
+ The ``DW_CFA_def_cfa_offset_sf`` instruction takes a signed LEB128 operand
+ representing a factored byte displacement D. The required action is to
+ define the current CFA rule to be the memory location description that is
+ the result of evaluating the DWARF expression ``DW_OP_constu AS;
+ DW_OP_aspace_bregx R, D*data_alignment_factor`` where R and AS are the old
+ CFA register number and address space respectively.
+
+ If the subprogram has no current CFA rule, or the rule was defined by a
+ ``DW_CFA_def_cfa_expression`` instruction, then the DWARF is ill-formed.
+
+ *The action is the same as* ``DW_CFA_def_cfa_offset`` *except that the operand
+ is signed and factored.*
+
+6. ``DW_CFA_LLVM_def_cfa_aspace`` *New*
+
+ The ``DW_CFA_LLVM_def_cfa_aspace`` instruction takes a single unsigned
+ LEB128 operand representing an address space identifier AS for those
+ architectures that support multiple address spaces. The required action is
+ to define the current CFA rule to be the memory location description L that
+ is the result of evaluating the DWARF expression ``DW_OP_constu AS;
+ DW_OP_aspace_bregx R, D`` where R and D are the old CFA register number and
+ byte displacement respectively.
+
+ If AS is not one of the values defined by the target architecture's
+ ``DW_ASPACE_*`` values then the DWARF expression is ill-formed.
+
+7. ``DW_CFA_def_cfa_expression``
+
+ The ``DW_CFA_def_cfa_expression`` instruction takes a single operand encoded
+ as a ``DW_FORM_exprloc`` value representing a DWARF expression E. The
+ required action is to define the current CFA rule to be the memory location
+ description computed by evaluating E.
+
+ *See* :ref:`amdgpu-dwarf-call-frame-instructions` *regarding restrictions on
+ the DWARF expression operators that can be used in E.*
+
+ If the result of evaluating E is not a memory location description with bit
+ offset that is a multiple of 8 (the byte size), then the DWARF is
+ ill-formed.
+
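+The way the register-based CFA definition instructions above update the
+current CFA rule can be sketched in Python. All names here (``CfaRule``,
+``DEFAULT_ASPACE``, and the functions) are hypothetical, and the register and
+address space numbers used with them are placeholders. The sketch assumes that
+``DW_CFA_def_cfa`` and ``DW_CFA_def_cfa_sf`` yield a location in the default
+address space, and it models a missing or expression-based rule as ``None``.
+

```python
from dataclasses import dataclass, replace
from typing import Optional

DEFAULT_ASPACE = 0  # Assumption: stand-in for the default address space.


@dataclass(frozen=True)
class CfaRule:
    """CFA rule equivalent to DW_OP_constu AS; DW_OP_aspace_bregx R, D."""
    register: int
    offset: int
    aspace: int = DEFAULT_ASPACE


def _require(rule: Optional[CfaRule]) -> CfaRule:
    # Updating a missing or expression-based rule is treated as ill-formed.
    if rule is None:
        raise ValueError("ill-formed DWARF: no register-based CFA rule")
    return rule


def def_cfa(reg: int, d: int) -> CfaRule:
    """DW_CFA_def_cfa R, D (D is a non-factored byte displacement)."""
    return CfaRule(reg, d)


def def_cfa_sf(reg: int, d: int, data_alignment_factor: int) -> CfaRule:
    """DW_CFA_def_cfa_sf R, D (D is signed and factored)."""
    return CfaRule(reg, d * data_alignment_factor)


def def_cfa_register(rule: Optional[CfaRule], reg: int) -> CfaRule:
    """DW_CFA_def_cfa_register R: keep the old displacement and address space."""
    return replace(_require(rule), register=reg)


def def_cfa_offset(rule: Optional[CfaRule], d: int) -> CfaRule:
    """DW_CFA_def_cfa_offset D: keep the old register and address space."""
    return replace(_require(rule), offset=d)


def def_cfa_offset_sf(rule: Optional[CfaRule], d: int,
                      data_alignment_factor: int) -> CfaRule:
    """DW_CFA_def_cfa_offset_sf D: as above, but D is signed and factored."""
    return replace(_require(rule), offset=d * data_alignment_factor)


def def_cfa_aspace(rule: Optional[CfaRule], aspace: int) -> CfaRule:
    """DW_CFA_LLVM_def_cfa_aspace AS: keep the old register and displacement."""
    return replace(_require(rule), aspace=aspace)


if __name__ == "__main__":
    rule = def_cfa(32, 16)
    rule = def_cfa_aspace(rule, 5)
    rule = def_cfa_offset_sf(rule, -2, 4)
    print(rule)
```

+
+Each function returns a new rule rather than mutating the old one, which
+mirrors how a consumer builds successive rows of the unwind table.
+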
+.. _amdgpu-dwarf-register-rule-instructions:
+
+Register Rule Instructions
+##########################
+
+.. note::
+
+ For AMDGPU:
+
+ * The register number follows the numbering defined in
+ :ref:`amdgpu-dwarf-register-mapping`.
+
+1. ``DW_CFA_undefined``
+
+ The ``DW_CFA_undefined`` instruction takes a single unsigned LEB128 operand
+ that represents a register number R. The required action is to set the rule
+ for the register specified by R to ``undefined``.
+
+2. ``DW_CFA_same_value``
+
+ The ``DW_CFA_same_value`` instruction takes a single unsigned LEB128 operand
+ that represents a register number R. The required action is to set the rule
+ for the register specified by R to ``same value``.
+
+3. ``DW_CFA_offset``
+
+ The ``DW_CFA_offset`` instruction takes two operands: a register number R
+ (encoded with the opcode) and an unsigned LEB128 constant representing a
+ factored displacement D. The required action is to change the rule for the
+ register specified by R to be an *offset(D*data_alignment_factor)* rule.
+
+ .. note::
+
+ Seems this should be named ``DW_CFA_offset_uf`` since the offset is
+ unsigned factored.
+
+4. ``DW_CFA_offset_extended``
+
+ The ``DW_CFA_offset_extended`` instruction takes two unsigned LEB128
+ operands representing a register number R and a factored displacement D.
+ This instruction is identical to ``DW_CFA_offset`` except for the encoding
+ and size of the register operand.
+
+ .. note::
+
+ Seems this should be named ``DW_CFA_offset_extended_uf`` since the
+ displacement is unsigned factored.
+
+5. ``DW_CFA_offset_extended_sf``
+
+ The ``DW_CFA_offset_extended_sf`` instruction takes two operands: an
+ unsigned LEB128 value representing a register number R and a signed LEB128
+ factored displacement D. This instruction is identical to
+ ``DW_CFA_offset_extended`` except that D is signed.
+
+6. ``DW_CFA_val_offset``
+
+ The ``DW_CFA_val_offset`` instruction takes two unsigned LEB128 operands
+ representing a register number R and a factored displacement D. The required
+ action is to change the rule for the register indicated by R to be a
+ *val_offset(D*data_alignment_factor)* rule.
+
+ .. note::
+
+ Seems this should be named ``DW_CFA_val_offset_uf`` since the displacement
+ is unsigned factored.
+
+7. ``DW_CFA_val_offset_sf``
+
+ The ``DW_CFA_val_offset_sf`` instruction takes two operands: an unsigned
+ LEB128 value representing a register number R and a signed LEB128 factored
+ displacement D. This instruction is identical to ``DW_CFA_val_offset``
+ except that D is signed.
+
+8. ``DW_CFA_register``
+
+ The ``DW_CFA_register`` instruction takes two unsigned LEB128 operands
+ representing register numbers R1 and R2 respectively. The required action is
+ to set the rule for the register specified by R1 to be *register(R)* where R
+ is R2.
+
+9. ``DW_CFA_expression``
+
+ The ``DW_CFA_expression`` instruction takes two operands: an unsigned LEB128
+ value representing a register number R, and a ``DW_FORM_block`` value
+ representing a DWARF expression E. The required action is to change the rule
+ for the register specified by R to be an *expression(E)* rule. The memory
+ location description of the current CFA is pushed on the DWARF stack prior
+ to execution of E.
+
+ *That is, the DWARF expression computes the location description where the
+ register value can be retrieved.*
+
+ *See* :ref:`amdgpu-dwarf-call-frame-instructions` *regarding restrictions on
+ the DWARF expression operators that can be used in E.*
+
+10. ``DW_CFA_val_expression``
+
+ The ``DW_CFA_val_expression`` instruction takes two operands: an unsigned
+ LEB128 value representing a register number R, and a ``DW_FORM_block`` value
+ representing a DWARF expression E. The required action is to change the rule
+ for the register specified by R to be a *val_expression(E)* rule. The memory
+ location description of the current CFA is pushed on the DWARF evaluation
+ stack prior to execution of E.
+
+ *That is, E computes the value of register R.*
+
+ *See* :ref:`amdgpu-dwarf-call-frame-instructions` *regarding restrictions on
+ the DWARF expression operators that can be used in E.*
+
+ If the result of evaluating E is not a value with a base type size that
+ matches the register size, then the DWARF is ill-formed.
+
+11. ``DW_CFA_restore``
+
+ The ``DW_CFA_restore`` instruction takes a single operand (encoded with the
+ opcode) that represents a register number R. The required action is to
+ change the rule for the register specified by R to the rule assigned it by
+ the initial_instructions in the CIE.
+
+12. ``DW_CFA_restore_extended``
+
+ The ``DW_CFA_restore_extended`` instruction takes a single unsigned LEB128
+ operand that represents a register number R. This instruction is identical
+ to ``DW_CFA_restore`` except for the encoding and size of the register
+ operand.
+
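+The operand encodings used by the offset instructions above can be illustrated
+with a small Python decoder. The function names are hypothetical. The opcode
+values are those of the DWARF standard: ``DW_CFA_offset`` has the value 0x2 in
+the high 2 bits of the opcode byte with the register number in the low 6 bits,
+and ``DW_CFA_offset_extended_sf`` is 0x11. Only these two instructions are
+handled.
+

```python
from typing import Tuple


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def read_sleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a signed LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # Sign bit of the final byte is set.
                result -= 1 << shift
            return result, pos


def decode_offset_rule(data: bytes, pos: int,
                       data_alignment_factor: int) -> Tuple[int, int, int]:
    """Return (register, N of the offset(N) rule, next position)."""
    opcode = data[pos]
    if opcode >> 6 == 0x2:  # DW_CFA_offset: register is in the low 6 bits.
        reg = opcode & 0x3F
        d, pos = read_uleb128(data, pos + 1)
    elif opcode == 0x11:  # DW_CFA_offset_extended_sf
        reg, pos = read_uleb128(data, pos + 1)
        d, pos = read_sleb128(data, pos)
    else:
        raise ValueError("instruction not handled by this sketch")
    # The displacement operand is factored by data_alignment_factor.
    return reg, d * data_alignment_factor, pos


if __name__ == "__main__":
    # DW_CFA_offset for register 5 with factored displacement 2.
    print(decode_offset_rule(bytes([0x85, 0x02]), 0, 4))
```

+
+With the AMDGPU ``data_alignment_factor`` of 4, a factored displacement of 2
+therefore produces an *offset(8)* rule.
+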
+Row State Instructions
+######################
+
+These instructions are the same as in DWARF 5.
+
+Call Frame Calling Address
+++++++++++++++++++++++++++
+
+*When virtually unwinding frames, consumers frequently wish to obtain the
+address of the instruction which called a subroutine. This information is not
+always provided. Typically, however, one of the registers in the virtual unwind
+table is the Return Address.*
+
+If a Return Address register is defined in the virtual unwind table, and its
+rule is undefined (for example, by ``DW_CFA_undefined``), then there is no
+return address and no call address, and the virtual unwind of stack activations
+is complete.
+
+*In most cases the return address is in the same context as the calling address,
+but that need not be the case, especially if the producer knows in some way the
+call never will return. The context of the ’return address’ might be on a
+different line, in a different lexical block, or past the end of the calling
+subroutine. If a consumer were to assume that it was in the same context as the
+calling address, the virtual unwind might fail.*
+
+*For architectures with constant-length instructions where the return address
+immediately follows the call instruction, a simple solution is to subtract the
+length of an instruction from the return address to obtain the calling
+instruction. For architectures with variable-length instructions (for example,
+x86), this is not possible. However, subtracting 1 from the return address,
+although not guaranteed to provide the exact calling address, generally will
+produce an address within the same context as the calling address, and that
+usually is sufficient.*
+
+.. note::
+
+ For AMDGPU the instructions are variable size and a consumer can subtract 1
+ from the return address to get the address of a byte within the call site
+ instructions.
+
+Call Frame Information Instruction Encodings
+++++++++++++++++++++++++++++++++++++++++++++
+
+The following table gives the encoding of the DWARF call frame information
+instructions added for AMDGPU.
+
+.. table:: AMDGPU DWARF Call Frame Information Instruction Encodings
+ :name: amdgpu-dwarf-call-frame-information-instruction-encodings-table
+
+ =================================== ==== ==== ============== ================
+ Instruction                         High Low  Operand 1      Operand 2
+                                     2    6
+                                     Bits Bits
+ =================================== ==== ==== ============== ================
+ DW_CFA_LLVM_def_cfa_aspace          0    0Xxx ULEB128
+ =================================== ==== ==== ============== ================
+
+Line Table
+~~~~~~~~~~
+
+.. note::
+
+ AMDGPU does not use the ``isa`` state machine register and always sets it to
+ 0.
+
+.. TODO::
+
+ Should the ``isa`` state machine register be used to indicate if the code is
+ in wave32 or wave64 mode? Or used to specify the architecture ISA?
+
+Accelerated Access
+~~~~~~~~~~~~~~~~~~
+
+Lookup By Name
+++++++++++++++
+
+.. note::
+
+ For AMDGPU:
+
+ * The rule for debugging information entries included in the name
+ index in the optional ``.debug_names`` section is extended to also include
+ named ``DW_TAG_variable`` debugging information entries with a
+ ``DW_AT_location`` attribute that includes a
+ ``DW_OP_LLVM_form_aspace_address`` operation.
+
+ * The lookup by name section header ``augmentation_string`` string field contains:
+
+ ::
+
+ [amd:v0.0]
+
+ The "vX.Y" specifies the major X and minor Y version number of the AMDGPU
+ extensions used in the DWARF of the compilation unit. The version number
+ conforms to [SEMVER]_.
+
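+A consumer could extract the AMDGPU extension version from an augmentation
+string of the form shown above with a sketch such as the following. The
+function name ``amd_extension_version`` is hypothetical, and the sketch assumes
+the ``[amd:vX.Y]`` token may appear anywhere within the string.
+

```python
import re
from typing import Optional, Tuple

# Matches the "[amd:vX.Y]" token, where X and Y are decimal numbers.
_AMD_AUGMENTATION = re.compile(r"\[amd:v(\d+)\.(\d+)\]")


def amd_extension_version(augmentation: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the AMDGPU extensions, or None if absent."""
    match = _AMD_AUGMENTATION.search(augmentation)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(amd_extension_version("[amd:v0.0]"))
```

+
+Returning ``None`` lets the caller distinguish DWARF that does not use the
+AMDGPU extensions from DWARF that uses version 0.0 of them.
+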
+Lookup By Address
++++++++++++++++++
+
+.. note::
+
+ For AMDGPU:
+
+ * The lookup by address section header table:
+
+ ``address_size`` (ubyte)
+ Match the address size for the ``Global`` address space defined in
+ :ref:`amdgpu-dwarf-address-space-mapping-table`.
+
+ ``segment_selector_size`` (ubyte)
+ AMDGPU does not use a segment selector so this is 0. The entries in the
+ ``.debug_aranges`` do not have a segment selector.
+
+Data Representation
+~~~~~~~~~~~~~~~~~~~
+
+.. _amdgpu-dwarf-32-bit-and-64-bit-dwarf-formats:
+
+32-Bit and 64-Bit DWARF Formats
++++++++++++++++++++++++++++++++
+
+.. note::
+
+ For AMDGPU:
+
+ * For the ``amdgcn`` target only 64-bit process address space is supported.
+ * The producer can generate either 32-bit or 64-bit DWARF format.
+
+1. Within the body of the ``.debug_info`` section, certain forms of attribute
+ value depend on the choice of DWARF format as follows. For the 32-bit DWARF
+ format, the value is a 4-byte unsigned integer; for the 64-bit DWARF format,
+ the value is an 8-byte unsigned integer.
+
+ .. table:: AMDGPU DWARF ``.debug_info`` section attribute sizes
+ :name: amdgpu-dwarf-debug-info-section-attribute-sizes
+
+ =================================== =====================================
+ Form                                Role
+ =================================== =====================================
+ DW_FORM_line_strp                   offset in ``.debug_line_str``
+ DW_FORM_ref_addr                    offset in ``.debug_info``
+ DW_FORM_sec_offset                  offset in a section other than
+                                     ``.debug_info`` or ``.debug_str``
+ DW_FORM_strp                        offset in ``.debug_str``
+ DW_FORM_strp_sup                    offset in ``.debug_str`` section of
+                                     supplementary object file
+ DW_OP_call_ref                      offset in ``.debug_info``
+ DW_OP_implicit_pointer              offset in ``.debug_info``
+ DW_OP_LLVM_aspace_implicit_pointer  offset in ``.debug_info``
+ =================================== =====================================
+
+Unit Headers
+++++++++++++
+
+.. note::
+
+ For AMDGPU:
+
+ * The ``address_size`` field of the DWARF unit headers matches the
+ address size for the ``Global`` address space defined in
+ :ref:`amdgpu-dwarf-address-space-mapping-table`.
+
+.. _amdgpu-dwarf-amdgpu-dw-at-llvm-lane-pc:
+
+AMDGPU DW_AT_LLVM_lane_pc
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``DW_AT_LLVM_lane_pc`` attribute can be used to specify the program location
+of the separate lanes of a SIMT thread. See
+:ref:`amdgpu-dwarf-debugging-information-entry-attributes`.
+
+If the lane is an active lane then this will be the same as the current program
+location.
+
+If the lane is inactive, but was active on entry to the subprogram, then this is
+the program location in the subprogram at which execution of the lane is
+conceptually positioned.
+
+If the lane was not active on entry to the subprogram, then this will be the
+undefined location. A client debugger can check if the lane is part of a valid
+work-group by checking that the lane is in the range of the associated
+work-group within the grid, accounting for partial work-groups. If it is not
+then the debugger can omit any information for the lane. Otherwise, the debugger
+may repeatedly unwind the stack and inspect the ``DW_AT_LLVM_lane_pc`` of the
+calling subprogram until it finds a non-undefined location. Conceptually the
+lane only has the call frames for which it has a non-undefined
+``DW_AT_LLVM_lane_pc``.
+
+The following example illustrates how the AMDGPU backend can generate a location
+list for the nested ``IF/THEN/ELSE`` structures of the following subprogram
+pseudo code for a target with 64 lanes per wave.
+
+.. code::
+ :number-lines:
+
+ SUBPROGRAM X
+ BEGIN
+   a;
+   IF (c1) THEN
+     b;
+     IF (c2) THEN
+       c;
+     ELSE
+       d;
+     ENDIF
+     e;
+   ELSE
+     f;
+   ENDIF
+   g;
+ END
+
+The AMDGPU backend may generate the following pseudo LLVM MIR to manipulate the
+execution mask (``EXEC``) to linearize the control flow. The condition is
+evaluated to make a mask of the lanes for which the condition evaluates to true.
+First the ``THEN`` region is executed by setting the ``EXEC`` mask to the
+logical ``AND`` of the current ``EXEC`` mask with the condition mask. Then the
+``ELSE`` region is executed by setting the ``EXEC`` mask to the logical ``AND``
+of the negated ``EXEC`` mask with the ``EXEC`` mask saved at the start of the
+region. After the ``IF/THEN/ELSE`` region the ``EXEC`` mask is restored to the
+value it had at the beginning of the region. This is shown below. Other
+approaches are possible, but the basic concept is the same.
+
+.. code::
+ :number-lines:
+
+ $lex_start:
+   a;
+   %1 = EXEC
+   %2 = c1
+ $lex_1_start:
+   EXEC = %1 & %2
+ $lex_1_then:
+   b;
+   %3 = EXEC
+   %4 = c2
+ $lex_1_1_start:
+   EXEC = %3 & %4
+ $lex_1_1_then:
+   c;
+   EXEC = ~EXEC & %3
+ $lex_1_1_else:
+   d;
+   EXEC = %3
+ $lex_1_1_end:
+   e;
+   EXEC = ~EXEC & %1
+ $lex_1_else:
+   f;
+   EXEC = %1
+ $lex_1_end:
+   g;
+ $lex_end:
+
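+The effect of the linearized control flow above on the ``EXEC`` mask can be
+checked with a small Python simulation. The function name
+``linearized_exec_masks`` is hypothetical. Masks are plain integers in which
+bit N stands for lane N, and ``c1`` and ``c2`` are the condition masks.
+

```python
from typing import Dict


def linearized_exec_masks(entry_exec: int, c1: int, c2: int) -> Dict[str, int]:
    """Return the EXEC mask in effect while each statement a..g executes."""
    masks = {}
    exec_mask = entry_exec
    masks["a"] = exec_mask
    saved_1 = exec_mask                  # %1 = EXEC
    exec_mask = saved_1 & c1             # EXEC = %1 & %2
    masks["b"] = exec_mask
    saved_3 = exec_mask                  # %3 = EXEC
    exec_mask = saved_3 & c2             # EXEC = %3 & %4
    masks["c"] = exec_mask
    exec_mask = ~exec_mask & saved_3     # EXEC = ~EXEC & %3
    masks["d"] = exec_mask
    exec_mask = saved_3                  # EXEC = %3
    masks["e"] = exec_mask
    exec_mask = ~exec_mask & saved_1     # EXEC = ~EXEC & %1
    masks["f"] = exec_mask
    exec_mask = saved_1                  # EXEC = %1
    masks["g"] = exec_mask
    return masks


if __name__ == "__main__":
    # Four lanes: c1 holds for lanes 0 and 1, c2 holds for lanes 0 and 2.
    for statement, mask in linearized_exec_masks(0b1111, 0b0011, 0b0101).items():
        print(statement, format(mask, "04b"))
```

+
+Every lane that is active on entry executes exactly one path through the nested
+regions, even though the instructions of all regions are issued in sequence.
+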
+To create the location list that defines the location description of a vector of
+lane program locations, the LLVM MIR ``DBG_VALUE`` pseudo instruction can be
+used to annotate the linearized control flow. This can be done by defining an
+artificial variable for the lane PC. The location list created for it is used to
+define the value of the ``DW_AT_LLVM_lane_pc`` attribute.
+
+A DWARF procedure is defined for each well nested structured control flow region
+which provides the conceptual lane program location for a lane if it is not
+active (namely it is divergent). The expression for each region inherits the
+value of the immediately enclosing region and modifies it according to the
+semantics of the region.
+
+For an ``IF/THEN/ELSE`` region the divergent program location is at the start of
+the region for the ``THEN`` region since it is executed first. For the ``ELSE``
+region the divergent program location is at the end of the ``IF/THEN/ELSE``
+region since the ``THEN`` region has completed.
+
+The lane PC artificial variable is assigned at each region transition. It uses
+the immediately enclosing region's DWARF procedure to compute the program
+location for each lane assuming they are divergent, and then modifies the result
+by inserting the current program location for each lane that the ``EXEC`` mask
+indicates is active.
+
+By having separate DWARF procedures for each region, they can be reused to
+define the value for any nested region. This reduces the amount of DWARF
+required.
+
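+The composition described above can be modeled in Python. This is only a
+sketch of the idea, not of the DWARF operations themselves: the function names
+are hypothetical, program locations are represented by label strings, and the
+undefined location is represented by ``None``.
+

```python
from typing import List, Optional

LanePcs = List[Optional[str]]


def divergent_region(enclosing: LanePcs, saved_exec: int,
                     region_pc: str) -> LanePcs:
    """Model of a region's DWARF procedure.

    Lanes that were active when the region was entered (``saved_exec``) are
    conceptually positioned at ``region_pc``; all other lanes inherit the
    location computed by the immediately enclosing region.
    """
    return [region_pc if (saved_exec >> lane) & 1 else pc
            for lane, pc in enumerate(enclosing)]


def lane_pc(divergent: LanePcs, exec_mask: int, current_pc: str) -> LanePcs:
    """Insert the current program location for every active lane."""
    return [current_pc if (exec_mask >> lane) & 1 else pc
            for lane, pc in enumerate(divergent)]


if __name__ == "__main__":
    # Four lanes; lane 3 was not active on entry to the subprogram.
    subprogram = [None, None, None, None]
    saved_exec = 0b0111           # EXEC saved before the IF
    exec_mask = saved_exec & 0b0011   # THEN region: condition true for lanes 0, 1
    then_region = divergent_region(subprogram, saved_exec, "lex_1_start")
    print(lane_pc(then_region, exec_mask, "b"))
```

+
+Because ``divergent_region`` takes the enclosing region's result as its input,
+the same value can be reused by every region nested inside it.
+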
+The following provides an example using pseudo LLVM MIR.
+
+.. code::
+ :number-lines:
+
+ $lex_start:
+ DEFINE_DWARF %__uint_64 = DW_TAG_base_type[
+ DW_AT_name = "__uint64";
+ DW_AT_byte_size = 8;
+ DW_AT_encoding = DW_ATE_unsigned;
+ ];
+ DEFINE_DWARF %__active_lane_pc = DW_TAG_dwarf_procedure[
+ DW_AT_name = "__active_lane_pc";
+ DW_AT_location = [
+ DW_OP_regx PC;
+ DW_OP_LLVM_extend 64, 64;
+ DW_OP_regval_type EXEC, %__uint_64;
+ DW_OP_LLVM_select_bit_piece 64, 64;
+ ];
+ ];
+ DEFINE_DWARF %__divergent_lane_pc = DW_TAG_dwarf_procedure[
+ DW_AT_name = "__divergent_lane_pc";
+ DW_AT_location = [
+ DW_OP_LLVM_undefined;
+ DW_OP_LLVM_extend 64, 64;
+ ];
+ ];
+ DBG_VALUE $noreg, $noreg, %DW_AT_LLVM_lane_pc, DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc;
+ DW_OP_call_ref %__active_lane_pc;
+ ];
+ a;
+ %1 = EXEC;
+ DBG_VALUE %1, $noreg, %__lex_1_save_exec;
+ %2 = c1;
+ $lex_1_start:
+ EXEC = %1 & %2;
+ $lex_1_then:
+ DEFINE_DWARF %__divergent_lane_pc_1_then = DW_TAG_dwarf_procedure[
+ DW_AT_name = "__divergent_lane_pc_1_then";
+ DW_AT_location = DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc;
+ DW_OP_addrx &lex_1_start;
+ DW_OP_stack_value;
+ DW_OP_LLVM_extend 64, 64;
+ DW_OP_call_ref %__lex_1_save_exec;
+ DW_OP_deref_type 64, %__uint_64;
+ DW_OP_LLVM_select_bit_piece 64, 64;
+ ];
+ ];
+ DBG_VALUE $noreg, $noreg, %DW_AT_LLVM_lane_pc, DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc_1_then;
+ DW_OP_call_ref %__active_lane_pc;
+ ];
+ b;
+ %3 = EXEC;
+ DBG_VALUE %3, $noreg, %__lex_1_1_save_exec;
+ %4 = c2;
+ $lex_1_1_start:
+ EXEC = %3 & %4;
+ $lex_1_1_then:
+ DEFINE_DWARF %__divergent_lane_pc_1_1_then = DW_TAG_dwarf_procedure[
+ DW_AT_name = "__divergent_lane_pc_1_1_then";
+ DW_AT_location = DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc_1_then;
+ DW_OP_addrx &lex_1_1_start;
+ DW_OP_stack_value;
+ DW_OP_LLVM_extend 64, 64;
+ DW_OP_call_ref %__lex_1_1_save_exec;
+ DW_OP_deref_type 64, %__uint_64;
+ DW_OP_LLVM_select_bit_piece 64, 64;
+ ];
+ ];
+ DBG_VALUE $noreg, $noreg, %DW_AT_LLVM_lane_pc, DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc_1_1_then;
+ DW_OP_call_ref %__active_lane_pc;
+ ];
+ c;
+ EXEC = ~EXEC & %3;
+ $lex_1_1_else:
+ DEFINE_DWARF %__divergent_lane_pc_1_1_else = DW_TAG_dwarf_procedure[
+ DW_AT_name = "__divergent_lane_pc_1_1_else";
+ DW_AT_location = DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc_1_then;
+ DW_OP_addrx &lex_1_1_end;
+ DW_OP_stack_value;
+ DW_OP_LLVM_extend 64, 64;
+ DW_OP_call_ref %__lex_1_1_save_exec;
+ DW_OP_deref_type 64, %__uint_64;
+ DW_OP_LLVM_select_bit_piece 64, 64;
+ ];
+ ];
+ DBG_VALUE $noreg, $noreg, %DW_AT_LLVM_lane_pc, DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc_1_1_else;
+ DW_OP_call_ref %__active_lane_pc;
+ ];
+ d;
+ EXEC = %3;
+ $lex_1_1_end:
+ DBG_VALUE $noreg, $noreg, %DW_AT_LLVM_lane_pc, DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc_1_then;
+ DW_OP_call_ref %__active_lane_pc;
+ ];
+ e;
+ EXEC = ~EXEC & %1;
+ $lex_1_else:
+ DEFINE_DWARF %__divergent_lane_pc_1_else = DW_TAG_dwarf_procedure[
+ DW_AT_name = "__divergent_lane_pc_1_else";
+ DW_AT_location = DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc;
+ DW_OP_addrx &lex_1_end;
+ DW_OP_stack_value;
+ DW_OP_LLVM_extend 64, 64;
+ DW_OP_call_ref %__lex_1_save_exec;
+ DW_OP_deref_type 64, %__uint_64;
+ DW_OP_LLVM_select_bit_piece 64, 64;
+ ];
+ ];
+ DBG_VALUE $noreg, $noreg, %DW_AT_LLVM_lane_pc, DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc_1_else;
+ DW_OP_call_ref %__active_lane_pc;
+ ];
+ f;
+ EXEC = %1;
+ $lex_1_end:
+ DBG_VALUE $noreg, $noreg, %DW_AT_LLVM_lane_pc, DIExpression[
+ DW_OP_call_ref %__divergent_lane_pc;
+ DW_OP_call_ref %__active_lane_pc;
+ ];
+ g;
+ $lex_end:
+
+The DWARF procedure ``%__active_lane_pc`` is used to update the lane PC elements
+that are active with the current program location.
+
+Artificial variables ``%__lex_1_save_exec`` and ``%__lex_1_1_save_exec`` are
+created for the execution masks saved on entry to a region. Using the
+``DBG_VALUE`` pseudo instruction, location lists that describe where they are
+allocated at any given program location will be created. The compiler may
+allocate them to registers, or spill them to memory.
+
+The DWARF procedures for each region use the saved execution mask value to
+update only the lanes that are active on entry to the region. All other lanes
+retain the value of the enclosing region where they were last active. If they
+were not active on entry to the subprogram, then they will have the undefined
+location description.
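
The way each region's procedure builds on the enclosing region's result can be modelled outside of DWARF. The following is a minimal Python sketch and not part of the proposal: the helper names ``select_lanes``, ``region_divergent_pc`` and ``lane_pc``, the 4-lane wavefront, the condition masks, and the use of strings for program locations are all assumptions made for illustration. ``None`` stands in for the undefined location description.

```python
# Illustrative model only. Lane PCs are a list with one entry per lane and
# masks are integers where bit i describes lane i.
WAVE_SIZE = 4  # assumption: a tiny wavefront to keep the example readable


def select_lanes(mask, base, overlay):
    """Per lane, take overlay where the mask bit is set, otherwise base.

    Loosely models the effect of DW_OP_LLVM_select_bit_piece in the example.
    """
    return [overlay[i] if (mask >> i) & 1 else base[i] for i in range(WAVE_SIZE)]


def region_divergent_pc(enclosing, saved_exec, location):
    """Place lanes active on region entry at location; others keep the
    value computed for the enclosing region."""
    return select_lanes(saved_exec, enclosing, [location] * WAVE_SIZE)


def lane_pc(divergent, exec_mask, current_pc):
    """Overlay the current PC on the lanes that EXEC marks as active."""
    return select_lanes(exec_mask, divergent, [current_pc] * WAVE_SIZE)


# Subprogram level: no lane has a known divergent location yet.
top = [None] * WAVE_SIZE

# Outer IF: lanes 0-2 are active on entry, condition c1 holds for lanes 0-1.
lex_1_save_exec = 0b0111
outer_then = region_divergent_pc(top, lex_1_save_exec, "lex_1_start")
exec_mask = lex_1_save_exec & 0b0011

# Inner IF inside the outer THEN: condition c2 holds for lane 0 only.
lex_1_1_save_exec = exec_mask
inner_then = region_divergent_pc(outer_then, lex_1_1_save_exec, "lex_1_1_start")
exec_mask = lex_1_1_save_exec & 0b0001

pcs_in_inner_then = lane_pc(inner_then, exec_mask, "c")
print(pcs_in_inner_then)
```

In this model, lane 0 reports the current location, lane 1 waits at the start of the inner region, lane 2 waits at the start of the outer region, and lane 3, which was never active, stays undefined.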
+
+Other structured control flow regions can be handled similarly. For example,
+loops would set the divergent program location for the region at the end of the
+loop. Any lanes active will be in the loop, and any lanes not active must have
+exited the loop.
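
As a rough illustration of the loop case, the Python sketch below models a loop whose lanes exit after different numbers of iterations. It is only a sketch under stated assumptions: the proposal does not specify how a loop is encoded, and the names ``select_lanes``, ``exec_for_iteration``, ``trip_counts``, the 4-lane wavefront, and the location strings are hypothetical.

```python
# Illustrative model only. Masks are integers where bit i describes lane i.
WAVE_SIZE = 4  # assumption: a tiny wavefront to keep the example readable


def select_lanes(mask, base, overlay):
    """Per lane, take overlay where the mask bit is set, otherwise base."""
    return [overlay[i] if (mask >> i) & 1 else base[i] for i in range(WAVE_SIZE)]


# Hypothetical setup: lanes 0-2 enter the loop, lane 3 was never active.
enclosing = [None] * WAVE_SIZE
loop_save_exec = 0b0111
trip_counts = [3, 1, 2]  # iterations executed by lanes 0, 1 and 2

# Divergent location for the loop region: the end of the loop for every
# lane that was active on entry.
loop_divergent = select_lanes(loop_save_exec, enclosing, ["loop_end"] * WAVE_SIZE)


def exec_for_iteration(iteration):
    """EXEC during the given iteration: lanes that have not yet exited."""
    mask = 0
    for lane, trips in enumerate(trip_counts):
        if trips > iteration:
            mask |= 1 << lane
    return mask & loop_save_exec


history = []
iteration = 0
while exec_for_iteration(iteration) != 0:
    exec_mask = exec_for_iteration(iteration)
    # Active lanes are in the loop body; inactive ones report the loop end.
    history.append(select_lanes(exec_mask, loop_divergent, ["loop_body"] * WAVE_SIZE))
    iteration += 1

for lane_pcs in history:
    print(lane_pcs)
```

Each lane moves from ``loop_body`` to ``loop_end`` once it stops iterating, while the lane that never entered the loop keeps the enclosing region's value.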
+
+An ``IF/THEN/ELSEIF/ELSEIF/...`` region can be treated as a nest of
+``IF/THEN/ELSE`` regions.
+
+The DWARF procedures can use the active lane artificial variable described in
+:ref:`amdgpu-dwarf-amdgpu-dw-at-llvm-active-lane` rather than the actual
+``EXEC`` mask in order to support whole or quad wave mode.
+
+.. _amdgpu-dwarf-amdgpu-dw-at-llvm-active-lane:
+
+AMDGPU DW_AT_LLVM_active_lane
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``DW_AT_LLVM_active_lane`` attribute can be used to specify the lanes that
+are conceptually active for a SIMT thread. See
+:ref:`amdgpu-dwarf-debugging-information-entry-attributes`.
+
+The execution mask may be modified to implement whole or quad wave mode
+operations. For example, all lanes may need to temporarily be made active to
+execute a whole wave operation. Such regions would save the ``EXEC`` mask,
+update it to enable the necessary lanes, perform the operations, and then
+restore the ``EXEC`` mask from the saved value. While executing the whole wave
+region, the conceptual execution mask is the saved value, not the ``EXEC``
+value.
+
+This is handled by defining an artificial variable for the active lane mask. The
+active lane mask artificial variable would be the actual ``EXEC`` mask for
+normal regions, and the saved execution mask for regions where the mask is
+temporarily updated. The location list created for this artificial variable is
+used to define the value of the ``DW_AT_LLVM_active_lane`` attribute.
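
The distinction between the hardware ``EXEC`` value and the conceptual mask can be illustrated with a small Python model. This is a sketch only: ``active_lane_mask`` is a hypothetical helper, and the 4-lane masks are made-up values. It mirrors the save, override, and restore sequence described above.

```python
def active_lane_mask(exec_mask, saved_exec=None):
    """Return the conceptually active lanes.

    saved_exec is None in normal regions; inside a whole wave region it
    holds the EXEC value that was saved on entry.
    """
    return exec_mask if saved_exec is None else saved_exec


# Normal region: the conceptual mask is EXEC itself.
exec_mask = 0b0101
normal = active_lane_mask(exec_mask)

# Whole wave region: save EXEC, enable all lanes, do the work.
saved_exec = exec_mask
exec_mask = 0b1111
in_region = active_lane_mask(exec_mask, saved_exec)

# Leaving the region: restore EXEC from the saved value.
exec_mask = saved_exec
after = active_lane_mask(exec_mask)

print(bin(normal), bin(in_region), bin(after))
```

The conceptual mask is the same before, during, and after the whole wave region, even though ``EXEC`` itself is temporarily all ones.
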
Source Text
~~~~~~~~~~~
@@ -6571,10 +9499,12 @@ Additional Documentation
.. [AMD-ROCm] `ROCm: Open Platform for Development, Discovery and Education Around GPU Computing <http://gpuopen.com/compute-product/rocm/>`__
.. [AMD-ROCm-github] `ROCm github <http://github.com/RadeonOpenCompute>`__
.. [HSA] `Heterogeneous System Architecture (HSA) Foundation <http://www.hsafoundation.com/>`__
+.. [HIP] `HIP Programming Guide <https://rocm-documentation.readthedocs.io/en/latest/Programming_Guides/Programming-Guides.html#hip-programing-guide>`__
.. [ELF] `Executable and Linkable Format (ELF) <http://www.sco.com/developers/gabi/>`__
.. [DWARF] `DWARF Debugging Information Format <http://dwarfstd.org/>`__
.. [YAML] `YAML Ain't Markup Language (YAML™) Version 1.2 <http://www.yaml.org/spec/1.2/spec.html>`__
.. [MsgPack] `Message Pack <http://www.msgpack.org/>`__
+.. [SEMVER] `Semantic Versioning <https://semver.org/>`__
.. [OpenCL] `The OpenCL Specification Version 2.0 <http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf>`__
.. [HRF] `Heterogeneous-race-free Memory Models <http://benedictgaster.org/wp-content/uploads/2014/01/asplos269-FINAL.pdf>`__
.. [CLANG-ATTR] `Attributes in Clang <http://clang.llvm.org/docs/AttributeReference.html>`__