[llvm] [LangRef] Rework DIExpression docs (PR #153072)
Scott Linder via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 20 10:58:18 PDT 2025
================
@@ -400,6 +404,189 @@ This intrinsic is equivalent to ``#dbg_assign``:
metadata i32 %i, metadata !1, metadata !DIExpression(), metadata !2,
metadata ptr %i.addr, metadata !DIExpression(), metadata !3), !dbg !3
+.. _diexpression:
+
+DIExpression
+------------
+
+Debug expressions are represented as :ref:`specialized-metadata`.
+
+Debug expressions are interpreted left-to-right: evaluation starts by pushing
+the value/address operand of the record onto a stack, then evaluates each
+operation of the DIExpression in turn (each operation may pop entries from and
+push entries onto the stack) until the final variable description is produced.
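The evaluation order described above can be illustrated with a small toy model. This is a hypothetical sketch, not LLVM's implementation. It assumes expressions are written as Python tuples of ``(opcode, *operands)`` and models only ``DW_OP_plus_uconst`` and ``DW_OP_stack_value``. The helper name ``evaluate`` is made up for illustration.

```python
def evaluate(operand, expression):
    """Hypothetical toy evaluator for a DIExpression-like operation list.

    Returns (top of stack, whether DW_OP_stack_value was seen).
    """
    # Evaluation starts by pushing the record's value/address operand.
    stack = [operand]
    is_stack_value = False
    # Each operation is then evaluated left to right.
    for op, *args in expression:
        if op == "DW_OP_plus_uconst":
            # Pop the top entry, add the unsigned constant, push the sum.
            stack.append(stack.pop() + args[0])
        elif op == "DW_OP_stack_value":
            # The top entry is the variable's value, not its location.
            is_stack_value = True
        else:
            raise NotImplementedError(op)
    return stack[-1], is_stack_value
```

For example, ``evaluate(0x1000, [("DW_OP_plus_uconst", 8)])`` returns ``(0x1008, False)``: an address 8 bytes past the operand, still describing a location rather than a value.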
+
+The opcodes available in these expressions are described in
+:ref:`dwarf-opcodes` and :ref:`internal-opcodes`.
+
+DWARF specifies three kinds of simple location descriptions: register, memory,
+and implicit location descriptions. Note that a location description is
+defined over certain ranges of a program, i.e., the location of a variable may
+change over the course of the program. Register and memory location
+descriptions describe the *concrete location* of a source variable (in the
+sense that a debugger might modify its value), whereas *implicit locations*
+describe only the *value* of a source variable, which might not exist in
+registers or in memory (see ``DW_OP_stack_value``).
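The distinction between concrete and implicit locations can be sketched as a debugger-side model in which only register and memory locations have storage that can be written. This is a hypothetical illustration. The class names and the ``read``/``write`` helpers are assumptions and do not correspond to any LLVM or debugger API.

```python
from dataclasses import dataclass


@dataclass
class RegisterLocation:
    register: str  # hypothetical register name, e.g. "r1"


@dataclass
class MemoryLocation:
    address: int


@dataclass
class ImplicitLocation:
    value: int  # only the value is known; there is no storage behind it


def read(location, registers, memory):
    """Read a variable's value from any kind of location."""
    if isinstance(location, RegisterLocation):
        return registers[location.register]
    if isinstance(location, MemoryLocation):
        return memory[location.address]
    return location.value


def write(location, new_value, registers, memory):
    """Model a debugger assigning to a variable: only concrete locations work."""
    if isinstance(location, RegisterLocation):
        registers[location.register] = new_value
    elif isinstance(location, MemoryLocation):
        memory[location.address] = new_value
    else:
        raise ValueError("an implicit location has no storage to modify")
```

All three kinds can be read, but ``write`` raises for an ``ImplicitLocation``, mirroring the point that an implicit location describes merely a value.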
+
+A ``#dbg_declare`` record describes an indirect value (the address) of a
+source variable. The first operand of the record must be an address of some
+kind. A DIExpression operand to the record refines this address to produce a
+concrete location for the source variable.
+
+A ``#dbg_value`` record describes the direct value of a source variable.
+The first operand of the record may be a direct or indirect value. A
+DIExpression operand to the record refines the first operand to produce a
+direct value. For example, if the first operand is an indirect value, it may be
+necessary to insert ``DW_OP_deref`` into the DIExpression in order to produce a
+valid debug record.
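The difference between a ``#dbg_declare`` operand (an address) and an indirect ``#dbg_value`` operand that needs ``DW_OP_deref`` can be sketched with the same stack model plus a toy memory. This is a hypothetical illustration of the stack arithmetic only; it does not model how LLVM lowers either record to DWARF. The ``evaluate`` helper, the address ``0x2000``, and the metadata IDs in the comments are assumptions.

```python
def evaluate(operand, expression, memory):
    """Hypothetical toy evaluator: apply DIExpression-like ops to a stack."""
    stack = [operand]
    for op, *args in expression:
        if op == "DW_OP_deref":
            # Replace the address on top of the stack with the value stored there.
            stack.append(memory[stack.pop()])
        elif op == "DW_OP_plus_uconst":
            stack.append(stack.pop() + args[0])
        else:
            raise NotImplementedError(op)
    return stack[-1]


# Hypothetical memory: a source variable lives at address 0x2000 and holds 42.
memory = {0x2000: 42}

# Like #dbg_declare(ptr %x.addr, !17, !DIExpression(), !20): the result is the
# address of the variable, i.e. its concrete memory location.
declare_result = evaluate(0x2000, [], memory)

# Like #dbg_value(ptr %x.addr, !17, !DIExpression(DW_OP_deref), !20): the
# operand is an indirect value, so DW_OP_deref yields the direct value.
value_result = evaluate(0x2000, [("DW_OP_deref",)], memory)
```

Here ``declare_result`` is ``0x2000`` and ``value_result`` is ``42``. The expression is evaluated by the same rules in both cases; only the record kind determines whether the result is read as a location or as a direct value.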
+
+.. note::
+
+ A DIExpression is interpreted in the same way regardless of which kind of
+ debug record it's attached to.
+
+ DIExpressions are always printed and parsed inline; they can never be
+ referenced by an ID (e.g. ``!1``).
+
+Examples using ``DW_OP_LLVM_implicit_pointer``:
+
+.. code-block:: text
+
+ IR for "*ptr = 4;"
+ --------------
+ #dbg_value(i32 4, !17, !DIExpression(DW_OP_LLVM_implicit_pointer), !20)
+ !17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
+ type: !18)
+ !18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
+ !19 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+ !20 = !DILocation(line: 10, scope: !12)
+
+ IR for "**ptr = 4;"
+ --------------
+ #dbg_value(i32 4, !17,
+ !DIExpression(DW_OP_LLVM_implicit_pointer, DW_OP_LLVM_implicit_pointer),
+ !21)
+ !17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
+ type: !18)
+ !18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
+ !19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64)
+ !20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+ !21 = !DILocation(line: 10, scope: !12)
+
+
+.. _dwarf-opcodes:
+
+DWARF Opcodes
+^^^^^^^^^^^^^
+
+When possible, LLVM reuses DWARF opcodes and gives them the same semantics in
+LLVM expressions as in DWARF expressions. The currently supported opcode
+vocabulary is limited, but includes at least:
+
+- ``DW_OP_deref`` pops an address from the top of the expression stack and
+  pushes the value loaded from that address.
+- ``DW_OP_plus`` pops the top two entries from the expression stack, adds
+  them together and pushes the result onto the expression stack.
+- ``DW_OP_minus`` pops the top two entries from the expression stack, subtracts
+  the former top entry from the former second entry and pushes the result onto
+  the expression stack.
+- ``DW_OP_plus_uconst, 93`` adds ``93`` to the entry on top of the expression
+  stack.
+- ``DW_OP_swap`` swaps the top two stack entries.
+- ``DW_OP_xderef`` provides an extended dereference mechanism. The entry at the
+  top of the stack is treated as an address. The second stack entry is treated
+  as an address space identifier.
+- ``DW_OP_stack_value`` indicates that the entry on top of the stack is the
+  value of the source variable itself rather than its location, producing an
+  implicit location description.
+- ``DW_OP_breg`` (or ``DW_OP_bregx``) pushes the contents of the specified
+  register plus the provided signed offset. The opcode is only generated by the
+  ``AsmPrinter`` pass to describe call site parameter values which require an
+  expression over two registers.
+- ``DW_OP_push_object_address`` pushes the address of the object, which can
+  then serve as a descriptor in subsequent calculations. This opcode can be used
+  to calculate the bounds of a Fortran allocatable array that has an array
+  descriptor.
+- ``DW_OP_over`` pushes a copy of the second stack entry onto the top of the
+  stack. This opcode can be used to calculate the bounds of a Fortran
+  assumed-rank array, whose rank is known only at run time and whose current
+  dimension number is implicitly the first element of the stack.
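The operand order of the DWARF stack operations listed above is easy to get wrong, so here is a toy Python model of several of them. It is a hypothetical sketch, not LLVM code: the ``step`` helper is made up, the stack is a Python list whose last element is the top, and memory is a dict keyed by address (or by ``(address_space, address)`` for ``DW_OP_xderef``).

```python
def step(stack, op, *args, memory=None):
    """Apply one DWARF stack operation to `stack` in place (toy model)."""
    if op == "DW_OP_deref":
        stack.append(memory[stack.pop()])
    elif op == "DW_OP_plus":
        top, second = stack.pop(), stack.pop()
        stack.append(second + top)
    elif op == "DW_OP_minus":
        top, second = stack.pop(), stack.pop()
        stack.append(second - top)  # former second entry minus former top
    elif op == "DW_OP_plus_uconst":
        stack.append(stack.pop() + args[0])
    elif op == "DW_OP_swap":
        stack[-1], stack[-2] = stack[-2], stack[-1]
    elif op == "DW_OP_over":
        stack.append(stack[-2])  # copy the second entry onto the top
    elif op == "DW_OP_xderef":
        address, address_space = stack.pop(), stack.pop()
        stack.append(memory[(address_space, address)])
    else:
        raise NotImplementedError(op)
    return stack
```

For instance, ``step([10, 3], "DW_OP_minus")`` returns ``[7]``, while swapping first (``step([10, 3], "DW_OP_swap")`` gives ``[3, 10]``) and then subtracting yields ``[-7]``.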
+
+.. _internal-opcodes:
+
+Internal Opcodes
+^^^^^^^^^^^^^^^^
+
+Where the DWARF equivalent is not suitable, or no DWARF equivalent exists, LLVM
+defines internal-only opcodes which have no direct analog in DWARF.
+
+.. note::
+
+   Some opcodes do not influence the final DWARF expression directly; instead,
+   they encode information that logically belongs to the debug records which
+   use them.
+
+- ``DW_OP_LLVM_fragment, 16, 8`` specifies the offset and size in bits (``16``
+  and ``8`` here, respectively) of the fragment of the source variable that the
+  working expression describes. Note that, unlike ``DW_OP_bit_piece``, the
+  offset describes the position within the source variable. This does not
+  affect the semantics of the expression.
+- ``DW_OP_LLVM_convert, 16, DW_ATE_signed`` specifies a bit size and encoding
+  (``16`` and ``DW_ATE_signed`` here, respectively) to which the top of the
+  expression stack is to be converted. This maps to a ``DW_OP_convert``
+  operation that references a base type constructed from the supplied values.
+- ``DW_OP_LLVM_tag_offset, tag_offset`` specifies that a memory tag may
+  optionally be applied to the pointer. The memory tag is derived from the
+  given tag offset in an implementation-defined manner. This does not affect
+  the semantics of the expression.
+- ``DW_OP_LLVM_entry_value, N`` refers to the value a register had upon
+  function entry. When targeting DWARF, a ``DBG_VALUE(reg, ...,
+  DIExpression(DW_OP_LLVM_entry_value, 1, ...))`` is lowered to
+  ``DW_OP_entry_value [reg], ...``, which pushes the value ``reg`` had upon
+  function entry onto the DWARF expression stack.
+
+  The next ``(N - 1)`` operations will be part of the ``DW_OP_entry_value``
+  block argument. For example, ``!DIExpression(DW_OP_LLVM_entry_value, 1,
+  DW_OP_plus_uconst, 123, DW_OP_stack_value)`` specifies an expression where
+  the entry value of ``reg`` is pushed onto the stack and ``123`` is added to
+  it.
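Two of the internal opcodes above, ``DW_OP_LLVM_fragment`` and ``DW_OP_LLVM_entry_value``, can be illustrated with a small toy model. This is a hypothetical sketch, not LLVM code. ``assemble_fragments`` assumes bit offset 0 is the least significant bit of the variable (a little-endian view). ``evaluate`` models only the ``DW_OP_LLVM_entry_value, 1`` form, uses made-up register names, and takes the entry and current register values as plain dicts.

```python
def assemble_fragments(fragments):
    """Combine (offset_in_bits, size_in_bits, value) triples, one per
    DW_OP_LLVM_fragment, into the full variable value (toy model)."""
    result = 0
    for offset, size, value in fragments:
        mask = (1 << size) - 1
        result |= (value & mask) << offset
    return result


def evaluate(reg, expression, entry_values, current_values):
    """Evaluate an expression whose operand is register `reg` (toy model).

    Only the DW_OP_LLVM_entry_value, 1 form is handled.
    """
    ops = list(expression)
    if ops and ops[0] == ("DW_OP_LLVM_entry_value", 1):
        stack = [entry_values[reg]]  # value reg held on function entry
        ops = ops[1:]
    else:
        stack = [current_values[reg]]  # value reg holds at this point
    is_stack_value = False
    for op, *args in ops:
        if op == "DW_OP_plus_uconst":
            stack.append(stack.pop() + args[0])
        elif op == "DW_OP_stack_value":
            is_stack_value = True
        else:
            raise NotImplementedError(op)
    return stack[-1], is_stack_value
```

With this model, two 32-bit fragments at offsets 0 and 32 rebuild a 64-bit variable, and the ``DW_OP_LLVM_entry_value, 1, DW_OP_plus_uconst, 123, DW_OP_stack_value`` example yields the entry value of the register plus 123 regardless of what the register holds later.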
----------------
slinder1 wrote:
Whoops, I forgot the part where I have to actually push the ref for review 🙃
The change is up now, let me know what you think!
https://github.com/llvm/llvm-project/pull/153072