[llvm] [LangRef] Rework DIExpression docs (PR #153072)

Tony Tye via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 20 17:54:53 PDT 2025


================
@@ -394,12 +398,231 @@ This intrinsic is equivalent to ``#dbg_assign``:
 
 .. code-block:: llvm
 
-      #dbg_assign(i32 %i, !1, !DIExpression(), !2, 
+      #dbg_assign(i32 %i, !1, !DIExpression(), !2,
                   ptr %i.addr, !DIExpression(), !3)
     call void @llvm.dbg.assign(
       metadata i32 %i, metadata !1, metadata !DIExpression(), metadata !2,
       metadata ptr %i.addr, metadata !DIExpression(), metadata !3), !dbg !3
 
+.. _diexpression:
+
+DIExpression
+------------
+
+Debug expressions are represented as :ref:`specialized-metadata`.
+
+Debug expressions are interpreted left-to-right: start by pushing the
+value/address operand of the record onto a stack, then repeatedly push and
+evaluate opcodes from the DIExpression until the final variable description is
+produced.
+
+The opcodes available in these expressions are described in
+:ref:`dwarf-opcodes` and :ref:`internal-opcodes`.
+
+DWARF specifies three kinds of simple location descriptions: Register, memory,
+and implicit location descriptions.  Note that a location description is
+defined over certain ranges of a program, i.e. the location of a variable may
+change over the course of the program. Register and memory location
+descriptions describe the *concrete location* of a source variable (in the
+sense that a debugger might modify its value), whereas *implicit locations*
+describe merely the actual *value* of a source variable which might not exist
+in registers or in memory (see ``DW_OP_stack_value``).
+
+A ``#dbg_declare`` record describes an indirect value (the address) of a
+source variable. The first operand of the record must be an address of some
+kind. A DIExpression operand to the record refines this address to produce a
+concrete location for the source variable.
+
+A ``#dbg_value`` record describes the direct value of a source variable.
+The first operand of the record may be a direct or indirect value. A
+DIExpression operand to the record refines the first operand to produce a
+direct value. For example, if the first operand is an indirect value, it may be
+necessary to insert ``DW_OP_deref`` into the DIExpression in order to produce a
+valid debug record.
+
+.. note::
+
+   A DIExpression is interpreted in the same way regardless of which kind of
+   debug record it's attached to.
+
+   DIExpressions are always printed and parsed inline; they can never be
+   referenced by an ID (e.g. ``!1``).
+
+.. _dwarf-opcodes:
+
+DWARF Opcodes
+^^^^^^^^^^^^^
+
+When possible, LLVM reuses DWARF opcodes and gives them the same semantics in
+LLVM expressions as in DWARF expressions. The currently supported opcode
+vocabulary is limited, but includes at least:
+
+- ``DW_OP_deref`` dereferences the top of the expression stack.
+- ``DW_OP_plus`` pops the last two entries from the expression stack, adds
+  them together and pushes the result to the expression stack.
+- ``DW_OP_minus`` pops the last two entries from the expression stack, subtracts
+  the last entry from the second last entry and pushes the result to the
+  expression stack.
+- ``DW_OP_plus_uconst, 93`` adds ``93`` to the value on top of the stack.
+- ``DW_OP_swap`` swaps the top two stack entries.
+- ``DW_OP_xderef`` provides an extended dereference mechanism. The entry at the
+  top of the stack is treated as an address. The second stack entry is treated
+  as an address space identifier. The two entries are popped and then an
+  implementation-defined value is pushed on the stack.
+- ``DW_OP_stack_value`` may appear at most once in an expression, and must be
+  the last opcode if ``DW_OP_LLVM_fragment`` is not present, or the second last
+  opcode if ``DW_OP_LLVM_fragment`` is present. It pops the top value of the
+  expression stack and makes an implicit value location with that value.
+- ``DW_OP_breg`` (or ``DW_OP_bregx``) represents the content at the provided
+  signed offset of the specified register. The opcode is only generated by the
+  ``AsmPrinter`` pass to describe a call site parameter value that requires an
+  expression over two registers.
+- ``DW_OP_push_object_address`` pushes the address of the object, which can
+  then serve as a descriptor in subsequent calculations. This opcode can be
+  used to calculate the bounds of a Fortran allocatable array, which has an
+  array descriptor.
+- ``DW_OP_over`` duplicates the entry currently second in the stack at the top
+  of the stack. This opcode can be used to calculate the bounds of a Fortran
+  assumed-rank array, whose rank is known only at run time and whose current
+  dimension number is implicitly the first element of the stack.
+
+.. _internal-opcodes:
+
+Internal Opcodes
+^^^^^^^^^^^^^^^^
+
+Where the DWARF equivalent is not suitable, or no DWARF equivalent exists, LLVM
+defines internal-only opcodes which have no direct analog in DWARF.
+
+.. note::
+
+   Some opcodes do not influence the final DWARF expression directly, instead
+   encoding information logically belonging to the debug records which use
+   them.
+
+- ``DW_OP_LLVM_fragment, <offset>, <size>`` may appear at most once in an
+  expression, and must be the last opcode. It specifies the bit offset and bit
+  size of the variable fragment being described by the record or intrinsic
+  using the expression. Note that, contrary to ``DW_OP_bit_piece``, the offset
+  describes the location within the described source variable. At DWARF
+  generation time all fragments for the same variable are collected together
+  and DWARF ``DW_OP_piece`` and ``DW_OP_bit_piece`` opcodes are used to
+  describe a composite with pieces corresponding to the fragments. (This does
+  not affect the semantics of the expression containing it.)
+- ``DW_OP_LLVM_convert, 16, DW_ATE_signed`` specifies a bit size and encoding
+  (``16`` and ``DW_ATE_signed`` here, respectively) to which the top of the
+  expression stack is to be converted. Maps into a ``DW_OP_convert`` operation
+  that references a base type constructed from the supplied values.
+- ``DW_OP_LLVM_tag_offset, tag_offset`` specifies that a memory tag should be
+  optionally applied to the pointer. The memory tag is derived from the given
+  tag offset in an implementation-defined manner. (This does not affect the
+  semantics of the expression containing it.)
+- ``DW_OP_LLVM_entry_value, N`` evaluates a sub-expression as if it were
+  evaluated upon entry to the current call frame.
+
+  The sub-expression replaces the operations which comprise it, i.e. all such
+  operations are evaluated only in the frame entry context.
+
+  The sub-expression begins with the operation which immediately precedes
+  ``DW_OP_LLVM_entry_value, N`` in the ``DIExpression``. If no such operation
+  exists (i.e. the expression begins with ``DW_OP_LLVM_entry_value, N``), the
+  implicit operation which pushes the first debug argument of the containing
+  marker/pseudo is used instead. The value ``N`` must always be at least ``1``,
+  as this first operation cannot be omitted and is counted in ``N``.
+
+  The rest of the sub-expression comprises the ``(N - 1)`` operations following
+  ``DW_OP_LLVM_entry_value, N`` in the ``DIExpression``.
+
+  Due to framework limitations:
+
+    - ``N`` must not be greater than ``1``. In other words, ``N`` must equal
+      ``1``, and the sub-expression comprises only the operation immediately
+      preceding ``DW_OP_LLVM_entry_value, N``.
+    - ``DW_OP_LLVM_entry_value, N`` must be either the first operation of a
+      ``DIExpression`` or the second operation if the expression begins with
+      ``DW_OP_LLVM_arg, 0``.
+    - The first operation must refer to a register value.
+
+  Taken together, these limitations mean that ``DW_OP_LLVM_entry_value`` can
+  only currently be used to push the value a single register had on entry to
+  the current stack frame.
+
+  For example, ``!DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_entry_value, 1,
+  DW_OP_LLVM_arg, 1, DW_OP_plus, DW_OP_stack_value)`` specifies an expression
+  where the entry value of the first argument to the ``DIExpression`` is added
+  to the non-entry value of the second argument, and the result is used as the
+  value for an implicit value location.
+
+  When targeting DWARF, a ``DBG_VALUE(reg, ...,
+  DIExpression(DW_OP_LLVM_entry_value, 1, ...))`` is lowered to
+  ``DW_OP_entry_value [reg], ...``, which pushes the value ``reg`` had upon
+  frame entry onto the DWARF expression stack.
+
+  Because ``DW_OP_LLVM_entry_value`` is currently limited registers, it is
----------------
t-tye wrote:

limited registers -> limited to registers

https://github.com/llvm/llvm-project/pull/153072
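
As a reading aid for the "interpreted left-to-right" paragraph and the opcode list in the hunk above, here is a minimal sketch of that stack model in Python. It is a toy, not LLVM's implementation: the `evaluate` function, the `OPERAND_COUNT` table and the `memory` dict are hypothetical names made up for illustration. `DW_OP_constu` is not in the quoted list; it is included here only so the example has a way to push a second stack entry. `DW_OP_LLVM_fragment` is skipped, since the text says it does not affect the semantics of the expression.

```python
# Toy model of DIExpression evaluation. All names here are illustrative
# assumptions, not LLVM APIs.

# Number of inline operands that follow each opcode in the element list.
OPERAND_COUNT = {
    "DW_OP_plus_uconst": 1,
    "DW_OP_constu": 1,         # assumption: not in the quoted opcode list
    "DW_OP_LLVM_fragment": 2,  # <offset>, <size>
}


def evaluate(elements, first_operand, memory=None):
    """Evaluate a flat opcode list, starting with the record's first operand
    already pushed on the stack. Returns (kind, result)."""
    memory = memory or {}
    stack = [first_operand]
    implicit = False  # set once DW_OP_stack_value is seen
    i = 0
    while i < len(elements):
        op = elements[i]
        nargs = OPERAND_COUNT.get(op, 0)
        args = elements[i + 1 : i + 1 + nargs]
        i += 1 + nargs
        if op == "DW_OP_constu":
            stack.append(args[0])
        elif op == "DW_OP_plus_uconst":
            stack[-1] += args[0]
        elif op == "DW_OP_plus":
            last, second = stack.pop(), stack.pop()
            stack.append(second + last)
        elif op == "DW_OP_minus":
            # Subtract the last entry from the second last entry.
            last, second = stack.pop(), stack.pop()
            stack.append(second - last)
        elif op == "DW_OP_swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "DW_OP_over":
            stack.append(stack[-2])
        elif op == "DW_OP_deref":
            stack.append(memory[stack.pop()])
        elif op == "DW_OP_stack_value":
            implicit = True
        elif op == "DW_OP_LLVM_fragment":
            pass  # describes the variable fragment; no effect on the stack
        else:
            raise ValueError(f"opcode not modelled in this sketch: {op}")
    return ("value" if implicit else "location", stack.pop())
```

For instance, under these assumptions `evaluate(["DW_OP_plus_uconst", 93], 7)` models an address refined by a constant offset, and `evaluate(["DW_OP_deref", "DW_OP_stack_value"], 16, {16: 42})` models loading through an indirect first operand to get a direct value.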

