[llvm] [llvm] Proofread SourceLevelDebugging.rst (PR #152838)
Kazu Hirata via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 8 23:16:31 PDT 2025
https://github.com/kazutakahirata created https://github.com/llvm/llvm-project/pull/152838
>From 5486aa1bc6b6821224e69ca2eae38be2ebe3e3f1 Mon Sep 17 00:00:00 2001
From: Kazu Hirata <kazu at google.com>
Date: Fri, 8 Aug 2025 08:18:29 -0700
Subject: [PATCH] [llvm] Proofread SourceLevelDebugging.rst
---
llvm/docs/SourceLevelDebugging.rst | 98 +++++++++++++++---------------
1 file changed, 49 insertions(+), 49 deletions(-)
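The name hash tables touched by this patch are looked up in two steps. First, hash the name and take the hash modulo ``n_buckets`` to pick a bucket. Then scan consecutive ``HASHES`` entries for as long as they still fall in that bucket. A minimal C++ sketch of that lookup follows. It makes several assumptions. It uses the DJB hash (seed 5381, ``h = h * 33 + c``). It treats ``UINT32_MAX`` as the marker for an empty bucket. It uses hypothetical names (``HashTable``, ``contains``, ``kEmptyBucket``) and an in-memory vector of names in place of the on-disk hash data. It is not LLVM's own reader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Assumption: the DJB (Bernstein) string hash, h = h * 33 + c with seed 5381.
uint32_t djbHash(const std::string &s) {
  uint32_t h = 5381;
  for (unsigned char c : s)
    h = (h << 5) + h + c;
  return h;
}

// Assumption: an empty bucket holds UINT32_MAX instead of an index.
constexpr uint32_t kEmptyBucket = UINT32_MAX;

// Hypothetical in-memory model of the table, not LLVM's reader API.
struct HashTable {
  uint32_t nBuckets;
  std::vector<uint32_t> buckets;              // BUCKETS: first index into hashes
  std::vector<uint32_t> hashes;               // HASHES: unique, grouped by bucket
  std::vector<std::vector<std::string>> data; // stand-in for HASH DATA

  HashTable(const std::vector<std::string> &names, uint32_t n)
      : nBuckets(n), buckets(n, kEmptyBucket) {
    assert(n > 0 && "need at least one bucket");
    std::vector<std::pair<uint32_t, std::string>> entries;
    for (const std::string &name : names)
      entries.emplace_back(djbHash(name), name);
    // Sort by bucket, then by hash, so each bucket's hashes are contiguous.
    std::sort(entries.begin(), entries.end(),
              [n](const std::pair<uint32_t, std::string> &a,
                  const std::pair<uint32_t, std::string> &b) {
                if (a.first % n != b.first % n)
                  return a.first % n < b.first % n;
                return a.first < b.first;
              });
    for (const auto &e : entries) {
      if (hashes.empty() || hashes.back() != e.first) {
        // The first hash seen for a bucket becomes that bucket's entry point.
        if (buckets[e.first % n] == kEmptyBucket)
          buckets[e.first % n] = static_cast<uint32_t>(hashes.size());
        hashes.push_back(e.first);
        data.emplace_back();
      }
      data.back().push_back(e.second);
    }
  }

  bool contains(const std::string &name) const {
    uint32_t h = djbHash(name);
    uint32_t b = h % nBuckets;
    uint32_t i = buckets[b];
    if (i == kEmptyBucket)
      return false;
    // Walk consecutive hashes only while they still map to bucket b.
    for (; i < hashes.size() && hashes[i] % nBuckets == b; ++i) {
      if (hashes[i] != h)
        continue;
      // Equal 32-bit hashes can come from different names, so compare strings.
      for (const std::string &candidate : data[i])
        if (candidate == name)
          return true;
    }
    return false;
  }
};
```

For example, ``HashTable({"main", "printf", "puts"}, 4).contains("printf")`` is true. The string comparison after a hash match is kept because distinct names can share a 32-bit hash value.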
diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst
index c2084c2bf02d6..4ca0ee4757255 100644
--- a/llvm/docs/SourceLevelDebugging.rst
+++ b/llvm/docs/SourceLevelDebugging.rst
@@ -34,7 +34,7 @@ important ones are:
the source-level-language.
* Source-level languages are often **widely** different from one another.
- LLVM should not put any restrictions of the flavor of the source-language,
+ LLVM should not put any restrictions on the flavor of the source language,
and the debugging information should work with any language.
* With code generator support, it should be possible to use an LLVM compiler
@@ -74,10 +74,10 @@ from and inspired by DWARF, but it is feasible to translate into other target
debug info formats such as STABS.
SamplePGO (also known as `AutoFDO <https://gcc.gnu.org/wiki/AutoFDO>`_)
-is a variant of profile guided optimizations which uses hardware sampling based
+is a variant of profile-guided optimization that uses hardware sampling-based
profilers to collect branch frequency data with low overhead in production
environments. It relies on debug information to associate profile information
-to LLVM IR which is then used to guide optimization heuristics. Maintaining
+with LLVM IR, which is then used to guide optimization heuristics. Maintaining
deterministic and distinct source locations is necessary to maximize the
accuracy of mapping hardware sample counts to LLVM IR. For example, DWARF
`discriminators <https://wiki.dwarfstd.org/Path_Discriminators.md>`_ allow
@@ -334,7 +334,7 @@ performs the assignment, and the destination address.
The first three arguments are the same as for a ``#dbg_value``. The fourth
argument is a ``DIAssignID`` used to reference a store. The fifth is the
destination of the store, the sixth is a `complex
-expression <LangRef.html#diexpression>`_ that modfies it, and the seventh is a
+expression <LangRef.html#diexpression>`_ that modifies it, and the seventh is a
`source location <LangRef.html#dilocation>`_.
See :doc:`AssignmentTracking` for more info.
@@ -512,7 +512,7 @@ Here ``!13`` is metadata providing `location information
information parameter to the records indicates that the variable ``X`` is
declared at line number 2 at a function level scope in function ``foo``.
-Now lets take another example.
+Now, let's take another example.
.. code-block:: llvm
@@ -532,14 +532,14 @@ Here ``!18`` indicates that ``Z`` is declared at line number 5 and column
number 11 inside of lexical scope ``!17``. The lexical scope itself resides
inside of subprogram ``!4`` described above.
-The scope information attached with each instruction provides a straightforward
+The scope information attached to each instruction provides a straightforward
way to find instructions covered by a scope.
Object lifetime in optimized code
=================================
In the example above, every variable assignment uniquely corresponds to a
-memory store to the variable's position on the stack. However in heavily
+memory store to the variable's position on the stack. However, in heavily
optimized code LLVM promotes most variables into SSA values, which can
eventually be placed in physical registers or memory locations. To track SSA
values through compilation, when objects are promoted to SSA values a
@@ -628,7 +628,7 @@ perhaps, be optimized into the following code:
}
What ``#dbg_value`` records should be placed to represent the original variable
-locations in this code? Unfortunately the second, third and fourth
+locations in this code? Unfortunately, the second, third, and fourth
#dbg_values for ``!1`` in the source function have had their operands
(%tval, %fval, %merge) optimized out. Assuming we cannot recover them, we
might consider this placement of #dbg_values:
@@ -696,7 +696,7 @@ How variable location metadata is transformed during CodeGen
LLVM preserves debug information throughout mid-level and backend passes,
ultimately producing a mapping between source-level information and
instruction ranges. This
-is relatively straightforwards for line number information, as mapping
+is relatively straightforward for line number information, as mapping
instructions to line numbers is a simple association. For variable locations
however the story is more complex. As each ``#dbg_value`` record
represents a source-level assignment of a value to a source variable, the
@@ -710,7 +710,7 @@ location fidelity are:
2. Register allocation
3. Block layout
-each of which are discussed below. In addition, instruction scheduling can
+each of which is discussed below. In addition, instruction scheduling can
significantly change the ordering of the program, and occurs in a number of
different passes.
@@ -782,13 +782,13 @@ And has the following operands:
location operands, which may take any of the same values as the first
operand of the ``DBG_VALUE`` instruction above. These variable location
operands are inserted into the final DWARF Expression in positions indicated
- by the DW_OP_LLVM_arg operator in the `DIExpression
+ by the ``DW_OP_LLVM_arg`` operator in the `DIExpression
<LangRef.html#diexpression>`_.
The position at which the DBG_VALUEs are inserted should correspond to the
positions of their matching ``#dbg_value`` records in the IR block. As
with optimization, LLVM aims to preserve the order in which variable
-assignments occurred in the source program. However SelectionDAG performs some
+assignments occurred in the source program. However, SelectionDAG performs some
instruction scheduling, which can reorder assignments (discussed below).
Function parameter locations are moved to the beginning of the function if
they're not already, to ensure they're immediately available on function entry.
@@ -855,19 +855,19 @@ If one compiles this IR with ``llc -o - -start-after=codegen-prepare -stop-after
$eax = COPY %8, debug-location !5
RET 0, $eax, debug-location !5
-Observe first that there is a DBG_VALUE instruction for every ``#dbg_value``
+Observe first that there is a ``DBG_VALUE`` instruction for every ``#dbg_value``
record in the source IR, ensuring no source level assignments go missing.
Then consider the different ways in which variable locations have been recorded:
* For the first #dbg_value an immediate operand is used to record a zero value.
-* The #dbg_value of the PHI instruction leads to a DBG_VALUE of virtual register
+* The #dbg_value of the PHI instruction leads to a ``DBG_VALUE`` of virtual register
``%0``.
* The first GEP has its effect folded into the first load instruction
(as a 4-byte offset), but the variable location is salvaged by folding
- the GEPs effect into the DIExpression.
+ the GEP's effect into the ``DIExpression``.
* The second GEP is also folded into the corresponding load. However, it is
insufficiently simple to be salvaged, and is emitted as a ``$noreg``
- DBG_VALUE, indicating that the variable takes on an undefined location.
+ ``DBG_VALUE``, indicating that the variable takes on an undefined location.
* The final #dbg_value has its Value placed in virtual register ``%1``.
Instruction Scheduling
@@ -880,7 +880,7 @@ case the instruction sequence could be completely reversed. In such
circumstances LLVM follows the principle applied to optimizations, that it is
better for the debugger not to display any state than a misleading state.
Thus, whenever instructions are advanced in order of execution, any
-corresponding DBG_VALUE is kept in its original position, and if an instruction
+corresponding ``DBG_VALUE`` is kept in its original position, and if an instruction
is delayed then the variable is given an undefined location for the duration
of the delay. To illustrate, consider this pseudo-MIR:
@@ -893,7 +893,7 @@ of the delay. To illustrate, consider this pseudo-MIR:
%7:gr32 = SUB32rr %6, %5, implicit-def dead $eflags
DBG_VALUE %7, $noreg, !5, !6
-Imagine that the SUB32rr were moved forward to give us the following MIR:
+Imagine that the ``SUB32rr`` were moved forward to give us the following MIR:
.. code-block:: text
@@ -905,13 +905,13 @@ Imagine that the SUB32rr were moved forward to give us the following MIR:
DBG_VALUE %7, $noreg, !5, !6
In this circumstance LLVM would leave the MIR as shown above. Were we to move
-the DBG_VALUE of virtual register %7 upwards with the SUB32rr, we would re-order
+the ``DBG_VALUE`` of virtual register ``%7`` upwards with the ``SUB32rr``, we would reorder
assignments and introduce a new state of the program. Whereas with the solution
above, the debugger will see one fewer combination of variable values, because
``!3`` and ``!5`` will change value at the same time. This is preferred over
misrepresenting the original program.
-In comparison, if one sunk the MOV32rm, LLVM would produce the following:
+In comparison, if one sank the ``MOV32rm``, LLVM would produce the following:
.. code-block:: text
@@ -924,10 +924,10 @@ In comparison, if one sunk the MOV32rm, LLVM would produce the following:
DBG_VALUE %1, $noreg, !1, !2
Here, to avoid presenting a state in which the first assignment to ``!1``
-disappears, the DBG_VALUE at the top of the block assigns the variable the
+disappears, the ``DBG_VALUE`` at the top of the block assigns the variable the
undefined location, until its value is available at the end of the block where
-an additional DBG_VALUE is added. Were any other DBG_VALUE for ``!1`` to occur
-in the instructions that the MOV32rm was sunk past, the DBG_VALUE for ``%1``
+an additional ``DBG_VALUE`` is added. Were any other ``DBG_VALUE`` for ``!1`` to occur
+in the instructions that the ``MOV32rm`` was sunk past, the ``DBG_VALUE`` for ``%1``
would be dropped and the debugger would never observe it in the variable. This
accurately reflects that the value is not available during the corresponding
portion of the original program.
@@ -937,13 +937,13 @@ Variable locations during Register Allocation
To avoid debug instructions interfering with the register allocator, the
LiveDebugVariables pass extracts variable locations from a MIR function and
-deletes the corresponding DBG_VALUE instructions. Some localized copy
+deletes the corresponding ``DBG_VALUE`` instructions. Some localized copy
propagation is performed within blocks. After register allocation, the
-VirtRegRewriter pass re-inserts DBG_VALUE instructions in their original
+VirtRegRewriter pass re-inserts ``DBG_VALUE`` instructions in their original
positions, translating virtual register references into their physical
machine locations. To avoid encoding incorrect variable locations, in this
-pass any DBG_VALUE of a virtual register that is not live, is replaced by
-the undefined location. The LiveDebugVariables may insert redundant DBG_VALUEs
+pass any ``DBG_VALUE`` of a virtual register that is not live is replaced by
+the undefined location. The LiveDebugVariables pass may insert redundant ``DBG_VALUE`` instructions
because of virtual register rewriting. These will be subsequently removed by
the RemoveRedundantDebugValues pass.
@@ -956,11 +956,11 @@ LiveDebugValues pass runs to achieve two aims:
* To propagate the location of variables through copies and register spills,
* For every block, to record every valid variable location in that block.
-After this pass the DBG_VALUE instruction changes meaning: rather than
+After this pass, the ``DBG_VALUE`` instruction changes meaning: rather than
corresponding to a source-level assignment where the variable may change value,
it asserts the location of a variable in a block, and loses effect outside the
block. Propagating variable locations through copies and spills is
-straightforwards: determining the variable location in every basic block
+straightforward, but determining the variable location in every basic block
requires the consideration of control flow. Consider the following IR, which
presents several difficulties:
@@ -1021,9 +1021,9 @@ predecessors then that location is propagated into the successor. If the
predecessor locations disagree, the location becomes undefined.
Once LiveDebugValues has run, every block should have all valid variable
-locations described by DBG_VALUE instructions within the block. Very little
+locations described by ``DBG_VALUE`` instructions within the block. Very little
effort is then required by supporting classes (such as
-DbgEntityHistoryCalculator) to build a map of each instruction to every
+``DbgEntityHistoryCalculator``) to build a map of each instruction to every
valid variable location, without the need to consider control flow. From
the example above, it is otherwise difficult to determine that the location
of variable ``!30`` should flow "up" into block ``%bb1``, but that the location
@@ -1057,7 +1057,7 @@ helper functions in ``lib/IR/DIBuilder.cpp``.
C/C++ source file information
-----------------------------
-``llvm::Instruction`` provides easy access to metadata attached with an
+``llvm::Instruction`` provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR using
``Instruction::getDebugLoc()`` and ``DILocation::getLine()``.
@@ -1081,7 +1081,7 @@ added by the front-end but doesn't correspond to source code written by the user
}
At the end of the scope the MyObject's destructor is called but it isn't written
-explicitly. This information is useful to avoid to have counters on brackets when
+explicitly. This information is useful to avoid having counters on brackets when
making code coverage.
C/C++ global variable information
@@ -1147,11 +1147,11 @@ a C/C++ front-end would generate the following descriptors:
!8 = !{!"clang version 4.0.0"}
-The align value in DIGlobalVariable description specifies variable alignment in
-case it was forced by C11 _Alignas(), C++11 alignas() keywords or compiler
-attribute __attribute__((aligned ())). In other case (when this field is missing)
+The align value in the ``DIGlobalVariable`` description specifies the variable alignment in
+case it was forced by the C11 ``_Alignas()`` or C++11 ``alignas()`` keywords, or by the compiler
+attribute ``__attribute__((aligned ()))``. Otherwise (when this field is missing),
alignment is considered default. This is used when producing DWARF output
-for DW_AT_alignment value.
+for the ``DW_AT_alignment`` value.
C/C++ function information
--------------------------
@@ -1200,7 +1200,7 @@ Given a class declaration with copy constructor declared as deleted:
foo(const foo&) = deleted;
};
-A C++ frontend would generate following:
+A C++ frontend would generate the following:
.. code-block:: text
@@ -1247,7 +1247,7 @@ and this will materialize an additional DWARF attribute as:
...
DW_AT_elemental [DW_FORM_flag_present] (true)
-There are a few DWARF tags defined to represent Fortran specific constructs i.e DW_TAG_string_type for representing Fortran character(n). In LLVM this is represented as DIStringType.
+There are a few DWARF tags defined to represent Fortran-specific constructs, e.g. ``DW_TAG_string_type`` for representing Fortran ``character(n)``. In LLVM, this is represented as ``DIStringType``.
.. code-block:: fortran
@@ -1260,7 +1260,7 @@ a Fortran front-end would generate the following descriptors:
!DILocalVariable(name: "string", arg: 1, scope: !10, file: !3, line: 4, type: !15)
!DIStringType(name: "character(*)!2", stringLength: !16, stringLengthExpression: !DIExpression(), size: 32)
-A fortran deferred-length character can also contain the information of raw storage of the characters in addition to the length of the string. This information is encoded in the stringLocationExpression field. Based on this information, DW_AT_data_location attribute is emitted in a DW_TAG_string_type debug info.
+A Fortran deferred-length character can also carry the location of the raw character storage in addition to the length of the string. This information is encoded in the ``stringLocationExpression`` field. Based on this information, a ``DW_AT_data_location`` attribute is emitted in the ``DW_TAG_string_type`` debug info.
!DIStringType(name: "character(*)!2", stringLengthExpression: !DIExpression(), stringLocationExpression: !DIExpression(DW_OP_push_object_address, DW_OP_deref), size: 32)
@@ -1310,7 +1310,7 @@ Objective-C provides a simpler way to declare and define accessor methods using
declared properties. The language provides features to declare a property and
to let compiler synthesize accessor methods.
-The debugger lets developer inspect Objective-C interfaces and their instance
+The debugger lets developers inspect Objective-C interfaces and their instance
variables and class variables. However, the debugger does not know anything
about the properties defined in Objective-C interfaces. The debugger consumes
information generated by compiler in DWARF format. The format does not support
@@ -1397,7 +1397,7 @@ don't need to know this convention, since we are given the name of the ivar
directly.
Also, it is common practice in ObjC to have different property declarations in
-the @interface and @implementation - e.g. to provide a read-only property in
+the ``@interface`` and ``@implementation``, e.g. to provide a read-only property in
the interface, and a read-write interface in the implementation. In that case,
the compiler should emit whichever property declaration will be in force in the
current translation unit.
@@ -1659,7 +1659,7 @@ these accesses then tell us that we didn't have a match.
Name Hash Tables
""""""""""""""""
-To solve the issues mentioned above we have structured the hash tables a bit
+To solve the issues mentioned above, we have structured the hash tables a bit
differently: a header, buckets, an array of all unique 32-bit hash values,
followed by an array of hash value data offsets, one for each hash value, then
the data for all hash values:
@@ -1707,7 +1707,7 @@ values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
| ALL HASH DATA |
`-------------------------'
-So taking the exact same data from the standard hash example above we end up
+So taking the exact same data from the standard hash example above, we end up
with:
.. code-block:: none
@@ -1798,7 +1798,7 @@ debugger lookup. If we repeat the same "``printf``" lookup from above, we
would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32-bit
hash value and modulo it by ``n_buckets``. ``BUCKETS[3]`` contains "6" which
is the index into the ``HASHES`` table. We would then compare any consecutive
-32-bit hashes values in the ``HASHES`` array as long as the hashes would be in
+32-bit hash values in the ``HASHES`` array as long as the hashes would be in
``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3. In the case of a failed lookup we would access the
memory for ``BUCKETS[3]``, and then compare a few consecutive 32-bit hashes
@@ -1966,8 +1966,8 @@ array to be:
HeaderData.atoms[0].type = eAtomTypeDIEOffset;
HeaderData.atoms[0].form = DW_FORM_data4;
-This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
-encoded as a 32-bit value (DW_FORM_data4). This allows a single name to have
+This defines the contents to be the DIE offset (``eAtomTypeDIEOffset``) that is
+encoded as a 32-bit value (``DW_FORM_data4``). This allows a single name to have
multiple matching DIEs in a single file, which could come up with an inlined
function for instance. Future tables could include more information about the
DIE such as flags indicating if the DIE is a function, method, block,
@@ -1978,7 +1978,7 @@ The KeyType for the DWARF table is a 32-bit string table offset into the
may already contain copies of all of the strings. This helps make sure, with
help from the compiler, that we reuse the strings between all of the DWARF
sections and keeps the hash table size down. Another benefit to having the
-compiler generate all strings as DW_FORM_strp in the debug info, is that
+compiler generate all strings as ``DW_FORM_strp`` in the debug info is that
DWARF parsing can be made much faster.
After a lookup is made, we get an offset into the hash data. The hash data
@@ -2114,7 +2114,7 @@ We get a few type DIEs:
AT_type( {0x00000067} ( int ) )
AT_byte_size( 0x08 )
-The DW_TAG_pointer_type is not included because it does not have a ``DW_AT_name``.
+The ``DW_TAG_pointer_type`` is not included because it does not have a ``DW_AT_name``.
"``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
If we run into a namespace that has no name this is an anonymous namespace, and