[flang-commits] [flang] [flang] Add design document for debug info generation. (PR #86939)

via flang-commits flang-commits at lists.llvm.org
Tue Apr 2 06:51:23 PDT 2024


================
@@ -0,0 +1,452 @@
+# Debug Generation
+
+Application developers spend a significant time debugging the applications that
+they create. Hence it is important that a compiler provide support for a good
+debug experience. DWARF[1] is the standard debugging file format used by
+compilers and debuggers. The LLVM infrastructure supports debug info generation
+using metadata[2]. Support for generating debug metadata is present
+in MLIR by way of MLIR attributes. Flang can leverage these MLIR attributes to
+generate good debug information.
+
+We can break the work for debug generation into two separate tasks:
+1) Line Table generation
+2) Full debug generation
+The support for Fortran Debug in LLVM infrastructure[3] has made great progress
+due to many Fortran frontends adopting LLVM as the backend as well as the
+availability of the Classic Flang compiler.
+
+## Driver Flags
+By default, Flang will not generate any debug or linetable information.
+Debug information will be generated if the following flags are present.
+
+-gline-tables-only, -g1 : Emit debug line number tables only  
+-g : Emit full debug info
+
+## Line Table Generation
+
+There is existing AddDebugFoundationPass which add `FusedLoc` with a
+`SubprogramAttr` on FuncOp. This allows MLIR to generate LLVM IR metadata
+for that function. However, following values are hardcoded at the moment. These
+will instead be passed from the driver.
+
+- Details of the compiler (name and version and git hash).
+- Language Standard. We can set it to Fortran95 for now and periodically
+revise it when full support for later standards is available.
+- Optimisation Level.
+- Type of debug generated (linetable/full debug).
+- Calling Convention: `DW_CC_normal` by default and `DW_CC_program` if it is
+the main program.
+
+`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
+match the signature of the actual function/subroutine.
+
+
+## Full Debug Generation
+
+Full debug info will include metadata to describe functions, variables and
+types. Flang will generate debug metadata in the form of MLIR attributes. These
+attributes will be converted to the format expected by LLVM IR in DebugTranslation[4].
+
+Debug metadata generation can be broken down in 2 steps.
+
+1. MLIR attributes are generated by reading information from AST or FIR. This
+step can happen anytime before or during conversion to LLVM dialect. An example
+of the metadata generated in this step is `DILocalVariableAttr` or
+`DIDerivedTypeAttr`.
+
+2. Changes that can only happen during or after conversion to LLVM dialect. The
+example of this is passing `DIGlobalVariableExpressionAttr` while
+creating `LLVM::GlobalOp`. Another example will be generation of `DbgDeclareOp`
+that is required for local variables. It can only be created after conversion to
+LLVM dialect as it requires LLVM.Ptr type. The changes required for step 2 are
+quite minimal. The bulk of the work happens in step 1.
+
+One design decision that we need to make is to decide where to perform step 1.
+Here are some possible options:
+
+**During conversion to LLVM dialect**
+
+Pros:
+1. Do step 1 and 2 in one place.
+2. No chance of missing any change introduced by an earlier transformation.
+
+Cons:
+1. Passing a lot of information from the driver as discussed in the line table
+section above may muddle interface of FIRToLLVMConversion.
+2. `DeclareOp` is removed before this pass.
+3. Even if `DeclareOp` is retained, creating debug metadata while some ops have
+been converted to LLVMdialect and others are not may cause its own issues. We
+have to walk the ops chain to extract the information which may be problematic
+in this case.
+4. Some source information is lost by this point. Examples include
+information about namelists, source line information about field of derived
+types etc.
+
+**During a pass before conversion to LLVM dialect**
+
+This is similar to what AddDebugFoundationPass is currently doing.
+
+Pros:
+1. One central location dedicated to debug information processing. This can
+result in a cleaner implementation.
+2. Similar to above, less chance of missing any change introduced by an earlier
+transformation.
+
+Cons:
+1. Step 2 still need to happen during conversion to LLVM dialect. But
+changes required for step 2 are quite minimal.
+2. Similar to above, some source information may be lost by this point.
+
+**During Lowering from AST**
+
+Pros
+1. We have better source information.
+
+Cons:
+1. There may be change in the code after lowering which may not be
+reflected in debug information.
+2. Comments on an earlier PR [5] advised against this approach.
+
+## Design
+
+The design below assumes that we are extracting the information from FIR.
+If we generate debug metadata during lowering then the description below
+may need to change. Although the generated metadata remains the same in
+both cases.
+
+The AddDebugFoundationPass will be renamed to AddDebugInfo Pass. The
+information mentioned in the line info section above will be passed to it from
+the driver. This pass will run quite late in the pipeline but before
+`DecalreOp` is removed.
+
+In this pass, we will iterate through the `GlobalOp`, `TypeInfoOp`, `FuncOp`
+and `DeclareOp` to extract the source information and build the MLIR
+attributes. A class will be added to handle conversion of MLIR and FIR types to
+`DITypeAttr`.
+
+Following sections provide details of how various language constructs will be
+handled. In these sections, the LLVM IR metadata and MLIR attributes have been
+used interchangeably. As an example, `DILocalVariableAttr` is an MLIR attribute
+which gets translated to LLVM IR's `DILocalVariable`.
+
+### Variables
+
+#### Local Variables
+  In MLIR, local variables are represented by `DILocalVariableAttr` which
+  stores information like source location and type. They also require a
+  `DbgDeclareOp` which binds `DILocalVariableAttr` with a location.
+
+  In FIR, `DeclareOp` has source information about the variable. The
+  `DeclareOp` will be processed to create `DILocalVariableAttr`. This attr is
+  attached to the memref op of the `DeclareOp` using a `FusedLoc` approach.
+
+  During conversion to LLVM dialect, when an op is encountered that has a
+  `DILocalVariableAttr` in its `FusedLoc`, a `DbgDeclareOp` is created which
+  binds the attr with its location.
+
+  The change in the IR look like as follows:
+
+```
+  original fir
+  %2 = fir.alloca i32  loc(#loc4)
+  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
+
+  Fir with FusedLoc.
+
+  %2 = fir.alloca i32  loc(#loc38)
+  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
+  #di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... >
+  #loc38 = loc(fused<#di_local_variable5>[#loc4])
+
+  After conversion to llvm dialect
+
+  #di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...>
+  %1 = llvm.alloca %0 x i64
+  llvm.intr.dbg.declare #di_local_variable = %1
+```
+
+#### Function Arguments
+
+Arguments work in similar way, but they present a difficulty that `DeclareOp`'s
+memref points to `BlockArgument`. Unlike the op in local variable case,
+the `BlockArgument` are not handled by the FIRToLLVMLowering. This can easily
+be handled by adding after conversion to LLVM dialect either in FIRToLLVMLowering 
+or in a separate pass.
+
+### Module
+
+In debug metadata, the Fortran module will be represented by `DIModuleAttr`.
+The variables or functions inside module will have scope pointing to the parent module.
+
+```
+module helper
+   real glr
+   ...
+end module helper
+
+!1 = !DICompileUnit(language: DW_LANG_Fortran90 ...)
+!2 = !DIModule(scope: !1, name: "helper" ...)
+!3 = !DIGlobalVariable(scope: !2, name: "glr" ...)
+
+Use of a module results in the following metadata.
+!4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2)
+```
+
+Modules are not first class entities in the FIR. So there is no way to get
+the location where they are declared in source file.
+
+But the information that a variable or function is part of a module
+can be extracted from its mangled name along with name of the module. There is
+a `GlobalOp` generated for each module variable in FIR and there is also a
+`DeclareOp` in each function where the module variable is used.
+
+We will use the `GlobalOp` to generate the `DIModuleAttr` and associated
+`DIGlobalVariableAttr`. A `DeclareOp` for module variable will be used
+to generate `DIImportedEntityAttr`. Care will be taken to avoid generating
+duplicate `DIImportedEntityAttr` entries in same function.
+
+### Derived Types
+
+A derived type will be represented in metadata by `DICompositeType` with a tag of
+`DW_TAG_structure_type`. It will have elements which point to the components.
+
+```
+  type :: t_pair
+    integer :: i
+    real :: x
+  end type
+!1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...)
+!2 = !{!3, !4}
+!3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...)
+!4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+!6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
+```
+
+In FIR, RecordType and TypeInfoOp can be used to get information about the
+types of the component and location of the derived type. However, there are
+still some open questions about derived types.
+
+1. What is the correct way to get the offset and alignment information for the
+members of the derived type.
+
+2. Use of derived type causes the generation of many other global variables.
+Do they need to be retained in debug info?
+
+3. The location where a component of the derived type is declared is not
+available in RecordType or TypeInfoOp. This probably will need to be
+added.
+
+### CommonBlocks
+
+A common block will be represented in metadata by `DICommonBlockAttr` which
+will be used as scope by the variable inside common block. `DIExpression`
+can be used to give the offset of any given variable inside the global storage
+for common block.
+
+```
+integer a, b
+common /test/ a, b
+
+;@test_ = common global [8 x i8] zeroinitializer, !dbg !5, !dbg !6
+!1 = !DISubprogram()
+!2 = !DICommonBlock(scope: !1, name: "test" ...)
+!3 = !DIGlobalVariable(scope: !2, name: "a" ...)
+!4 = !DIExpression()
+!5 = !DIGlobalVariableExpression(var: !3, expr: !4)
+!6 = !DIGlobalVariable(scope: !2, name: "b" ...)
+!7 = !DIExpression(DW_OP_plus_uconst, 4)
+!8 = !DIGlobalVariableExpression(var: !6, expr: !7)
+```
+
+In FIR, a common block results in a `GlobalOp` with common linkage. Every
+function where the common block is used has `DeclareOp` for that variable.
+This `DeclareOp` will point to global storage through
+`CoordinateOp` and `AddrOfOp`. The `CoordinateOp` has the offset of the
+location of this variable in global storage. There is enough information to
+generate the required metadata. Although it requires walking up the chain from
+`DeclaredOp` to locate `CoordinateOp` and `AddrOfOp`.
+
+### Arrays
+
+The type of fixed size array is represented using `DICompositeType`. The
+`DISubrangeAttr` is used to provide bounds in any given dimensions.
+
+```
+integer abc(4,5)
+
+!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
+!2 = !{ !3, !4 }
+!3 = !DISubrange(lowerBound: 1, upperBound: 4 ...)
+!4 = !DISubrange(lowerBound: 1, upperBound: 5 ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+
+```
+
+#### Adjustable
+
+The debug metadata for the adjustable array looks similar to fixed sized array
+with one change. The bounds are not constant values but point to a
+`DILocalVariableAttr`.
+
+In FIR, the `DeclareOp` points to a `ShapeOp` and we can walk the chain
+to get the value that represents the array bound in any dimension. We will
+create a `DILocalVariableAttr` that will point to that location. This
+variable will be used in the `DISubrangeAttr`. Note that this
----------------
jeanPerier wrote:

Does this need to be a variable as in some memory location, or can a `DILocalVariableAttr` point to any operation producing an SSA value (like an arith.addi)?

Lowering does not put `n+1` in memory when evaluating specification expressions of `x(n+1)`.

https://github.com/llvm/llvm-project/pull/86939


More information about the flang-commits mailing list