[flang-commits] [flang] [flang] Add design document for debug info generation. (PR #86939)

Tue Apr 2 06:51:22 PDT 2024

================
@@ -0,0 +1,452 @@
+# Debug Generation
+
+Application developers spend a significant time debugging the applications that
+they create. Hence it is important that a compiler provide support for a good
+debug experience. DWARF[1] is the standard debugging file format used by
+compilers and debuggers. The LLVM infrastructure supports debug info generation
+using metadata[2]. Support for generating debug metadata is present
+in MLIR by way of MLIR attributes. Flang can leverage these MLIR attributes to
+generate good debug information.
+
+We can break the work for debug generation into two separate tasks:
+1) Line Table generation
+2) Full debug generation
+The support for Fortran Debug in LLVM infrastructure[3] has made great progress
+due to many Fortran frontends adopting LLVM as the backend as well as the
+availability of the Classic Flang compiler.
+
+## Driver Flags
+By default, Flang will not generate any debug or linetable information.
+Debug information will be generated if the following flags are present.
+
+-gline-tables-only, -g1 : Emit debug line number tables only  
+-g : Emit full debug info
+
+## Line Table Generation
+
+There is existing AddDebugFoundationPass which add `FusedLoc` with a
+`SubprogramAttr` on FuncOp. This allows MLIR to generate LLVM IR metadata
+for that function. However, following values are hardcoded at the moment. These
+will instead be passed from the driver.
+
+- Details of the compiler (name and version and git hash).
+- Language Standard. We can set it to Fortran95 for now and periodically
+revise it when full support for later standards is available.
+- Optimisation Level.
+- Type of debug generated (linetable/full debug).
+- Calling Convention: `DW_CC_normal` by default and `DW_CC_program` if it is
+the main program.
+
+`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
+match the signature of the actual function/subroutine.
+
+
+## Full Debug Generation
+
+Full debug info will include metadata to describe functions, variables and
+types. Flang will generate debug metadata in the form of MLIR attributes. These
+attributes will be converted to the format expected by LLVM IR in DebugTranslation[4].
+
+Debug metadata generation can be broken down in 2 steps.
+
+1. MLIR attributes are generated by reading information from AST or FIR. This
+step can happen anytime before or during conversion to LLVM dialect. An example
+of the metadata generated in this step is `DILocalVariableAttr` or
+`DIDerivedTypeAttr`.
+
+2. Changes that can only happen during or after conversion to LLVM dialect. The
+example of this is passing `DIGlobalVariableExpressionAttr` while
+creating `LLVM::GlobalOp`. Another example will be generation of `DbgDeclareOp`
+that is required for local variables. It can only be created after conversion to
+LLVM dialect as it requires LLVM.Ptr type. The changes required for step 2 are
+quite minimal. The bulk of the work happens in step 1.
+
+One design decision that we need to make is to decide where to perform step 1.
+Here are some possible options:
+
+**During conversion to LLVM dialect**
+
+Pros:
+1. Do step 1 and 2 in one place.
+2. No chance of missing any change introduced by an earlier transformation.
+
+Cons:
+1. Passing a lot of information from the driver as discussed in the line table
+section above may muddle interface of FIRToLLVMConversion.
+2. `DeclareOp` is removed before this pass.
+3. Even if `DeclareOp` is retained, creating debug metadata while some ops have
+been converted to LLVMdialect and others are not may cause its own issues. We
+have to walk the ops chain to extract the information which may be problematic
+in this case.
+4. Some source information is lost by this point. Examples include
+information about namelists, source line information about field of derived
+types etc.
+
+**During a pass before conversion to LLVM dialect**
+
+This is similar to what AddDebugFoundationPass is currently doing.
+
+Pros:
+1. One central location dedicated to debug information processing. This can
+result in a cleaner implementation.
+2. Similar to above, less chance of missing any change introduced by an earlier
+transformation.
+
+Cons:
+1. Step 2 still need to happen during conversion to LLVM dialect. But
+changes required for step 2 are quite minimal.
+2. Similar to above, some source information may be lost by this point.
+
+**During Lowering from AST**
+
+Pros
+1. We have better source information.
+
+Cons:
+1. There may be change in the code after lowering which may not be
+reflected in debug information.
+2. Comments on an earlier PR [5] advised against this approach.
+
+## Design
+
+The design below assumes that we are extracting the information from FIR.
+If we generate debug metadata during lowering then the description below
+may need to change. Although the generated metadata remains the same in
+both cases.
+
+The AddDebugFoundationPass will be renamed to AddDebugInfo Pass. The
+information mentioned in the line info section above will be passed to it from
+the driver. This pass will run quite late in the pipeline but before
+`DecalreOp` is removed.
+
+In this pass, we will iterate through the `GlobalOp`, `TypeInfoOp`, `FuncOp`
+and `DeclareOp` to extract the source information and build the MLIR
+attributes. A class will be added to handle conversion of MLIR and FIR types to
+`DITypeAttr`.
+
+Following sections provide details of how various language constructs will be
+handled. In these sections, the LLVM IR metadata and MLIR attributes have been
+used interchangeably. As an example, `DILocalVariableAttr` is an MLIR attribute
+which gets translated to LLVM IR's `DILocalVariable`.
+
+### Variables
+
+#### Local Variables
+  In MLIR, local variables are represented by `DILocalVariableAttr` which
+  stores information like source location and type. They also require a
+  `DbgDeclareOp` which binds `DILocalVariableAttr` with a location.
+
+  In FIR, `DeclareOp` has source information about the variable. The
+  `DeclareOp` will be processed to create `DILocalVariableAttr`. This attr is
+  attached to the memref op of the `DeclareOp` using a `FusedLoc` approach.
+
+  During conversion to LLVM dialect, when an op is encountered that has a
+  `DILocalVariableAttr` in its `FusedLoc`, a `DbgDeclareOp` is created which
+  binds the attr with its location.
+
+  The change in the IR look like as follows:
+
+```
+  original fir
+  %2 = fir.alloca i32  loc(#loc4)
+  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
+
+  Fir with FusedLoc.
+
+  %2 = fir.alloca i32  loc(#loc38)
+  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
+  #di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... >
+  #loc38 = loc(fused<#di_local_variable5>[#loc4])
+
+  After conversion to llvm dialect
+
+  #di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...>
+  %1 = llvm.alloca %0 x i64
+  llvm.intr.dbg.declare #di_local_variable = %1
+```
+
+#### Function Arguments
+
+Arguments work in similar way, but they present a difficulty that `DeclareOp`'s
+memref points to `BlockArgument`. Unlike the op in local variable case,
+the `BlockArgument` are not handled by the FIRToLLVMLowering. This can easily
+be handled by adding after conversion to LLVM dialect either in FIRToLLVMLowering 
+or in a separate pass.
+
+### Module
+
+In debug metadata, the Fortran module will be represented by `DIModuleAttr`.
+The variables or functions inside module will have scope pointing to the parent module.
+
+```
+module helper
+   real glr
+   ...
+end module helper
+
+!1 = !DICompileUnit(language: DW_LANG_Fortran90 ...)
+!2 = !DIModule(scope: !1, name: "helper" ...)
+!3 = !DIGlobalVariable(scope: !2, name: "glr" ...)
+
+Use of a module results in the following metadata.
+!4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2)
+```
+
+Modules are not first class entities in the FIR. So there is no way to get
+the location where they are declared in source file.
+
+But the information that a variable or function is part of a module
+can be extracted from its mangled name along with name of the module. There is
+a `GlobalOp` generated for each module variable in FIR and there is also a
+`DeclareOp` in each function where the module variable is used.
+
+We will use the `GlobalOp` to generate the `DIModuleAttr` and associated
+`DIGlobalVariableAttr`. A `DeclareOp` for module variable will be used
+to generate `DIImportedEntityAttr`. Care will be taken to avoid generating
+duplicate `DIImportedEntityAttr` entries in same function.
+
+### Derived Types
+
+A derived type will be represented in metadata by `DICompositeType` with a tag of
+`DW_TAG_structure_type`. It will have elements which point to the components.
+
+```
+  type :: t_pair
+    integer :: i
+    real :: x
+  end type
+!1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...)
+!2 = !{!3, !4}
+!3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...)
+!4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+!6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
+```
+
+In FIR, RecordType and TypeInfoOp can be used to get information about the
+types of the component and location of the derived type. However, there are
+still some open questions about derived types.
+
+1. What is the correct way to get the offset and alignment information for the
+members of the derived type.
----------------
jeanPerier wrote:

We can generate that info from the FIR RecordType type directly + mlir datalayout (there is currently no utility generating this info since this is left-up to LLVM, but this is doable (similar logic as what happens in [fir::getTypeSizeAndAlignment](https://github.com/llvm/llvm-project/blob/56aeac47ab0858db9f447b5ec43b660d9035167f/flang/include/flang/Optimizer/Dialect/FIRType.h#L482).

https://github.com/llvm/llvm-project/pull/86939