[flang-commits] [flang] [flang] Add design document for debug info generation. (PR #86939)

Thu Apr 11 07:22:28 PDT 2024

================
@@ -0,0 +1,451 @@
+# Debug Generation
+
+Application developers spend a significant amount of time debugging the
+applications that they create, so it is important that a compiler support a
+good debug experience. DWARF[1] is the standard debugging file format used by
+compilers and debuggers. The LLVM infrastructure supports debug info generation
+using metadata[2]. Support for generating debug metadata is present
+in MLIR by way of MLIR attributes. Flang can leverage these MLIR attributes to
+generate good debug information.
+
+We can break the work for debug generation into two separate tasks:
+
+1) Line Table generation
+2) Full debug generation
+
+Support for Fortran debug info in the LLVM infrastructure[3] has made great
+progress because many Fortran frontends have adopted LLVM as their backend and
+because the Classic Flang compiler is available.
+
+## Driver Flags
+By default, Flang will not generate any debug or line table information.
+Debug information will be generated if one of the following flags is present.
+
+-gline-tables-only, -g1 : Emit debug line number tables only  
+-g : Emit full debug info
+
+## Line Table Generation
+
+The existing AddDebugFoundationPass adds a `FusedLoc` with a
+`SubprogramAttr` to each FuncOp. This allows MLIR to generate LLVM IR metadata
+for that function. However, the following values are hardcoded at the moment.
+They will instead be passed from the driver.
+
+- Details of the compiler (name and version and git hash).
+- Language Standard. We can set it to Fortran95 for now and periodically
+revise it when full support for later standards is available.
+- Optimisation Level.
+- Type of debug generated (linetable/full debug).
+- Calling Convention: `DW_CC_normal` by default and `DW_CC_program` if it is
+the main program.
+
+`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
+match the signature of the actual function/subroutine.
+
+
+## Full Debug Generation
+
+Full debug info will include metadata to describe functions, variables and
+types. Flang will generate debug metadata in the form of MLIR attributes. These
+attributes will be converted to the format expected by LLVM IR in DebugTranslation[4].
+
+Debug metadata generation can be broken down into two steps.
+
+1. MLIR attributes are generated by reading information from the AST or FIR. This
+step can happen anytime before or during conversion to LLVM dialect. Examples
+of the metadata generated in this step are `DILocalVariableAttr` and
+`DIDerivedTypeAttr`.
+
+2. Changes that can only happen during or after conversion to LLVM dialect. One
+example is passing `DIGlobalVariableExpressionAttr` while creating
+`LLVM::GlobalOp`. Another is the generation of the `DbgDeclareOp` that is
+required for local variables. It can only be created after conversion to LLVM
+dialect as it requires the LLVM.Ptr type. The changes required for step 2 are
+quite minimal. The bulk of the work happens in step 1.
+
+One design decision that we need to make is to decide where to perform step 1.
+Here are some possible options:
+
+**During conversion to LLVM dialect**
+
+Pros:
+1. Do step 1 and 2 in one place.
+2. No chance of missing any change introduced by an earlier transformation.
+
+Cons:
+1. Passing a lot of information from the driver, as discussed in the Line Table
+Generation section above, may muddle the interface of FIRToLLVMConversion.
+2. `DeclareOp` is removed before this pass.
+3. Even if `DeclareOp` is retained, creating debug metadata while some ops have
+been converted to LLVM dialect and others have not may cause its own issues. We
+have to walk the chain of ops to extract the information, which may be
+problematic in this case.
+4. Some source information is lost by this point. Examples include information
+about namelists and source line information about the fields of derived types.
+
+**During a pass before conversion to LLVM dialect**
+
+This is similar to what AddDebugFoundationPass is currently doing.
+
+Pros:
+1. One central location dedicated to debug information processing. This can
+result in a cleaner implementation.
+2. Similar to above, less chance of missing any change introduced by an earlier
+transformation.
+
+Cons:
+1. Step 2 still needs to happen during conversion to LLVM dialect. However, the
+changes required for step 2 are quite minimal.
+2. Similar to above, some source information may be lost by this point.
+
+**During Lowering from AST**
+
+Pros:
+1. We have better source information.
+
+Cons:
+1. The code may change after lowering, and those changes may not be
+reflected in the debug information.
+2. Comments on an earlier PR [5] advised against this approach.
+
+## Design
+
+The design below assumes that we are extracting the information from FIR.
+If we generate debug metadata during lowering then the description below
+may need to change, although the generated metadata remains the same in
+both cases.
+
+The AddDebugFoundationPass will be renamed to AddDebugInfoPass. The
+information mentioned in the Line Table Generation section above will be passed
+to it from the driver. This pass will run quite late in the pipeline but before
+`DeclareOp` is removed.
+
+In this pass, we will iterate through the `GlobalOp`, `TypeInfoOp`, `FuncOp`
+and `DeclareOp` to extract the source information and build the MLIR
+attributes. A class will be added to handle conversion of MLIR and FIR types to
+`DITypeAttr`.
+
+The following sections provide details of how various language constructs will be
+handled. In these sections, the LLVM IR metadata and MLIR attributes are
+used interchangeably. As an example, `DILocalVariableAttr` is an MLIR attribute
+which gets translated to LLVM IR's `DILocalVariable`.
+
+### Variables
+
+#### Local Variables
+  In MLIR, local variables are represented by `DILocalVariableAttr` which
+  stores information like source location and type. They also require a
+  `DbgDeclareOp` which binds `DILocalVariableAttr` with a location.
+
+  In FIR, `DeclareOp` has source information about the variable. The
+  `DeclareOp` will be processed to create `DILocalVariableAttr`. This attr is
+  attached to the memref op of the `DeclareOp` using a `FusedLoc` approach.
+
+  During conversion to LLVM dialect, when an op is encountered that has a
+  `DILocalVariableAttr` in its `FusedLoc`, a `DbgDeclareOp` is created which
+  binds the attr with its location.
+
+  The changes in the IR look as follows:
+
+```
+  original fir
+  %2 = fir.alloca i32  loc(#loc4)
+  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
+
+  Fir with FusedLoc.
+
+  %2 = fir.alloca i32  loc(#loc38)
+  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
+  #di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... >
+  #loc38 = loc(fused<#di_local_variable5>[#loc4])
+
+  After conversion to llvm dialect
+
+  #di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...>
+  %1 = llvm.alloca %0 x i64
+  llvm.intr.dbg.declare #di_local_variable = %1
+```
+
+#### Function Arguments
+
+Arguments work in a similar way, but they present a difficulty: the
+`DeclareOp`'s memref points to a `BlockArgument`. Unlike the op in the local
+variable case, `BlockArgument`s are not handled by FIRToLLVMLowering. This can
+easily be handled by adding the `DbgDeclareOp` after conversion to LLVM dialect,
+either in FIRToLLVMLowering or in a separate pass.
+
+### Module
+
+In debug metadata, a Fortran module will be represented by `DIModuleAttr`.
+Variables and functions inside a module will have a scope pointing to the parent module.
+
+```
+module helper
+   real glr
+   ...
+end module helper
+
+!1 = !DICompileUnit(language: DW_LANG_Fortran90 ...)
+!2 = !DIModule(scope: !1, name: "helper" ...)
+!3 = !DIGlobalVariable(scope: !2, name: "glr" ...)
+
+Use of a module results in the following metadata.
+!4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2)
+```
+
+Modules are not first-class entities in FIR, so there is no way to get
+the location where they are declared in the source file.
+
+However, the fact that a variable or function is part of a module, along with
+the name of that module, can be extracted from its mangled name. A `GlobalOp`
+is generated for each module variable in FIR, and there is also a `DeclareOp`
+in each function where the module variable is used.
+
+We will use the `GlobalOp` to generate the `DIModuleAttr` and the associated
+`DIGlobalVariableAttr`. A `DeclareOp` for a module variable will be used
+to generate a `DIImportedEntityAttr`. Care will be taken to avoid generating
+duplicate `DIImportedEntityAttr` entries in the same function.
+
+### Derived Types
+
+A derived type will be represented in metadata by `DICompositeType` with a tag of
+`DW_TAG_structure_type`. It will have elements which point to the components.
+
+```
+  type :: t_pair
+    integer :: i
+    real :: x
+  end type
+!1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...)
+!2 = !{!3, !4}
+!3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...)
+!4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+!6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
+```
+
+In FIR, RecordType and TypeInfoOp can be used to get information about the
+types of the components and the location of the derived type. However, there are
+still some open questions about derived types.
+
+1. What is the correct way to get the offset and alignment information for the
+members of the derived type?
+
+2. Use of a derived type causes the generation of many other global variables.
+Do they need to be retained in debug info?
+
+3. The location where a component of the derived type is declared is not
+available in RecordType or TypeInfoOp. This probably will need to be
+added.
+
+### CommonBlocks
+
+A common block will be represented in metadata by `DICommonBlockAttr`, which
+will be used as the scope of the variables inside the common block. A
+`DIExpression` can be used to give the offset of any given variable inside the
+global storage for the common block.
+
+```
+integer a, b
+common /test/ a, b
+
+;@test_ = common global [8 x i8] zeroinitializer, !dbg !5, !dbg !8
+!1 = !DISubprogram()
+!2 = !DICommonBlock(scope: !1, name: "test" ...)
+!3 = !DIGlobalVariable(scope: !2, name: "a" ...)
+!4 = !DIExpression()
+!5 = !DIGlobalVariableExpression(var: !3, expr: !4)
+!6 = !DIGlobalVariable(scope: !2, name: "b" ...)
+!7 = !DIExpression(DW_OP_plus_uconst, 4)
+!8 = !DIGlobalVariableExpression(var: !6, expr: !7)
+```
+
+In FIR, a common block results in a `GlobalOp` with common linkage. Every
+function that uses a common block variable has a `DeclareOp` for that variable.
+This `DeclareOp` will point to global storage through
+`CoordinateOp` and `AddrOfOp`. The `CoordinateOp` has the offset of the
+location of this variable in global storage. There is enough information to
+generate the required metadata, although it requires walking up the chain from
+`DeclareOp` to locate `CoordinateOp` and `AddrOfOp`.
+
+### Arrays
+
+The type of a fixed-size array is represented using `DICompositeType`. A
+`DISubrangeAttr` is used to provide the bounds in each dimension.
+
+```
+integer abc(4,5)
+
+!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
+!2 = !{ !3, !4 }
+!3 = !DISubrange(lowerBound: 1, upperBound: 4 ...)
+!4 = !DISubrange(lowerBound: 1, upperBound: 5 ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+
+```
+
+#### Adjustable
+
+The debug metadata for an adjustable array looks similar to that of a
+fixed-size array, with one change: the bounds are not constant values but point
+to a `DILocalVariableAttr`.
+
+In FIR, the `DeclareOp` points to a `ShapeOp` and we can walk the chain
+to get the value that represents the array bound in any dimension. We will
+create a `DILocalVariableAttr` that will point to that location. This
+variable will be used in the `DISubrangeAttr`. Note that this
+`DILocalVariableAttr` does not correspond to any source variable.
+
+#### Assumed Size
+
+An assumed-size array is treated as a raw array. Debug information will not
+provide any upper bound information for the last dimension.
+
+#### Assumed Shape
+An assumed-shape array will use a representation similar to that of a fixed-size
+array, with two differences.
+
+1. There will be a `dataLocation` field which will be an expression. This will
+enable the debugger to get the data pointer from the array descriptor.
+
+2. The fields in `DISubrangeAttr` for array bounds will be expressions, which
+will allow the debugger to get the bounds from the descriptor.
----------------
abidh wrote:

These will be generated during normal debug attribute generation in AddDebugInfoPass. I am not clear about the second part of the question. If you mean how we will get the offsets of this information in the descriptor, I think we know the layout of the descriptor in memory and can tell where the various fields are located. If you mean how the debugger will get this information, we will encode it in DWARF expressions, which debuggers can decode.
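
To make the descriptor-based expressions concrete, here is a minimal Python sketch of how such DWARF expressions could be formed and decoded. It is an illustration under stated assumptions, not the code proposed in the PR. The descriptor layout (a 24-byte header followed by one `lower_bound`, `extent`, `stride` triple of 8-byte integers per dimension) is an assumption modelled on a 64-bit `CFI_cdesc_t`, and the helper names (`data_location_expr`, `field_expr`, `upper_bound_expr`, `evaluate`, `make_descriptor`) are hypothetical. The evaluator understands only the handful of DWARF operations used here.

```python
import struct

# ASSUMPTION: 64-bit CFI_cdesc_t-style layout, not taken from the design doc:
# base_addr, elem_len, version, rank, type, attribute, extra, then one
# (lower_bound, extent, stride) triple of 8-byte integers per dimension.
HEADER_FORMAT = "<QQiBbBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes
DIM_SIZE = 24
LOWER_BOUND_OFFSET = 0
EXTENT_OFFSET = 8


def data_location_expr():
    """Expression that loads the data pointer stored at offset 0."""
    return [("DW_OP_push_object_address",), ("DW_OP_deref",)]


def field_expr(dim, field_offset):
    """Expression that loads one 8-byte field of dimension `dim`."""
    offset = HEADER_SIZE + dim * DIM_SIZE + field_offset
    return [
        ("DW_OP_push_object_address",),
        ("DW_OP_plus_uconst", offset),
        ("DW_OP_deref",),
    ]


def upper_bound_expr(dim):
    """Expression computing lower_bound + extent - 1 for dimension `dim`."""
    return (
        field_expr(dim, LOWER_BOUND_OFFSET)
        + field_expr(dim, EXTENT_OFFSET)
        + [("DW_OP_plus",), ("DW_OP_lit1",), ("DW_OP_minus",)]
    )


def evaluate(expr, memory, object_address=0):
    """Tiny stack machine covering only the operations used above."""
    stack = []
    for op, *args in expr:
        if op == "DW_OP_push_object_address":
            stack.append(object_address)
        elif op == "DW_OP_plus_uconst":
            stack.append(stack.pop() + args[0])
        elif op == "DW_OP_deref":
            addr = stack.pop()
            word = memory[addr:addr + 8]
            stack.append(int.from_bytes(word, "little", signed=True))
        elif op == "DW_OP_lit1":
            stack.append(1)
        elif op == "DW_OP_plus":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "DW_OP_minus":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs - rhs)
        else:
            raise ValueError(f"unsupported operation: {op}")
    return stack[-1]


def make_descriptor(base_addr, elem_len, dims):
    """Pack a fake descriptor for testing; dims is a list of triples."""
    header = struct.pack(HEADER_FORMAT, base_addr, elem_len, 0, len(dims), 0, 0, 0)
    body = b"".join(struct.pack("<qqq", *dim) for dim in dims)
    return header + body


# A fake descriptor for `integer abc(4,5)` passed as an assumed-shape array.
descriptor = make_descriptor(0x1000, 4, [(1, 4, 4), (1, 5, 16)])
data_pointer = evaluate(data_location_expr(), descriptor)
upper_bounds = [evaluate(upper_bound_expr(d), descriptor) for d in range(2)]
```

In this sketch the descriptor bytes stand in for the debuggee's memory, with the descriptor placed at address 0. A real debugger would evaluate the same operation sequence against the address of the variable, and the actual offsets would have to come from the descriptor layout that Flang's runtime defines.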

Please note that most of the examples that I included in the document are known to work. Here are the steps I followed to verify the example debug metadata.

1. Use flang-new to create an LLVM IR file from the sample Fortran file.
2. Manually edit the .ll file to add the example debug metadata.
3. Build the edited file into an executable.
4. Run the executable under `GDB` and check that the debugger is able to extract the right information.


https://github.com/llvm/llvm-project/pull/86939