[flang-commits] [flang] [flang] Add design document for debug info generation. (PR #86939)

via flang-commits flang-commits at lists.llvm.org
Thu Apr 11 06:37:12 PDT 2024


================
@@ -0,0 +1,451 @@
+# Debug Generation
+
+Application developers spend significant time debugging the applications that
+they create. Hence it is important that a compiler provide support for a good
+debug experience. DWARF[1] is the standard debugging file format used by
+compilers and debuggers. The LLVM infrastructure supports debug info generation
+using metadata[2]. Support for generating debug metadata is present
+in MLIR by way of MLIR attributes. Flang can leverage these MLIR attributes to
+generate good debug information.
+
+We can break the work for debug generation into two separate tasks:
+1) Line Table generation
+2) Full debug generation
+
+The support for Fortran debug info in the LLVM infrastructure[3] has made great
+progress because many Fortran frontends have adopted LLVM as their backend and
+because the Classic Flang compiler is available.
+
+## Driver Flags
+By default, Flang will not generate any debug or linetable information.
+Debug information will be generated if one of the following flags is present.
+
+-gline-tables-only, -g1 : Emit debug line number tables only  
+-g : Emit full debug info
+
+## Line Table Generation
+
+The existing `AddDebugFoundationPass` adds a `FusedLoc` with a
+`SubprogramAttr` on `FuncOp`. This allows MLIR to generate LLVM IR metadata
+for that function. However, the following values are hardcoded at the moment.
+These will instead be passed from the driver.
+
+- Details of the compiler (name and version and git hash).
+- Language Standard. We can set it to Fortran95 for now and periodically
+revise it when full support for later standards is available.
+- Optimisation Level.
+- Type of debug generated (linetable/full debug).
+- Calling Convention: `DW_CC_normal` by default and `DW_CC_program` if it is
+the main program.
+
+`DISubroutineTypeAttr` currently has a fixed type. This will be changed to
+match the signature of the actual function/subroutine.
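+
+As a minimal sketch (not output from Flang), the metadata for a function that
+takes an integer and a real and returns a real could look as follows. The
+function name and the node numbers are hypothetical. The first entry of
+`types` is the result type, followed by the argument types.
+
+```
+real function fn(a, b)
+   integer :: a
+   real :: b
+   ...
+end function fn
+
+!1 = !DISubprogram(name: "fn", type: !2 ...)
+!2 = !DISubroutineType(cc: DW_CC_normal, types: !3)
+!3 = !{!5, !4, !5}
+!4 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
+```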
+
+
+## Full Debug Generation
+
+Full debug info will include metadata to describe functions, variables and
+types. Flang will generate debug metadata in the form of MLIR attributes. These
+attributes will be converted to the format expected by LLVM IR in DebugTranslation[4].
+
+Debug metadata generation can be broken down into two steps.
+
+1. MLIR attributes are generated by reading information from AST or FIR. This
+step can happen anytime before or during conversion to LLVM dialect. An example
+of the metadata generated in this step is `DILocalVariableAttr` or
+`DIDerivedTypeAttr`.
+
+2. Changes that can only happen during or after conversion to LLVM dialect. An
+example of this is passing `DIGlobalVariableExpressionAttr` while
+creating `LLVM::GlobalOp`. Another example is the generation of `DbgDeclareOp`,
+which is required for local variables. It can only be created after conversion
+to LLVM dialect as it requires the LLVM.Ptr type. The changes required for step
+2 are quite minimal. The bulk of the work happens in step 1.
+
+One design decision that we need to make is to decide where to perform step 1.
+Here are some possible options:
+
+**During conversion to LLVM dialect**
+
+Pros:
+1. Do step 1 and 2 in one place.
+2. No chance of missing any change introduced by an earlier transformation.
+
+Cons:
+1. Passing a lot of information from the driver, as discussed in the line table
+section above, may muddle the interface of FIRToLLVMConversion.
+2. `DeclareOp` is removed before this pass.
+3. Even if `DeclareOp` is retained, creating debug metadata while some ops have
+been converted to LLVM dialect and others have not may cause its own issues. We
+have to walk the ops chain to extract the information, which may be problematic
+in this case.
+4. Some source information is lost by this point. Examples include
+information about namelists, source line information about fields of derived
+types etc.
+
+**During a pass before conversion to LLVM dialect**
+
+This is similar to what AddDebugFoundationPass is currently doing.
+
+Pros:
+1. One central location dedicated to debug information processing. This can
+result in a cleaner implementation.
+2. Similar to above, less chance of missing any change introduced by an earlier
+transformation.
+
+Cons:
+1. Step 2 still needs to happen during conversion to LLVM dialect, but the
+changes required for step 2 are quite minimal.
+2. Similar to above, some source information may be lost by this point.
+
+**During Lowering from AST**
+
+Pros:
+1. We have better source information.
+
+Cons:
+1. The code may change after lowering, and those changes may not be
+reflected in debug information.
+2. Comments on an earlier PR [5] advised against this approach.
+
+## Design
+
+The design below assumes that we are extracting the information from FIR.
+If we generate debug metadata during lowering then the description below
+may need to change, although the generated metadata remains the same in
+both cases.
+
+The AddDebugFoundationPass will be renamed to AddDebugInfoPass. The
+information mentioned in the line table section above will be passed to it from
+the driver. This pass will run quite late in the pipeline but before
+`DeclareOp` is removed.
+
+In this pass, we will iterate through the `GlobalOp`, `TypeInfoOp`, `FuncOp`
+and `DeclareOp` to extract the source information and build the MLIR
+attributes. A class will be added to handle conversion of MLIR and FIR types to
+`DITypeAttr`.
+
+The following sections provide details of how various language constructs will
+be handled. In these sections, the LLVM IR metadata and MLIR attributes are
+used interchangeably. As an example, `DILocalVariableAttr` is an MLIR attribute
+which gets translated to LLVM IR's `DILocalVariable`.
+
+### Variables
+
+#### Local Variables
+  In MLIR, local variables are represented by `DILocalVariableAttr` which
+  stores information like source location and type. They also require a
+  `DbgDeclareOp` which binds `DILocalVariableAttr` with a location.
+
+  In FIR, `DeclareOp` has source information about the variable. The
+  `DeclareOp` will be processed to create `DILocalVariableAttr`. This attr is
+  attached to the memref op of the `DeclareOp` using a `FusedLoc` approach.
+
+  During conversion to LLVM dialect, when an op is encountered that has a
+  `DILocalVariableAttr` in its `FusedLoc`, a `DbgDeclareOp` is created which
+  binds the attr with its location.
+
+  The changes in the IR look as follows:
+
+```
+  original fir
+  %2 = fir.alloca i32  loc(#loc4)
+  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
+
+  Fir with FusedLoc.
+
+  %2 = fir.alloca i32  loc(#loc38)
+  %3 = fir.declare %2 {uniq_name = "_QMhelperFchangeEi"}
+  #di_local_variable5 = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ... >
+  #loc38 = loc(fused<#di_local_variable5>[#loc4])
+
+  After conversion to llvm dialect
+
+  #di_local_variable = #llvm.di_local_variable<name = "i", line = 5, type = #di_basic_type ...>
+  %1 = llvm.alloca %0 x i64
+  llvm.intr.dbg.declare #di_local_variable = %1
+```
+
+#### Function Arguments
+
+Arguments work in a similar way, but they present a difficulty: the
+`DeclareOp`'s memref points to a `BlockArgument`. Unlike the op in the local
+variable case, a `BlockArgument` is not handled by the FIRToLLVMLowering. This
+can easily be handled by adding the `DbgDeclareOp` after conversion to LLVM
+dialect, either in FIRToLLVMLowering or in a separate pass.
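+
+A sketch of the metadata this could produce for an argument is given below.
+It assumes that the argument position is recorded in the `arg` field of
+`DILocalVariable`; the subroutine and the node numbers are only illustrative.
+
+```
+subroutine fn(n)
+   integer :: n
+   ...
+end subroutine fn
+
+!1 = !DISubprogram(name: "fn" ...)
+!2 = !DILocalVariable(name: "n", arg: 1, scope: !1, type: !3 ...)
+!3 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+```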
+
+### Module
+
+In debug metadata, a Fortran module will be represented by `DIModuleAttr`.
+The variables or functions inside a module will have a scope pointing to the
+parent module.
+
+```
+module helper
+   real glr
+   ...
+end module helper
+
+!1 = !DICompileUnit(language: DW_LANG_Fortran90 ...)
+!2 = !DIModule(scope: !1, name: "helper" ...)
+!3 = !DIGlobalVariable(scope: !2, name: "glr" ...)
+
+Use of a module results in the following metadata.
+!4 = !DIImportedEntity(tag: DW_TAG_imported_module, entity: !2)
+```
+
+Modules are not first class entities in FIR, so there is no way to get
+the location where they are declared in the source file.
+
+But the information that a variable or function is part of a module
+can be extracted from its mangled name, along with the name of the module.
+There is a `GlobalOp` generated for each module variable in FIR and there is
+also a `DeclareOp` in each function where the module variable is used.
+
+We will use the `GlobalOp` to generate the `DIModuleAttr` and associated
+`DIGlobalVariableAttr`. A `DeclareOp` for a module variable will be used
+to generate `DIImportedEntityAttr`. Care will be taken to avoid generating
+duplicate `DIImportedEntityAttr` entries in the same function.
+
+### Derived Types
+
+A derived type will be represented in metadata by `DICompositeType` with a tag of
+`DW_TAG_structure_type`. It will have elements which point to the components.
+
+```
+  type :: t_pair
+    integer :: i
+    real :: x
+  end type
+!1 = !DICompositeType(tag: DW_TAG_structure_type, name: "t_pair", elements: !2 ...)
+!2 = !{!3, !4}
+!3 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "i", size: 32, offset: 0, baseType: !5 ...)
+!4 = !DIDerivedType(tag: DW_TAG_member, scope: !1, name: "x", size: 32, offset: 32, baseType: !6 ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+!6 = !DIBasicType(tag: DW_TAG_base_type, name: "real" ...)
+```
+
+In FIR, RecordType and TypeInfoOp can be used to get information about the
+types of the component and location of the derived type. However, there are
+still some open questions about derived types.
+
+1. What is the correct way to get the offset and alignment information for the
+members of the derived type?
+
+2. Use of a derived type causes the generation of many other global variables.
+Do they need to be retained in debug info?
+
+3. The location where a component of the derived type is declared is not
+available in RecordType or TypeInfoOp. This probably will need to be
+added.
+
+### CommonBlocks
+
+A common block will be represented in metadata by `DICommonBlockAttr`, which
+will be used as the scope of the variables inside the common block.
+`DIExpression` can be used to give the offset of any given variable inside the
+global storage for the common block.
+
+```
+integer a, b
+common /test/ a, b
+
+;@test_ = common global [8 x i8] zeroinitializer, !dbg !5, !dbg !8
+!1 = !DISubprogram()
+!2 = !DICommonBlock(scope: !1, name: "test" ...)
+!3 = !DIGlobalVariable(scope: !2, name: "a" ...)
+!4 = !DIExpression()
+!5 = !DIGlobalVariableExpression(var: !3, expr: !4)
+!6 = !DIGlobalVariable(scope: !2, name: "b" ...)
+!7 = !DIExpression(DW_OP_plus_uconst, 4)
+!8 = !DIGlobalVariableExpression(var: !6, expr: !7)
+```
+
+In FIR, a common block results in a `GlobalOp` with common linkage. Every
+function where the common block is used has a `DeclareOp` for each of its
+variables. This `DeclareOp` will point to global storage through
+`CoordinateOp` and `AddrOfOp`. The `CoordinateOp` has the offset of the
+location of this variable in global storage. There is enough information to
+generate the required metadata, although it requires walking up the chain from
+`DeclareOp` to locate `CoordinateOp` and `AddrOfOp`.
+
+### Arrays
+
+The type of a fixed size array is represented using `DICompositeType`. The
+`DISubrangeAttr` is used to provide the bounds in any given dimension.
+
+```
+integer abc(4,5)
+
+!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
+!2 = !{ !3, !4 }
+!3 = !DISubrange(lowerBound: 1, upperBound: 4 ...)
+!4 = !DISubrange(lowerBound: 1, upperBound: 5 ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+
+```
+
+#### Adjustable
+
+The debug metadata for an adjustable array looks similar to that of a fixed
+size array with one change. The bounds are not constant values but point to a
+`DILocalVariableAttr`.
+
+In FIR, the `DeclareOp` points to a `ShapeOp` and we can walk the chain
+to get the value that represents the array bound in any dimension. We will
+create a `DILocalVariableAttr` that will point to that location. This
+variable will be used in the `DISubrangeAttr`. Note that this
+`DILocalVariableAttr` does not correspond to any source variable.
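+
+A sketch of how this might look in metadata is given below. Marking the
+generated variable with `DIFlagArtificial` is an assumption made here for
+illustration; the subroutine and the node numbers are hypothetical.
+
+```
+subroutine fn(n, arr)
+   integer :: n
+   integer :: arr(n)
+   ...
+end subroutine fn
+
+!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
+!2 = !{!3}
+!3 = !DISubrange(lowerBound: 1, upperBound: !4 ...)
+!4 = !DILocalVariable(flags: DIFlagArtificial ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+```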
+
+#### Assumed Size
+
+This is treated as a raw array. Debug information will not provide any upper
+bound information for the last dimension.
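+
+As an illustrative sketch, assuming that the subrange of the last dimension
+simply omits `upperBound`, the metadata could look like this (the subroutine
+and the node numbers are hypothetical):
+
+```
+subroutine fn(arr)
+   integer :: arr(5, *)
+   ...
+end subroutine fn
+
+!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !5, elements: !2 ...)
+!2 = !{ !3, !4 }
+!3 = !DISubrange(lowerBound: 1, upperBound: 5 ...)
+!4 = !DISubrange(lowerBound: 1 ...)
+!5 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+```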
+
+#### Assumed Shape
+An assumed shape array will use a representation similar to that of a fixed
+size array, but there will be 2 differences.
+
+1. There will be a `dataLocation` field which will be an expression. This will
+enable the debugger to get the data pointer from the array descriptor.
+
+2. The fields in `DISubrangeAttr` for array bounds will be expressions which
+will allow the debugger to get the bounds from the descriptor.
+
+```
+integer(4), intent(out) :: a(:,:)
+
+!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, elements: !2, dataLocation: !3)
+!2 = !{!5, !7}
+!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
+!4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
+!5 = !DISubrange(lowerBound: 1, upperBound: !4 ...)
+!6 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 56, DW_OP_deref)
+!7 = !DISubrange(lowerBound: 1, upperBound: !6 ...)
+!8 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+```
+
+In the assumed shape case, the rank can be determined from FIR's `SequenceType`.
+This allows us to generate a `DISubrangeAttr` in each dimension.
+
+#### Assumed Rank
+
+This is currently unsupported in flang. Its representation will be similar to
+the representation of an assumed shape array, with the following differences.
+
+1. `DICompositeTypeAttr` will have a rank field which will be an expression.
+It will be used to get the rank value from the descriptor.
+2. Instead of a `DISubrangeAttr` for each dimension, there will be a single
+`DIGenericSubrange` which will allow debuggers to calculate bounds in any
+dimension.
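+
+A rough sketch of the expected shape of this metadata is given below. Most
+expressions are elided because they depend on the descriptor layout, and the
+node numbers are only illustrative.
+
+```
+integer :: a(..)
+
+!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, elements: !2, dataLocation: !3, rank: !4)
+!2 = !{!5}
+!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
+!4 = !DIExpression(...) ; reads the rank from the descriptor
+!5 = !DIGenericSubrange(lowerBound: !6, upperBound: !7 ...)
+!6 = !DIExpression(...) ; lower bound of the dimension being evaluated
+!7 = !DIExpression(...) ; upper bound of the dimension being evaluated
+!8 = !DIBasicType(tag: DW_TAG_base_type, name: "integer" ...)
+```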
+
+### Pointers and Allocatables
+Pointers and allocatables will be represented using `DICompositeTypeAttr`. It
+is a quirk of DWARF that scalar allocatable or pointer variables will show up in
+the debug info as pointers to scalars, while array pointer or allocatable
+variables show up as arrays. The behavior is the same in gfortran and classic
+flang.
+
+```
+  integer, allocatable :: ar(:)
+  integer, pointer :: sc
+
+!1 = !DILocalVariable(name: "sc", type: !2)
+!2 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !3, associated: !9 ...)
+!3 = !DIBasicType(tag: DW_TAG_base_type, name: "integer", ...)
+!4 = !DILocalVariable(name: "ar", type: !5 ...)
+!5 = !DICompositeType(tag: DW_TAG_array_type, baseType: !3, elements: !6, dataLocation: !8, allocated: !9)
+!6 = !{!7}
+!7 = !DISubrange(lowerBound: !10, upperBound: !11 ...)
+!8 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
+!9 = !DIExpression(DW_OP_push_object_address, DW_OP_deref, DW_OP_lit0, DW_OP_ne)
+!10 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 24, DW_OP_deref)
+!11 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 32, DW_OP_deref)
+
+```
+
+In FIR, these variables are represented as `!fir.box<!fir.heap<>>` or
+`!fir.box<!fir.ptr<>>`. There is also an `allocatable` or `pointer` attribute
+on the `DeclareOp`. This allows us to generate the allocated/associated status
+of these variables. The metadata to get the information from the descriptor is
+similar to arrays.
+
+### Strings
+
+The `DIStringTypeAttr` can represent both fixed size and allocatable strings. For
+the allocatable case, the `stringLengthExpression` and `stringLocationExpression`
+are used to provide the length and the location of the string respectively.
+
+```
+  character(len=:), allocatable :: var
+  character(len=20) :: fixed
+
+!1 = !DILocalVariable(name: "var", type: !2)
+!2 = !DIStringType(name: "character(*)", stringLengthExpression: !4, stringLocationExpression: !3 ...)
+!3 = !DIExpression(DW_OP_push_object_address, DW_OP_deref)
+!4 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 8)
+
+!5 = !DILocalVariable(name: "fixed", type: !6)
+!6 = !DIStringType(name: "character (20)", size: 160)
+
+```
+
+### Association
+
+Associations will be treated like normal variables, although we may need to
+handle the case where the `DeclareOp` of one variable points to the `DeclareOp`
+of another variable (e.g. a => b).
+
+### Namelists
+
+FIR does not seem to have a way to extract information about namelists.
+
+```
+namelist /abc/ x3, y3
+
+(gdb) p abc
+$1 = ( x3 = 100, y3 = 500 )
+(gdb) p x3
+$2 = 100
+(gdb) p y3
+$3 = 500
+```
+
+Even without namelist support, we should be able to see the value of the
+individual variables like `x3` and `y3` in the above example. But we would not
+be able to evaluate the namelist and have the debugger print the values of all
+the variables in it as shown above for `abc`.
+
+## Missing metadata in MLIR
+
+Some metadata types that are needed for Fortran are present in LLVM IR but are
+absent from MLIR. A non-comprehensive list is given below.
+
+1. `DICommonBlockAttr`
+2. `DIGenericSubrangeAttr`
+3. `DISubrangeAttr` in MLIR takes IntegerAttr at the moment so only works
+with fixed size arrays. It needs to also accept `DIExpressionAttr` or
+`DILocalVariableAttr` to support assumed shape and adjustable arrays.
+4. The `DICompositeTypeAttr` will need to have fields for `dataLocation`,
+`rank`, `allocated` and `associated`.
+5. `DIStringTypeAttr`
+
+# Testing
+
+- LLVM LIT tests will be added to test:
+  - the driver and ensure that it passes the line table and full debug
+    info generation appropriately.
+  - that the pass works as expected and generates debug info. These tests will
+    use `fir-opt`.
+  - with `flang -fc1` that end-to-end debug info generation works.
+- Manual external tests will be written to ensure that the following works
+  in debug tools
+  - Break at lines.
+  - Break at functions.
+  - print type (ptype) of function names.
+  - print values and types (ptype) of various types of variables
+- Manually run `GDB`'s gdb.fortran testsuite with llvm-flang.
----------------
abidh wrote:

The `GDB` tests require its Tcl-based framework to issue commands and look for results. It may not be feasible to add them to llvm-testsuite. Adding a bot that builds flang and then runs the `gdb.fortran` tests with it may be a possibility.

https://github.com/llvm/llvm-project/pull/86939

