[cfe-dev] RFC: CodeView debug info emission in Clang/LLVM

Wed Mar 2 17:19:25 PST 2016

Circling back around 4 months later...

I now believe that we should just let the frontend generate CV type info.
It's really not worth the hassle to try to have a common representation.
Enough C++ ABI-specific information leaks into the format that it's really
better to avoid trying to create a union of DWARF and CV type info in LLVM
DI metadata. We were able to reuse all the other non-type DI metadata, such
as location info and scope info, to emit inline line tables and variable
locations, so I think we did OK on reusing the existing infrastructure.
Compromising at not reusing the type representation seems OK.

I haven't come up with any ideas better than the design that Dave
Bartolomeo outlined below, so I think we should go ahead with that. One
thing I considered was extending DITypeRef to be a union between MDString*,
DIType*, and a type index, but I think that's too invasive. I also don't
want to make a whole DIType heap allocation just to wrap a 32-bit type
index, so I'm in favor of putting the indices into DISubprogram and
DIVariable.

Any thoughts on this plan?

On Thu, Oct 29, 2015 at 10:11 AM, Dave Bartolomeo via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
>
> *Proposed Design*
>
> *How Debug Info is Generated*
>
> The CodeView type records for a compilation unit will be generated by the
> front-end for the source language (Clang, in the case of C and C++). The
> front-end has access to the full type system and AST of the language, which
> is necessary to generate accurate debug type info. The type records will be
> represented as metadata in the LLVM IR, similar to how DWARF debug info is
> represented. I’ll cover the actual representation in a bit more detail
> below.
>
> The LLVM back-end will be responsible for emitting the CodeView type
> records from the IR into the output .obj file. Since the type records will
> already be in the correct format, this is essentially just a copy. No
> inspection of the type records is necessary within LLVM. The back-end will
> also be responsible for generating CodeView symbol records, line numbers,
> and source file info for any functions and data defined in the compilation
> unit. The back-end is the logical place to do this because only the
> back-end knows the code addresses, data addresses, and stack frame layouts.
>
>
>
> *Representation of CodeView in LLVM IR*
>
> DICompileUnit
>
> + e*xisting fields*
>
> + CodeViewTypes : DICodeViewTypes
>
>
>
> DICodeViewTypes
>
> + TypeRecords : MDString[]
>
> + UDTSymbols : DICodeViewUDT[]
>
>
>
> DICodeViewUDT
>
> + Name : MDString
>
> + TypeIndex : uint32_t
>
>
>
> DIVariable
>
> + *existing fields*
>
> + TypeIndex : uint32_t
>
>
>
> DISubprogram
>
> + *existing fields*
>
> + TypeIndex : uint32_t
>
> The existing DICompileUnit node will have a new operand named
> CodeViewTypes, which points to the new DICodeViewTypes node that describes
> the CodeView type information for the compilation unit.
>
>
>
> The DICodeViewTypes node contains two operands:
>
> -          TypeRecords, an array of MDStrings containing the actual
> CodeView type records for the compilation unit, sorted in ascending order
> of type index.
>
> -          UDTSymbols, and array of DICodeViewUDT nodes describing the
> user-defined types (class/struct/union/enum) for which CodeView symbol
> records will need to be emitted by the back-end.
>
>
>
> The DICodeViewUDT node contains two operands:
>
> -          Name, an MDString with the name of the symbol as it should
> appear in the CodeView symbol record.
>
> -          TypeIndex, a uint32_t holding the CodeView type index of the
> type record for the user-defined type’s definition.
>
>
>
> The DICodeViewUDT nodes are necessary because they are generally the only
> references to the definition of the user-defined type. Other uses of that
> type refer to the forward declaration record for the type, and without a
> reference to the definition of the type, the linker will discard the
> definition record when it merges the type information into the PDB.
>
>
>
> To specify the CodeView type for a variable or function, the DIVariable
> and DISubprogram nodes will have an additional TypeIndex operand containing
> the type index of the type record for that variable or function’s type.
> This operand will be set to zero when CodeView debug info is not enabled.
>
>
>
> The above representation essentially extends the existing DWARF-focused
> debug metadata to also include CodeView info. This was the least invasive
> way I found to add CodeView support, but it may not be the right
> architectural decision. It would also be possible to have the CodeView
> metadata entirely separate from the DWARF metadata. This would reduce the
> size of the IR when only one form of debug information was being emitted,
> which is presumably the common case. However, I expect it would complicate
> the scenario where both DWARF and CodeView are being emitted; for example,
> would having two dbg.declare intrinsics for a single local variable confuse
> existing consumers of LLVM IR? I’m hoping someone more familiar with the
> existing debug info architecture can provide some guidance here if there’s
> a better way of doing this.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160302/39e82c4b/attachment.html>