[llvm-dev] [cfe-dev] RFC: CodeView debug info emission in Clang/LLVM

Wed Mar 16 14:13:47 PDT 2016

Hi All,

Reid, Dave and I have chatted about this quite a bit and I think we have a
way forward that gets us in a direction we'd like to go, offers some
potential performance benefits for existing DWARF users, and maintains some
compatibility while transitions are happening. We're currently writing up a
proposal and will send it out for RFC shortly.

Thanks!

-eric

On Thu, Mar 10, 2016 at 10:51 AM Reid Kleckner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> It is certainly *possible* to use the existing DIType hierarchy to
> generate CodeView, but I don't believe it is useful. We would have to make
> the DI metadata into the union of DWARF and CodeView, and it would be
> horrible. Here is an incomplete list of things that would be awkward:
>
> - Member pointer inheritance models. Not all pointers to members are the
> same size.
> - Describing locations of virtual bases in vbtables. I'm not sure how to
> get from DW_TAG_inheritance data to "offset of vbptr from vfptr of complete
> class".
> - Describing 'this' adjustments performed in virtual method prologues.
> - New virtuality types to indicate "introducing" virtual methods.
> - New flags on everything, see CodeView.h for more info.
>
> If you need more visibility into what's different, consider this C++
> source:
>
> struct A {
>   virtual void f() {}
>   int a;
> };
> struct B : virtual A {
>   virtual void f() {}
>   virtual void g() {}
>   int b;
> };
> struct C : virtual A {
>   virtual void f() {}
>   virtual void h() {}
>   int c;
> };
> struct D : B, C {
>   virtual void f() {}
>   virtual void g() {}
>   virtual void h() {}
>   int d;
> };
> D d;
> auto mp = &D::f;
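
The first bullet above (member pointer inheritance models) can be observed directly. This is a minimal sketch using trimmed-down classes in the spirit of the example; the helper name `memberPointerSizes` and the classes `S`, `M1`, and `M` are hypothetical. The sizes are ABI-dependent: under the Microsoft ABI they typically differ by inheritance model, while under the Itanium ABI they are typically all the same, so no expected output is given.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Trimmed-down classes covering three inheritance models.
struct A { virtual void f() {} int a; };
struct B : virtual A { void f() override {} int b; };  // virtual inheritance
struct S { void f() {} };                              // no bases
struct M1 { void g() {} };
struct M : S, M1 { void h() {} };                      // multiple inheritance

// Returns the sizes of pointers to member functions for the single,
// multiple, and virtual inheritance cases, in that order.
// Nothing here assumes a particular ABI; the values are whatever the
// compiler in use picks.
inline std::array<std::size_t, 3> memberPointerSizes() {
  return {{sizeof(void (S::*)()), sizeof(void (M::*)()),
           sizeof(void (B::*)())}};
}
```

Printing the three values from a build targeting x86_64-linux and from a build targeting an MSVC-compatible triple is one way to see why a single pointer-to-member description does not fit both formats.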
>
> Compare the metadata that Clang generates with the dump of the CodeView
> that MSVC generates, and decide for yourself if the representations are a
> good match:
> $ clang -cc1 -std=c++11 -emit-llvm -debug-info-kind=limited t.cpp \
>     -triple x86_64-linux -o t.ll
> LLVM IR: https://ghostbin.com/paste/dpqo8
> $ cl -c t.cpp -Z7 && llvm-readobj -codeview t.obj
> Dump of MSVC CodeView: https://ghostbin.com/paste/92ya3
>
> Sure, yes, it is *possible* to write a converter from one to the other,
> but why is it necessary? What use case does it enable?
>
> You might think it would allow non-Clang frontends to avoid having
> separate type info emitters, but in practice it won't, because these
> frontends will need to be augmented to pass down all kinds of CV-specific
> junk.
>
> On Tue, Mar 8, 2016 at 4:39 AM, Aboud, Amjad <amjad.aboud at intel.com>
> wrote:
>
>> Hi,
>>
>> I said it before and I am saying it again, I do not think that this
>> proposal is needed to support Codeview.
>>
>>
>>
>> 1. Why can't CodeGen use the current DIType metadata to represent the
>> CodeView types?
>>
>> 2. Why can't “DW_TAG_typedef” be used to generate the “DICodeViewUDT”
>> symbol?
>>
>> 3. Why do we need the TypeIndex?
>>
>>    - Why can't DISubprogram and DIVariable simply point to the DIType
>>      metadata, instead of holding an index into an array where these
>>      DITypes are stored?
>>
>> 4. Why are the “TypeRecords” of type MDString? Are they the source
>> name of the type?
>>
>>
>>
>> I believe that the current debug info metadata contains all the
>> information needed to create the CodeView information in CodeGen.
>>
>> Thus, I do not see a need to modify either Clang or the LLVM IR.
>>
>>
>>
>> Please, if you have a concrete case where you think we have lost
>> information needed for CodeView between Clang and CodeGen, tell us about it
>> and I will be happy to help you figure out how to retrieve this information
>> from the current DI metadata.
>>
>>
>>
>> Thanks,
>>
>> Amjad
>>
>>
>>
>> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *David
>> Blaikie via llvm-dev
>> *Sent:* Thursday, March 03, 2016 20:26
>> *To:* Reid Kleckner <rnk at google.com>
>> *Cc:* llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org
>> *Subject:* Re: [llvm-dev] [cfe-dev] RFC: CodeView debug info emission in
>> Clang/LLVM
>>
>>
>>
>> I think it'd be reasonable to at least figure out a good way to do type
>> references consistently across the two schemes, but I'm OK with the idea of
>> having a blob of opaque type information for different debug info formats,
>> created by frontends. I don't mind if the library for building that blob
>> lives in LLVM or Clang for now. The DWARF one at least would probably live
>> in LLVM, because type info and other DWARF are described by similar/the same
>> constructs (DIEs, abbrevs, etc). That doesn't seem to be the case for
>> PDB, so there might not be any code to share between LLVM's CodeView needs
>> and the type info construction. Then it's just a matter of whether pushing
>> that library down into LLVM for other frontends to use would be good, which
>> it probably will be at some point, so if it goes into Clang I'd at least
>> try to keep it pretty well separated.
>>
>> Potentially that consistency could be created by going the other way:
>> replace DITypeRef with an int, then have the retained types list be the
>> int->type mapping, skipping the mangled names (and skip the retained types
>> list for CV/PDB).
>>
>> - Dave
>>
>>
>>
>> On Wed, Mar 2, 2016 at 5:19 PM, Reid Kleckner via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Circling back around 4 months later...
>>
>>
>>
>> I now believe that we should just let the frontend generate CV type info.
>> It's really not worth the hassle to try to have a common representation.
>> Enough C++ ABI-specific information leaks into the format that it's really
>> better to avoid trying to create a union of DWARF and CV type info in LLVM
>> DI metadata. We were able to reuse all the other non-type DI metadata, such
>> as location info and scope info, to emit inline line tables and variable
>> locations, so I think we did OK on reusing the existing infrastructure.
>> Compromising at not reusing the type representation seems OK.
>>
>>
>>
>> I haven't come up with any ideas better than the design that Dave
>> Bartolomeo outlined below, so I think we should go ahead with that. One
>> thing I considered was extending DITypeRef to be a union between MDString*,
>> DIType*, and a type index, but I think that's too invasive. I also don't
>> want to make a whole DIType heap allocation just to wrap a 32-bit type
>> index, so I'm in favor of putting the indices into DISubprogram and
>> DIVariable.
>>
>>
>>
>> Any thoughts on this plan?
>>
>>
>>
>> On Thu, Oct 29, 2015 at 10:11 AM, Dave Bartolomeo via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>> *Proposed Design*
>>
>> *How Debug Info is Generated*
>>
>> The CodeView type records for a compilation unit will be generated by the
>> front-end for the source language (Clang, in the case of C and C++). The
>> front-end has access to the full type system and AST of the language, which
>> is necessary to generate accurate debug type info. The type records will be
>> represented as metadata in the LLVM IR, similar to how DWARF debug info is
>> represented. I’ll cover the actual representation in a bit more detail
>> below.
>>
>> The LLVM back-end will be responsible for emitting the CodeView type
>> records from the IR into the output .obj file. Since the type records will
>> already be in the correct format, this is essentially just a copy. No
>> inspection of the type records is necessary within LLVM. The back-end will
>> also be responsible for generating CodeView symbol records, line numbers,
>> and source file info for any functions and data defined in the compilation
>> unit. The back-end is the logical place to do this because only the
>> back-end knows the code addresses, data addresses, and stack frame layouts.
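
The "essentially just a copy" step for type records can be sketched as below. This is only an illustration under stated assumptions, not LLVM's actual emitter: the function name `buildTypeSection` is hypothetical, each input string is assumed to be a complete, already-serialized record (including its own length prefix), and the output is assumed to be the contents of a `.debug$T` section that starts with the 4-byte CodeView signature value 4.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: builds the bytes of a type section from records
// that the front-end has already serialized in CodeView format.
// Assumption: every string is one complete record, so the back-end can
// append it without inspecting it.
inline std::string buildTypeSection(const std::vector<std::string> &records) {
  // Assumption: the section begins with a 4-byte little-endian signature
  // whose value is 4.
  const std::uint32_t signature = 4;
  std::string out;
  for (int shift = 0; shift < 32; shift += 8)
    out.push_back(static_cast<char>((signature >> shift) & 0xFF));
  // The "copy": append each record verbatim, in the order given
  // (ascending type index in the proposal).
  for (const std::string &record : records)
    out += record;
  return out;
}
```

Symbol records, line tables, and source file info are deliberately left out, since the proposal has the back-end generate those itself from addresses and frame layouts.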
>>
>>
>>
>> *Representation of CodeView in LLVM IR*
>>
>> DICompileUnit
>>
>> + *existing fields*
>>
>> + CodeViewTypes : DICodeViewTypes
>>
>>
>>
>> DICodeViewTypes
>>
>> + TypeRecords : MDString[]
>>
>> + UDTSymbols : DICodeViewUDT[]
>>
>>
>>
>> DICodeViewUDT
>>
>> + Name : MDString
>>
>> + TypeIndex : uint32_t
>>
>>
>>
>> DIVariable
>>
>> + *existing fields*
>>
>> + TypeIndex : uint32_t
>>
>>
>>
>> DISubprogram
>>
>> + *existing fields*
>>
>> + TypeIndex : uint32_t
>>
>> The existing DICompileUnit node will have a new operand named
>> CodeViewTypes, which points to the new DICodeViewTypes node that describes
>> the CodeView type information for the compilation unit.
>>
>>
>>
>> The DICodeViewTypes node contains two operands:
>>
>> - TypeRecords, an array of MDStrings containing the actual
>> CodeView type records for the compilation unit, sorted in ascending order
>> of type index.
>>
>> - UDTSymbols, an array of DICodeViewUDT nodes describing the
>> user-defined types (class/struct/union/enum) for which CodeView symbol
>> records will need to be emitted by the back-end.
>>
>>
>>
>> The DICodeViewUDT node contains two operands:
>>
>> - Name, an MDString with the name of the symbol as it should
>> appear in the CodeView symbol record.
>>
>> - TypeIndex, a uint32_t holding the CodeView type index of the
>> type record for the user-defined type’s definition.
>>
>>
>>
>> The DICodeViewUDT nodes are necessary because they are generally the only
>> references to the definition of the user-defined type. Other uses of that
>> type refer to the forward declaration record for the type, and without a
>> reference to the definition of the type, the linker will discard the
>> definition record when it merges the type information into the PDB.
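
The reasoning above (a definition record survives only if something refers to it) can be modeled as a reachability walk. This is a simplified sketch, not a description of how any real linker is implemented: the `Record` struct and the `reachable` function are hypothetical, and records are identified by plain 0-based positions rather than real CodeView type indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified type record: just the positions of the other
// records it refers to.
struct Record {
  std::vector<std::size_t> refs;
};

// Marks every record reachable from the roots, where the roots stand in
// for the type indices named by symbol records. Records left unmarked
// model what a linker could discard when merging type information.
inline std::vector<bool> reachable(const std::vector<Record> &records,
                                   const std::vector<std::size_t> &roots) {
  std::vector<bool> seen(records.size(), false);
  std::vector<std::size_t> work(roots.begin(), roots.end());
  while (!work.empty()) {
    const std::size_t i = work.back();
    work.pop_back();
    if (i >= records.size() || seen[i])
      continue;  // out of range or already visited
    seen[i] = true;
    for (std::size_t r : records[i].refs)
      work.push_back(r);
  }
  return seen;
}
```

With three records (a forward declaration, a pointer type referring to it, and the definition), rooting the walk only at the pointer type leaves the definition unmarked; adding the definition as a root, which is the role a DICodeViewUDT entry plays, keeps it.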
>>
>>
>>
>> To specify the CodeView type for a variable or function, the DIVariable
>> and DISubprogram nodes will have an additional TypeIndex operand containing
>> the type index of the type record for that variable or function’s type.
>> This operand will be set to zero when CodeView debug info is not enabled.
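
The layout described above can be mirrored with plain structs to show how a TypeIndex would select a record. This is a sketch only: the struct and function names are hypothetical and are not LLVM classes, and it assumes that the first stored record has type index 0x1000 (the first index not reserved for simple built-in types) and that indices are dense from there.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Plain-struct mirror of the proposed nodes; these are NOT LLVM's classes.
struct CodeViewUDT {
  std::string Name;         // symbol name as it should appear in the record
  std::uint32_t TypeIndex;  // index of the UDT's definition record
};

struct CodeViewTypes {
  std::vector<std::string> TypeRecords;  // sorted by ascending type index
  std::vector<CodeViewUDT> UDTSymbols;
};

// Assumption: TypeRecords[0] corresponds to type index 0x1000 and the
// remaining records follow with consecutive indices.
constexpr std::uint32_t FirstRecordIndex = 0x1000;

// Returns the record for a TypeIndex, or nullptr when the index is zero
// (CodeView not enabled, per the proposal), names a built-in type below
// FirstRecordIndex, or is past the end of the array.
inline const std::string *findRecord(const CodeViewTypes &types,
                                     std::uint32_t typeIndex) {
  if (typeIndex < FirstRecordIndex)
    return nullptr;
  const std::uint32_t slot = typeIndex - FirstRecordIndex;
  if (slot >= types.TypeRecords.size())
    return nullptr;
  return &types.TypeRecords[slot];
}
```

Because the records are sorted by type index, the lookup is a bounds-checked array access; a DIVariable or DISubprogram would only need to carry the 32-bit index.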
>>
>>
>>
>> The above representation essentially extends the existing DWARF-focused
>> debug metadata to also include CodeView info. This was the least invasive
>> way I found to add CodeView support, but it may not be the right
>> architectural decision. It would also be possible to have the CodeView
>> metadata entirely separate from the DWARF metadata. This would reduce the
>> size of the IR when only one form of debug information was being emitted,
>> which is presumably the common case. However, I expect it would complicate
>> the scenario where both DWARF and CodeView are being emitted; for example,
>> would having two dbg.declare intrinsics for a single local variable confuse
>> existing consumers of LLVM IR? I’m hoping someone more familiar with the
>> existing debug info architecture can provide some guidance here if there’s
>> a better way of doing this.
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev