[llvm-dev] [cfe-dev] RFC: CodeView debug info emission in Clang/LLVM

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Sat Oct 31 15:19:47 PDT 2015


On Sat, Oct 31, 2015 at 3:07 PM, Zachary Turner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Definitely having someone who knows both formats well would be an
> advantage.  Dave B might be in the best position to do this, so hopefully
> he can provide a couple more examples of areas where he has trouble
> expressing CV information entirely in the backend.
>
> Regardless of what everyone ends up deciding on with regards to the
> front-end / back-end discussion, I want to suggest separating the work into
> separate pieces that can go in independently of each other.
>
> For example, the proposed LLVMCodeView library, which simply reads and
> writes raw CV records, seems to be orthogonal to this discussion and could
> be submitted independently.
>

I haven't looked at the patch in general, but that sounds quite plausible,
with unit tests or what-have-you that demonstrate the expected behavior
regardless of where it ultimately ends up being used from (LLVM, Clang, or
both).
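
A minimal sketch of what such a behavior-pinning test could look like for a
raw record reader/writer. The layout assumed here (a 16-bit little-endian
length counting the bytes that follow the length field, then a 16-bit kind,
then the payload) and every name in the block (RawRecord, writeRecord,
readRecord, roundTripExample) are assumptions for illustration only, not the
API of the proposed LLVMCodeView library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of one raw record: a 16-bit kind plus payload.
struct RawRecord {
  std::uint16_t kind;
  std::vector<std::uint8_t> payload;
};

// Append a 16-bit value in little-endian byte order.
static void writeU16(std::vector<std::uint8_t> &out, std::uint16_t v) {
  out.push_back(static_cast<std::uint8_t>(v & 0xFF));
  out.push_back(static_cast<std::uint8_t>(v >> 8));
}

// Read a little-endian 16-bit value; the caller guarantees two bytes exist.
static std::uint16_t readU16(const std::vector<std::uint8_t> &in,
                             std::size_t at) {
  return static_cast<std::uint16_t>(in[at] | (in[at + 1] << 8));
}

// Serialize one record as: length, kind, payload (assumed layout).
// Returns false if the payload cannot be described by a 16-bit length.
bool writeRecord(const RawRecord &rec, std::vector<std::uint8_t> &out) {
  if (rec.payload.size() > 0xFFFFu - 2u)
    return false;
  writeU16(out, static_cast<std::uint16_t>(rec.payload.size() + 2));
  writeU16(out, rec.kind);
  out.insert(out.end(), rec.payload.begin(), rec.payload.end());
  return true;
}

// Parse one record starting at `offset` and advance `offset` past it.
// Returns false on truncated or malformed input, leaving `offset` unchanged.
bool readRecord(const std::vector<std::uint8_t> &in, std::size_t &offset,
                RawRecord &rec) {
  if (offset > in.size() || in.size() - offset < 4)
    return false;
  const std::size_t len = readU16(in, offset);
  if (len < 2 || in.size() - offset - 2 < len)
    return false;
  rec.kind = readU16(in, offset + 2);
  rec.payload.assign(in.data() + offset + 4, in.data() + offset + 2 + len);
  offset += 2 + len;
  return true;
}

// Round-trip check: what is written must read back unchanged.
void roundTripExample() {
  const RawRecord original{0x1234, {1, 2, 3}}; // 0x1234 is an arbitrary kind
  std::vector<std::uint8_t> buffer;
  const bool wrote = writeRecord(original, buffer);
  assert(wrote);
  (void)wrote;

  std::size_t offset = 0;
  RawRecord parsed{};
  const bool read = readRecord(buffer, offset, parsed);
  assert(read);
  (void)read;
  assert(parsed.kind == original.kind);
  assert(parsed.payload == original.payload);
  assert(offset == buffer.size());
}
```

Nothing in this check depends on whether the caller is Clang or LLVM, which
is why a library like this could plausibly be reviewed on its own.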


>
> On Sat, Oct 31, 2015 at 12:04 PM Robinson, Paul via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> The details of the mangling would be ABI dependent not debug-info-format
>> dependent.  Metadata already allows conveying a mangled name into LLVM, as
>> David Blaikie mentioned, so that's not really an issue. The frontend knows
>> how to construct the mangled name, the backend knows where the mangled name
>> goes in the final debug info.  It's a pretty reasonable separation of
>> concerns.
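
The separation described above can be sketched roughly as follows. This is a
toy model under stated assumptions: TypeDescription, DebugFormat, and
emitTypeRecord are hypothetical names, and the returned strings are readable
stand-ins rather than real DWARF or CodeView encodings. The point is only
that the front-end supplies the mangled name as opaque data and the back-end
decides where it goes.

```cpp
#include <cassert>
#include <string>

// Hypothetical format-neutral description passed from front-end to back-end.
struct TypeDescription {
  std::string displayName; // name as written in the source
  std::string mangledName; // computed by the front-end for the target ABI
};

enum class DebugFormat { Dwarf, CodeView };

// Back-end side: chooses only where each name is placed for the requested
// format. It never parses or recomputes the mangled name.
std::string emitTypeRecord(const TypeDescription &type, DebugFormat format) {
  switch (format) {
  case DebugFormat::Dwarf:
    return "[dwarf] type name=" + type.displayName +
           " id=" + type.mangledName;
  case DebugFormat::CodeView:
    return "[codeview] type name=" + type.displayName +
           " unique=" + type.mangledName;
  }
  return std::string(); // not reached for valid enumerators
}

void separationExample() {
  // "_ZTS3Foo" is just a sample string chosen by the (imaginary) front-end.
  const TypeDescription foo{"Foo", "_ZTS3Foo"};
  assert(emitTypeRecord(foo, DebugFormat::Dwarf) ==
         "[dwarf] type name=Foo id=_ZTS3Foo");
  assert(emitTypeRecord(foo, DebugFormat::CodeView) ==
         "[codeview] type name=Foo unique=_ZTS3Foo");
}
```

A front-end targeting a different ABI would fill in mangledName differently,
while emitTypeRecord would stay unchanged.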
>>
>>
>>
>> I didn't see anything in this quickie overview of CodeView that wouldn't
>> be expressible in DWARF, so there's nothing (yet) persuasive to suggest
>> metadata should be format-aware.  It would be worthwhile for somebody
>> knowledgeable in one format to take a good detailed look at the other, just
>> to make sure; please provide a link to the detailed CodeView description
>> when it becomes available.
>>
>>
>>
>> Regarding source-language awareness of the debug-info generator, that's
>> really not a concern (and I say this as someone who once helped add DWARF
>> emission of COBOL-specific entries to a compiler backend that was not
>> entirely clear how to spell COBOL).  You need an API that is able to
>> specify the constructs used by the language, and the rest of it is just
>> processing those record types the way they're supposed to be.  The backend
>> is not doing any language-semantic analysis of the info, it's just doing
>> what it's told.
>>
>>
>>
>> Abstractly, the exercise of generalizing LLVM metadata to be able to
>> support more than one debug-info format feels like a good thing. Metadata
>> used to be more closely tied to DWARF (e.g., used DWARF tag codes directly
>> in the metadata nodes to identify things) but it has been evolving away
>> from that to a class hierarchy that is not so explicitly DWARF-ish.
>> Handling CodeView would encourage that direction, rather than being a more
>> fundamental shift.
>>
>> --paulr
>>
>>
>>
>> *From:* cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] *On Behalf Of *David
>> Blaikie via cfe-dev
>> *Sent:* Friday, October 30, 2015 8:07 PM
>> *To:* Dave Bartolomeo
>> *Cc:* llvm-dev; Clang Dev
>> *Subject:* Re: [cfe-dev] [llvm-dev] RFC: CodeView debug info emission in
>> Clang/LLVM
>>
>>
>>
>> Brief answer, but can go into detail later:
>>
>> If this is the right idea, let's do it for dwarf too & generalize the
>> support to work for both. It's certainly something we've considered, to
>> save all the complexity of representing essentially static data in an
>> intermediate form.
>>
>> That said, given some of the stuff we have for lto, for example
>> (deallocating/merging types etc) I'm not sure that's obviously the right
>> strategy.
>>
>> Mangled names for types don't seem like a hugely difficult feature. We
>> already support mangled names for function debug info in dwarf. We already
>> have the mangled name of a type in the metadata, it could be used for
>> codeview emission.
>>
>> It might be worth talking more & considering what other language features
>> codeview uses that we haven't already plumbed through for dwarf (& dwarf
>> based debuggers use dwarf for expression evaluation too, fwiw)
>>
>> On Oct 30, 2015 5:12 PM, "Dave Bartolomeo via cfe-dev" <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>
>>
>>
>>
>> *From:* Saleem Abdulrasool [mailto:compnerd at compnerd.org]
>> *Sent:* Thursday, October 29, 2015 10:02 PM
>> *To:* Adrian Prantl <aprantl at apple.com>
>> *Cc:* Dave Bartolomeo <Dave.Bartolomeo at microsoft.com>;
>> llvm-dev at lists.llvm.org; cfe-dev at lists.llvm.org
>> *Subject:* Re: [llvm-dev] RFC: CodeView debug info emission in Clang/LLVM
>>
>>
>>
>> On Thu, Oct 29, 2015 at 2:08 PM, Adrian Prantl via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>
>> > On Oct 29, 2015, at 10:11 AM, Dave Bartolomeo via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> > Proposed Design
>> > How Debug Info is Generated
>> > The CodeView type records for a compilation unit will be generated by
>> the front-end for the source language (Clang, in the case of C and C++).
>> The front-end has access to the full type system and AST of the language,
>> which is necessary to generate accurate debug type info. The type records
>> will be represented as metadata in the LLVM IR, similar to how DWARF debug
>> info is represented. I’ll cover the actual representation in a bit more
>> detail below.
>> > The LLVM back-end will be responsible for emitting the CodeView type
>> records from the IR into the output .obj file. Since the type records will
>> already be in the correct format, this is essentially just a copy. No
>> inspection of the type records is necessary within LLVM. The back-end will
>> also be responsible for generating CodeView symbol records, line numbers,
>> and source file info for any functions and data defined in the compilation
>> unit. The back-end is the logical place to do this because only the
>> back-end knows the code addresses, data addresses, and stack frame layouts.
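
The division of labor in the proposal above might be modeled like this. It
is a simplified sketch, not the proposed implementation: FrontEndDebugInfo,
FunctionLayout, ObjectDebugSections, and emitDebugSections are hypothetical
names, and the symbol records are shown as readable strings rather than real
CodeView symbol encodings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical: what the front-end hands over. Each entry is an already
// encoded type record that the back-end treats as opaque.
struct FrontEndDebugInfo {
  std::vector<Bytes> typeRecords;
};

// Hypothetical: facts that are only known after code generation.
struct FunctionLayout {
  std::string name;
  std::uint32_t codeOffset;
  std::uint32_t codeSize;
};

struct ObjectDebugSections {
  Bytes types;                      // type records, copied verbatim
  std::vector<std::string> symbols; // stand-ins for symbol records
};

ObjectDebugSections emitDebugSections(const FrontEndDebugInfo &frontEnd,
                                      const std::vector<FunctionLayout> &funcs) {
  ObjectDebugSections out;
  // Type info: a plain copy, with no inspection of the record contents.
  for (const Bytes &record : frontEnd.typeRecords)
    out.types.insert(out.types.end(), record.begin(), record.end());
  // Symbol info: built here because only the back-end knows the addresses.
  for (const FunctionLayout &f : funcs)
    out.symbols.push_back(f.name + " @" + std::to_string(f.codeOffset) +
                          " +" + std::to_string(f.codeSize));
  return out;
}

void emissionExample() {
  FrontEndDebugInfo frontEnd;
  frontEnd.typeRecords.push_back(Bytes{0x01, 0x02});
  frontEnd.typeRecords.push_back(Bytes{0x03});
  std::vector<FunctionLayout> funcs;
  funcs.push_back(FunctionLayout{"main", 16, 42});

  const ObjectDebugSections sections = emitDebugSections(frontEnd, funcs);
  const Bytes expectedTypes{0x01, 0x02, 0x03};
  assert(sections.types == expectedTypes);
  assert(sections.symbols.size() == 1);
  assert(sections.symbols[0] == "main @16 +42");
}
```

In this model the back-end could emit type info without understanding any
source-language construct, at the cost of the front-end committing to one
record format up front.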
>>
>> Thanks for proposing this.
>>
>> How different are the type records from the type information we currently
>> have in LLVM's DIType hierarchy? Would it be feasible to move the logic for
>> generating type records from LLVM metadata into the backend? This way a
>> frontend could be agnostic about the debug information format.
>>
>>
>>
>> I think that this really is the path we want to follow.  If the current
>> metadata we emit is insufficient, we should augment it with additional
>> information sufficient to generate the necessary data in the backend.  The
>> same annotations would then be able to generate one OR both debug info
>> formats.
>>
>>
>>
>> *[dB] I considered that approach, but I see a few reasons why I don’t
>> think making the debug metadata format agnostic would work out very well.
>> To ensure that the backend can generate both debug formats by itself, we
>> need to make the metadata contain enough information from the original AST
>> for the format-specific code in the backend to generate the debug info. I
>> believe that in practice, we’d wind up having to encode a significant
>> portion of the AST (for decls of types and members, at least) into
>> metadata, because debug type info, at least in CodeView, strives for pretty
>> close fidelity with the declarations and types in the original source
>> language. The CodeView debug type info is used by the VS debugger to parse
>> and evaluate C++ expressions while debugging. We currently have a bunch of
>> limitations in our debugger’s expression evaluation due to information
>> missing from the debug type info, and we’ll probably attempt to preserve
>> even more of that information going forward. There’s not much information
>> from the AST that we can ignore if we want to reach that goal. Of course,
>> we could just accept that we need the majority of the AST for type and
>> function declarations in the debug metadata, and do that work in order to
>> avoid having the frontend know about debug info formats, but that just
>> means that now the backend code that generates the debug info has to know
>> about all of the source language-specific constructs that it’s reading when
>> creating the debug info. I think I’d rather have Clang have to understand
>> the language-specific parts of multiple debug info formats than have LLVM
>> understand language-specific metadata.*
>>
>>
>>
>> *As an example, the CodeView definition of a user-defined type requires
>> both the mangled name of the type and the non-mangled “display name” of the
>> type. Both of these require a fair bit of information from the AST to
>> generate. For the mangled name in particular, there’s already code in Clang
>> that generates this. If we want the backend to do this instead, we have to
>> stuff a bunch of AST info into metadata, and then figure out how to share
>> the name mangling code between Clang (where it operates on actual ASTs) and
>> LLVM (where it would operate on metadata). If, instead, we have Clang
>> compute the mangled name and display name and pass those names in the
>> metadata, we’re not being particularly format-agnostic in Clang, and if the
>> current compilation is only generating DWARF, we didn’t really need to
>> compute or store those potentially large strings for every type anyway.*
>>
>>
>>
>> *Whether Clang is format-agnostic or not, there will have to be some
>> component that converts from something format-agnostic (either ASTs or
>> metadata) to DWARF, and some component that converts from ASTs or metadata
>> to CodeView. You can put those two components in Clang and accept that
>> Clang won’t be format-agnostic. Or, you can put those two components in
>> LLVM, which leaves Clang as format-agnostic but requires that LLVM be more
>> source language-aware. It also requires a third component to translate ASTs
>> into metadata to pass to LLVM. Letting Clang worry about two different
>> debug type info formats seems preferable to writing additional code and
>> making an LLVM component understand more about the source language.*
>>
>>
>>
>> *Is there another approach I haven’t thought of that would let us wind up
>> with a cleaner solution? I’ve only been working with the Clang and LLVM
>> debug info code for a few months, so my knowledge of the existing design is
>> far from complete.*
>>
>>
>>
>> *Note that for the rest of debug info (line numbers, source files, stack
>> layouts, etc.), I don’t think the frontend should have to worry about the
>> debug info format, and the current design for those pieces is just fine.
>> It’s only the type info that I think is source language-specific enough to
>> justify computing it in the frontend.*
>>
>>
>>
>>
>>
>>
>>
>> -- adrian
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>>
>>
>> --
>>
>> Saleem Abdulrasool
>> compnerd (at) compnerd (dot) org
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
>
>