[llvm-dev] [RFC] Refactor llvm-dwp in to a library.

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 22 17:43:27 PDT 2021

On Tue, Jun 22, 2021 at 5:36 PM Alexander Yermolovich <ayermolo at fb.com>
wrote:

> Hello David,
> Thank you for elaborating.
> When you are talking about compression, is this related to debug info
> coming in compressed already, or something else?

Both/either. Compressed input, if I remember correctly, gets fully
decompressed into a buffer, and that buffer may be kept alive for most/all
of the llvm-dwp run (I don't remember exactly how that goes). Compressed
output gets buffered twice: llvm-dwp first passes bytes to MCStreamer, which
buffers every section itself, and then the section contents have to be
written into a compressed buffer, which also stays alive until it's written
out to the output file.
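
To make the contrast with whole-section buffering concrete, here is a
minimal sketch of copying one section's byte range from input to output
through a fixed-size chunk, so peak memory does not grow with section size.
It uses plain iostreams rather than memory-mapped files or any LLVM class,
and the names copyRange and demoCopy are hypothetical, not llvm-dwp code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies Size bytes starting at Offset from In to Out
// through a fixed 64 KiB buffer. Returns the number of bytes actually copied,
// which is smaller than Size if the input ends early.
inline uint64_t copyRange(std::istream &In, std::ostream &Out,
                          uint64_t Offset, uint64_t Size) {
  std::array<char, 64 * 1024> Chunk;
  In.seekg(static_cast<std::streamoff>(Offset), std::ios::beg);
  uint64_t Copied = 0;
  while (Copied < Size && In) {
    uint64_t Want = std::min<uint64_t>(Chunk.size(), Size - Copied);
    In.read(Chunk.data(), static_cast<std::streamsize>(Want));
    std::streamsize Got = In.gcount();
    if (Got <= 0)
      break;
    Out.write(Chunk.data(), Got);
    Copied += static_cast<uint64_t>(Got);
  }
  return Copied;
}

// Usage: copy 5 bytes starting at offset 2 of a 10-byte "input file".
inline std::string demoCopy() {
  std::istringstream In("0123456789");
  std::ostringstream Out;
  uint64_t Copied = copyRange(In, Out, 2, 5);
  assert(Copied == 5);
  (void)Copied; // keeps release builds (NDEBUG) free of unused warnings
  return Out.str();
}
```

A streamed compressor or decompressor could sit between the read and the
write in the same loop; that part is left out here.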

> Regarding MCStreamer, what would be the alternative? In BOLT it provides a
> nice level of abstraction for us as we output the new updated binary and
> write out dwo files in the debug fission case.

Some kind of lighter weight abstraction, or a refactoring of MCStreamer to
make or allow it to be lighter weight in some/many cases. For example: if a
user knows the important facts (size, etc.) that are needed to emit headers,
lay out the object, etc., they could then get a callback to emit the
section's bytes when needed. So if the bytes are parametrically computed
(eg: a function of some input file, bytes from a StringMap, etc.) or simply
mapped from some input file, they can be emitted without MCStreamer ever
having to take ownership of the byte buffer or make its own.
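
The "size up front, bytes by callback" idea above can be sketched roughly
as follows. This is an illustration under stated assumptions, not a proposed
MCStreamer interface: LazySection, ByteSink, writeObject and demoWrite are
hypothetical names, and the "header" is a toy text table rather than a real
object file header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sink the writer hands to each section's callback.
using ByteSink = std::function<void(const char *Data, size_t Len)>;

// Hypothetical section description: the size is known before any bytes
// exist, and Emit produces exactly Size bytes when asked.
struct LazySection {
  std::string Name;
  uint64_t Size;
  std::function<void(const ByteSink &)> Emit;
};

// Lays out all sections from their declared sizes, writes a toy header
// (name, offset, size per line), then asks each section to stream its bytes.
// The writer never owns or copies a section's contents. Returns false if a
// section emitted a different number of bytes than it declared.
inline bool writeObject(std::ostream &OS,
                        const std::vector<LazySection> &Sections) {
  uint64_t Offset = 0;
  for (const LazySection &S : Sections) {
    OS << S.Name << ' ' << Offset << ' ' << S.Size << '\n';
    Offset += S.Size;
  }
  for (const LazySection &S : Sections) {
    uint64_t Written = 0;
    S.Emit([&](const char *Data, size_t Len) {
      OS.write(Data, static_cast<std::streamsize>(Len));
      Written += Len;
    });
    if (Written != S.Size)
      return false;
  }
  return true;
}

// Usage: two sections whose bytes are views into one "input file" buffer.
inline std::string demoWrite() {
  static const std::string Input = "abcdef";
  std::vector<LazySection> Sections = {
      {".debug_info.dwo", 3,
       [](const ByteSink &Out) { Out(Input.data(), 3); }},
      {".debug_str.dwo", 3,
       [](const ByteSink &Out) { Out(Input.data() + 3, 3); }},
  };
  std::ostringstream OS;
  bool Ok = writeObject(OS, Sections);
  assert(Ok);
  (void)Ok; // keeps release builds (NDEBUG) free of unused warnings
  return OS.str();
}
```

The point of the design is that layout only needs sizes, so a callback can
copy straight from a mapped input or compute bytes on the fly.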

Personally I'd love it if the abstraction was lightweight enough to be
shared with lld and orc (well, I guess orc probably already uses
MCStreamer, but perhaps it can benefit from these reductions in overhead).

> In general, the usage model for BOLT is in some ways similar to llvm-dwp,
> except we don't really deal with compressed debug information. Some
> sections are passed through, but others get either modified (.debug_info)
> or completely rewritten (.debug_loc). For example, llvm-dwp rewrites the
> .debug_str_offsets and .debug_str sections, although much more data is
> modified/replaced before being written out in the BOLT case. So I am not
> sure pure in/out performance is as critical for us at the moment.

Fair enough.
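
For readers unfamiliar with the string-section rewrite mentioned in the
quoted paragraph, here is a toy model of the idea: strings from several
inputs are merged into one deduplicated pool, and each input's offsets are
rewritten to point into it. StringPool, remapOffsets and demoMerge are
hypothetical names, offsets are assumed valid, and the actual llvm-dwp code
and on-disk encodings differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy merged string pool: each distinct string is stored once,
// NUL-terminated, and identified by its byte offset in the pool.
class StringPool {
  std::string Data;
  std::unordered_map<std::string, uint32_t> Offsets;

public:
  // Returns the pool offset of S, appending it if it is not present yet.
  uint32_t intern(const std::string &S) {
    auto It = Offsets.find(S);
    if (It != Offsets.end())
      return It->second;
    uint32_t Off = static_cast<uint32_t>(Data.size());
    Data += S;
    Data.push_back('\0');
    Offsets.emplace(S, Off);
    return Off;
  }
  const std::string &bytes() const { return Data; }
};

// Rewrites one input's string offsets so they point into the merged pool.
// OldStr is that input's string section (NUL-terminated strings back to
// back); every entry of OldOffsets is assumed to be a valid offset into it.
inline std::vector<uint32_t>
remapOffsets(const std::string &OldStr,
             const std::vector<uint32_t> &OldOffsets, StringPool &Pool) {
  std::vector<uint32_t> New;
  New.reserve(OldOffsets.size());
  for (uint32_t Off : OldOffsets)
    New.push_back(Pool.intern(std::string(OldStr.c_str() + Off)));
  return New;
}

// Usage: two inputs share the string "main"; it is stored once in the pool.
inline std::string demoMerge() {
  StringPool Pool;
  std::string A("int\0main\0", 9);
  std::string B("main\0char\0", 10);
  std::vector<uint32_t> NewA = remapOffsets(A, {0, 4}, Pool);
  std::vector<uint32_t> NewB = remapOffsets(B, {0, 5}, Pool);
  assert(NewA[1] == NewB[0]); // both refer to the single copy of "main"
  (void)NewA;
  (void)NewB;
  return Pool.bytes();
}
```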

> I took an initial step of factoring the llvm-dwp code out into its own
> library, to see what it would look like. What I ended up with is a few APIs
> that take in an MCStreamer, while all the code for dealing with it stays in
> the main function of llvm-dwp.
> With all of this said, and given the BOLT usage model, I think dealing with
> the MCStreamer issue can be deferred until after refactoring into a
> library/adding functionality to BOLT.
> Alex
> ------------------------------
> *From:* David Blaikie <dblaikie at gmail.com>
> *Sent:* Monday, June 21, 2021 6:41 PM
> *To:* Alexander Yermolovich <ayermolo at fb.com>
> *Cc:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; Maksim Panchenko
> <maks at fb.com>
> *Subject:* Re: [RFC] Refactor llvm-dwp in to a library.
> On Mon, Jun 21, 2021 at 6:28 PM Alexander Yermolovich <ayermolo at fb.com>
> wrote:
>
> > Hello David
> > I haven't dug into llvm-dwp performance. What are some of the performance
> > pain points that you know about?
>
> Yeah - using LLVM's higher level abstractions for writing object files
> (MCStreamer et al.) means that, as far as I recall, all the output ends up
> buffered in memory before being written out - whereas, ideally, it'd be
> streamed (memcpy to/from memory mapped files) from input file to output
> file (potentially through streamed compression/decompression where possible
> too - another layer of the MCStreamer abstractions that can add cost
> (though I don't think I implemented support for compressing output in
> llvm-dwp, though it'd be trivial to add because it's already supported in
> MCStreamer (but that support does buffer the whole uncompressed and
> compressed data... ))). Maybe some other things, but that's certainly the
> top of my list.
> - Dave
>
> > Thank You
> > Alex
> ------------------------------
> *From:* Alexander Yermolovich
> *Sent:* Monday, June 21, 2021 6:11 PM
> *To:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> *Cc:* dblaikie at gmail.com <dblaikie at gmail.com>; Maksim Panchenko <
> maks at fb.com>
> *Subject:* [RFC] Refactor llvm-dwp in to a library.
> Hello
> I am working on adding support for BOLT (
> https://github.com/facebookincubator/BOLT/tree/rebased) to write out DWP
> directly. I want to re-use as much llvm-dwp functionality as possible.
> The plan is to move most of the functionality that is now in llvm-dwp into
> llvm/lib/DWP, with a corresponding header file in llvm/include/llvm/DWP.
> The header file would declare:
>   getContributionIndex
>   handleSection
>   parseCompileUnitHeader
>   writeStringsAndOffsets
>   getCUIdentifiers
>   buildDuplicateError
>   writeIndex
> Structs that are passed around would also be defined in the header:
>   UnitIndexEntry
>   CompileUnitHeader
>   CompileUnitIdentifiers
> I thought I would solicit opinions before I dive too deep into this.
> Thank You
> Alex