[llvm-dev] [RFC] Introduce Dump Accumulator

Wed Aug 5 16:16:40 PDT 2020

On Wed, Aug 5, 2020 at 3:51 PM Eli Friedman <efriedma at quicinc.com> wrote:

> I’m not a fan of keeping important data outside the IR in an analysis.  If
> we’re planning to emit it, it should be represented directly in the IR.  Is
> there some reason we can’t just stick the data in a global variable?
>

The analysis in the scenarios here is external to LLVM - ML training, for
example. It's really a way to do printf, but where the data could be large
(challenging in a distributed build environment, where IO may be throttled)
or non-textual (for instance, capturing the IR right before a pass). An
alternative would be to produce a side-file, but then (again, in a
distributed build) you have to collect those files, concatenate them, and
modify the build system to be aware of all that.

>
> I’m not sure it’s helpful to have a generic mechanism for this; it’s not
> clear how this would work if multiple different features were trying to
> emit data into the llvm_dump section at the same time.
>

You could layer the approach: the one llvm_dump section has a pluggable
reader.
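
To make the layering idea a bit more concrete, here is a minimal sketch of
one way it could look. Nothing here is in the patch: the tag field, the
TaggedChunk and ReaderRegistry names, and the idea of keying readers by a
string are all assumptions made up for illustration. Each feature would
write chunks carrying its own tag, and the tool reading llvm_dump would
hand each chunk to whichever reader was registered for that tag.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one chunk from the shared section, already
// decompressed, tagged with the name of the feature that produced it.
struct TaggedChunk {
  std::string Tag;     // producer name, e.g. "text" (made-up convention)
  std::string Payload; // opaque bytes that only that producer understands
};

// A reader turns one payload into something printable.
using ChunkReader = std::function<std::string(const std::string &)>;

// Hypothetical registry mapping tags to readers.
class ReaderRegistry {
  std::map<std::string, ChunkReader> Readers;

public:
  void registerReader(std::string Tag, ChunkReader R) {
    assert(R && "reader must be callable");
    Readers[std::move(Tag)] = std::move(R);
  }

  // Dispatch every chunk to the reader for its tag. Chunks with an
  // unknown tag are reported rather than silently dropped.
  std::vector<std::string>
  readAll(const std::vector<TaggedChunk> &Chunks) const {
    std::vector<std::string> Out;
    for (const TaggedChunk &C : Chunks) {
      auto It = Readers.find(C.Tag);
      if (It == Readers.end())
        Out.push_back("<no reader for " + C.Tag + ">");
      else
        Out.push_back(It->second(C.Payload));
    }
    return Out;
  }
};
```

With something like this, two features emitting into the same section
would not need to know about each other; they would only need distinct
tags.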

>
>
> -Eli
>
>
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of *Kazu
> Hirata via llvm-dev
> *Sent:* Wednesday, August 5, 2020 3:36 PM
> *To:* llvm-dev at lists.llvm.org; Mircea Trofin <mtrofin at google.com>; Wei Mi
> <wmi at google.com>; james.henderson at sony.com
> *Subject:* [EXT] [llvm-dev] [RFC] Introduce Dump Accumulator
>
>
>
> Introduction
> ============
>
> This RFC proposes a mechanism to dump arbitrary messages into object
> files during compilation and retrieve them from the final executable.
>
> Background
> ==========
>
> We often need to collect information from all object files of
> applications.  For example:
>
> - Mircea Trofin needs to collect information from the function
>   inlining pass so that he can train the machine learning model with
>   the information.
>
> - I sometimes need to dump messages from optimization passes to see
>   where and how they trigger.
>
> Now, this process becomes challenging when we build large applications
> with a build system that caches and distributes compilation jobs.  If
> we were to dump messages to stderr, we would have to be careful not to
> interleave messages from multiple object files.  If we were to modify
> a source file, we would have to flush the cache and rebuild the entire
> application to collect dump messages from all relevant object files.
>
> High Level Design
> =================
>
> - LLVM: We provide machinery for individual passes to dump arbitrary
>   messages into a special ELF section in a compressed manner.
>
> - Linker: We simply concatenate the contents of the special ELF
>   section.  No change is needed.
>
> - llvm-readobj: We add an option to retrieve the contents of the
>   special ELF section.
>
> Detailed Design
> ===============
>
> DumpAccumulator analysis pass
> -----------------------------
>
> We create a new analysis pass called DumpAccumulator.  We add the
> analysis pass right at the beginning of the pass pipeline.  The new
> analysis pass holds the dump messages throughout the pass pipeline.
>
> If you would like to dump messages from some pass, you would obtain
> the result of DumpAccumulator in the pass:
>
>   DumpAccumulator::Result *DAR = MAMProxy.getCachedResult<DumpAccumulator>(M);
>
> Then dump messages:
>
>   if (DAR) {
>     DAR->Message += "Processing ";
>     DAR->Message += F.getName();
>     DAR->Message += "\n";
>   }
>
> AsmPrinter
> ----------
>
> We dump the messages from DumpAccumulator into a section called
> ".llvm_dump" in a compressed manner.  Specifically, the section
> contains:
>
> - LEB128 encoding of the original size in bytes
> - LEB128 encoding of the compressed size in bytes
> - the message compressed by zlib::compress
>
> in that order.
>
> llvm-readobj
> ------------
>
> We read the .llvm_dump section.  We dump each chunk of compressed data
> one after another.
>
> Existing Implementation
> =======================
>
> https://reviews.llvm.org/D84473
>
> Future Directions
> =================
>
> The proposal above does not support the ThinLTO build flow.  To
> support that, I am thinking about putting the message as metadata in
> the IR at the prelink stage.
>
> Thoughts?
>
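
For anyone following along, the .llvm_dump layout quoted above (two LEB128
sizes followed by the compressed bytes, with the linker placing the chunks
from each object file back to back) can be sketched roughly as below. This
is not the code from the patch. It assumes the sizes are unsigned LEB128,
it treats the compressed bytes as opaque so that it builds without zlib,
and the Chunk, encodeChunk and decodeSection names are invented for the
example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append V as unsigned LEB128: 7 bits per byte, low bits first, with the
// high bit set on every byte except the last.
void appendULEB128(std::string &Out, uint64_t V) {
  do {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;
    Out.push_back(static_cast<char>(Byte));
  } while (V != 0);
}

// Decode one unsigned LEB128 value starting at Pos and advance Pos past it.
// Returns false if the input ends before the value is complete.
bool readULEB128(const std::string &In, size_t &Pos, uint64_t &V) {
  V = 0;
  for (unsigned Shift = 0; Pos < In.size() && Shift < 64; Shift += 7) {
    uint8_t Byte = static_cast<uint8_t>(In[Pos++]);
    V |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return true;
  }
  return false;
}

// Hypothetical view of what one object file contributes to the section.
struct Chunk {
  uint64_t OriginalSize;  // message size before compression
  std::string Compressed; // compressed message bytes (opaque in this sketch)
};

// Lay out one chunk: original size, compressed size, compressed bytes.
std::string encodeChunk(const Chunk &C) {
  std::string Out;
  appendULEB128(Out, C.OriginalSize);
  appendULEB128(Out, C.Compressed.size());
  Out += C.Compressed;
  return Out;
}

// Walk a section that holds any number of chunks back to back, which is
// what plain concatenation by the linker would produce.
bool decodeSection(const std::string &Section, std::vector<Chunk> &Chunks) {
  size_t Pos = 0;
  while (Pos < Section.size()) {
    uint64_t Orig = 0, Comp = 0;
    if (!readULEB128(Section, Pos, Orig) || !readULEB128(Section, Pos, Comp))
      return false;
    if (Comp > Section.size() - Pos)
      return false; // chunk claims more bytes than the section has left
    Chunks.push_back({Orig, Section.substr(Pos, Comp)});
    Pos += Comp;
  }
  return true;
}
```

Because every chunk carries its own compressed size, a reader can step
from one chunk to the next without any index, which is why no linker
change is needed. A real reader would decompress each chunk and could use
the original size to size the output buffer.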
>