[llvm-dev] [RFC] Introduce Dump Accumulator

Wed Aug 5 15:51:37 PDT 2020

I’m not a fan of keeping important data outside the IR in an analysis.  If we’re planning to emit it, it should be represented directly in the IR.  Is there some reason we can’t just stick the data in a global variable?

I’m not sure it’s helpful to have a generic mechanism for this; it’s not clear how this would work if multiple different features were trying to emit data into the llvm_dump section at the same time.

-Eli

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Kazu Hirata via llvm-dev
Sent: Wednesday, August 5, 2020 3:36 PM
To: llvm-dev at lists.llvm.org; Mircea Trofin <mtrofin at google.com>; Wei Mi <wmi at google.com>; james.henderson at sony.com
Subject: [EXT] [llvm-dev] [RFC] Introduce Dump Accumulator

Introduction
============

This RFC proposes a mechanism to dump arbitrary messages into object
files during compilation and retrieve them from the final executable.

Background
==========

We often need to collect information from all object files of
applications.  For example:

- Mircea Trofin needs to collect information from the function
  inlining pass so that he can train the machine learning model with
  the information.

- I sometimes need to dump messages from optimization passes to see
  where and how they trigger.

Now, this process becomes challenging when we build large applications
with a build system that caches and distributes compilation jobs.  If
we were to dump messages to stderr, we would have to be careful not to
interleave messages from multiple object files.  If we were to modify
a source file, we would have to flush the cache and rebuild the entire
application to collect dump messages from all relevant object files.

High Level Design
=================

- LLVM: We provide machinery for individual passes to dump arbitrary
  messages into a special ELF section in a compressed manner.

- Linker: We simply concatenate the contents of the special ELF
  section.  No change is needed.

- llvm-readobj: We add an option to retrieve the contents of the
  special ELF section.

Detailed Design
===============

DumpAccumulator analysis pass
-----------------------------

We create a new analysis pass called DumpAccumulator.  We add the
analysis pass right at the beginning of the pass pipeline.  The new
analysis pass holds the dump messages throughout the pass pipeline.

If you would like to dump messages from some pass, you would obtain
the result of DumpAccumulator in the pass:

  DumpAccumulator::Result *DAR = MAMProxy.getCachedResult<DumpAccumulator>(M);

Then dump messages:

  if (DAR) {
    DAR->Message += "Processing ";
    DAR->Message += F.getName();
    DAR->Message += "\n";
  }

AsmPrinter
----------

We dump the messages from DumpAccumulator into a section called
".llvm_dump" in a compressed manner.  Specifically, the section
contains:

- LEB128 encoding of the original size in bytes

- LEB128 encoding of the compressed size in bytes

- the message compressed by zlib::compress

in that order.
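
The layout above can be sketched in standalone C++.  This is only an
illustration, not the code from the patch: the helper names
appendULEB128 and appendDumpChunk are hypothetical, the sizes are
assumed to use the unsigned LEB128 variant, and the zlib output is
taken as an already-compressed byte vector produced elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append V to Out as unsigned LEB128: seven payload bits per byte, least
// significant group first, high bit set on every byte except the last.
void appendULEB128(uint64_t V, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    if (V != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (V != 0);
}

// Append one chunk to the section contents: original size, compressed
// size, then the payload.  Compressed is assumed to be zlib output that
// was produced elsewhere (hypothetical helper, not part of the patch).
void appendDumpChunk(uint64_t OriginalSize,
                     const std::vector<uint8_t> &Compressed,
                     std::vector<uint8_t> &Section) {
  size_t Before = Section.size();
  appendULEB128(OriginalSize, Section);
  appendULEB128(Compressed.size(), Section);
  Section.insert(Section.end(), Compressed.begin(), Compressed.end());
  // Each chunk carries at least the two size fields.
  assert(Section.size() >= Before + 2);
}
```

Because both sizes precede the payload, a reader can skip or extract a
chunk without inspecting the compressed bytes.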

llvm-readobj
------------

We read the .llvm_dump section.  We dump each chunk of compressed data
one after another.
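
Since the linker concatenates the per-object contents, the reader has
to walk the section chunk by chunk.  A minimal standalone sketch of
that walk follows.  It is not the llvm-readobj code: DumpChunk,
readULEB128 and splitDumpSection are hypothetical names, the sizes are
assumed to be unsigned LEB128, the linker is assumed to insert no
padding between chunks, and decompression is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DumpChunk {
  uint64_t OriginalSize;           // message size before compression
  std::vector<uint8_t> Compressed; // payload, still compressed
};

// Decode one unsigned LEB128 value starting at Pos and advance Pos past
// it.  Returns false if the input ends mid-value or the value runs past
// ten bytes.
bool readULEB128(const std::vector<uint8_t> &In, size_t &Pos, uint64_t &V) {
  V = 0;
  for (unsigned Shift = 0; Shift < 64; Shift += 7) {
    if (Pos >= In.size())
      return false;
    uint8_t Byte = In[Pos++];
    V |= uint64_t(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return true;
  }
  return false;
}

// Split the concatenated section contents into chunks.  Returns false
// on malformed input (truncated header or payload).
bool splitDumpSection(const std::vector<uint8_t> &Section,
                      std::vector<DumpChunk> &Chunks) {
  size_t Pos = 0;
  while (Pos < Section.size()) {
    uint64_t OriginalSize = 0, CompressedSize = 0;
    if (!readULEB128(Section, Pos, OriginalSize) ||
        !readULEB128(Section, Pos, CompressedSize))
      return false;
    if (CompressedSize > Section.size() - Pos)
      return false; // payload would run past the end of the section
    const uint8_t *Begin = Section.data() + Pos;
    Chunks.push_back(
        {OriginalSize, std::vector<uint8_t>(Begin, Begin + CompressedSize)});
    Pos += CompressedSize;
    assert(Pos <= Section.size());
  }
  return true;
}
```

Each resulting chunk could then be decompressed into a buffer of
OriginalSize bytes and printed.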

Existing Implementation
=======================

https://reviews.llvm.org/D84473

Future Directions
=================

The proposal above does not support the ThinLTO build flow.  To
support that, I am thinking about putting the message as metadata in
the IR at the prelink stage.

Thoughts?