[llvm-dev] [XRay][RFC] Tooling for XRay Trace Analysis

Tue Aug 23 01:05:08 PDT 2016

Hi llvm-dev,

I've been implementing a tool for analysing XRay traces. As a recap, XRay's original RFC [0] mentions a tool that does function call accounting as a starting point. This is currently implemented in D21987 [1], which is being reviewed by David Blaikie.

One key issue in that review is the dependency between the log format determined by the XRay runtime implementation in compiler-rt [2] and the tool reading these log entries.

While it seems obvious that we should document clearly the file format of the traces (even supporting different versions) there's a clear dependency between the writer (XRay in compiler-rt) and the reader (the tool under development in LLVM). In this RFC, I'd like to explore some options regarding the coordination of these two moving pieces located in two places -- in particular, compiler-rt and the LLVM tools.

# Problem Statement & Background

XRay traces are only as useful as the analysis you can perform on them. While it's great to be able to look at stack traces, sometimes basic statistics and summaries are more digestible and give a more immediate picture of the operations performed by one run of a particular binary (or multiple runs of the same binary on different inputs). Recently I've shared some initial results of the analysis available in [1] on an instrumented build of Clang [3] -- and this is just one example of the kinds of analysis possible with the data. However, there's one wrinkle here:

  The analysis should be developed independently of the logging implementation.

There are many reasons for doing this. While it's certainly possible to implement a custom logging handler for XRay-instrumented binaries that generates some statistics on the fly instead of logging the function calls, this increases the cost and friction of getting value out of using XRay.

Given this constraint, here are a few problems:

  - The runtime library and the tools reading the log should have a common understanding of the log format. For now we use a naive binary dump log file format. We understand that there are platform and encoding issues that come with this (endianness being one of them, the size of fields across platforms being another), but these could be mitigated with enough metadata at the beginning of the files to describe the encoding in a portable manner. Still, this is not easy, and having more complex schemes imposes a heavier cost on the runtime implementation.
  - The analysis tools should be able to read different executable file formats -- currently we only support ELF 64-bit. Many analyses are far more useful if they can translate the function ids generated by the XRay runtime, so having the instrumentation maps from the XRay-instrumented executables goes a long way towards converting function ids to function pointers, and eventually to demangled function names. This means the tool will have to support the multiple file formats that XRay-instrumented binaries ought to be ported to (COFF, MachO, and ELF).
  - Having the analysis work on common in-memory (or on-disk) data structures ensures maximum applicability. This means even if the log file format changes, the analysis should still be able to work as long as we're keeping at least the same information in the log required by the analyses. For example, a hypothetical tool for generating just a graph of function calls encountered in a trace with counts ought to be feasible without being tied to the format of the XRay trace being fed to the tool.
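
To make the first point above concrete, here is a minimal sketch (in Python, purely for brevity) of how a small header could let a reader detect the byte order and record size chosen by the writer. Everything in it is an assumption for illustration: the magic value, the header fields, the record layout, and the names `write_trace` and `read_trace` are hypothetical and are not the format implemented in [2].

```python
import struct

# Hypothetical layout, for illustration only (not the real XRay log format).
MAGIC = 0x58524159   # the bytes "XRAY" when written big-endian
HEADER = "IHH"       # magic (u32), version (u16), record size in bytes (u16)
RECORD = "iBQ"       # function id (i32), entry/exit kind (u8), timestamp (u64)


def write_trace(records, byte_order="<", version=1):
    """Serialise records using the writer's byte order ('<' or '>')."""
    rec_fmt = byte_order + RECORD
    out = struct.pack(byte_order + HEADER, MAGIC, version,
                      struct.calcsize(rec_fmt))
    for record in records:
        out += struct.pack(rec_fmt, *record)
    return out


def read_trace(data):
    """Detect the writer's byte order from the magic, then decode records."""
    for byte_order in "<>":
        magic, version, rec_size = struct.unpack_from(byte_order + HEADER, data, 0)
        if magic == MAGIC:
            break
    else:
        raise ValueError("not a trace file: bad magic")

    rec_fmt = byte_order + RECORD
    if rec_size != struct.calcsize(rec_fmt):
        raise ValueError("unexpected record size in header")

    offset = struct.calcsize(byte_order + HEADER)
    records = []
    while offset < len(data):
        records.append(struct.unpack_from(rec_fmt, data, offset))
        offset += rec_size
    return version, records
```

The reader never needs to know which platform produced the file; it only trusts the header. The cost is that the writer in the runtime has to emit (and keep stable) that header, which is the extra burden mentioned above.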

More concisely:

  1. We ought to be able to share log writer/reader code between LLVM and compiler-rt.
  2. Converting the trace format from platform-specific to platform-agnostic (and vice versa) ought to be possible.
  3. The tooling ought to be extensible with more analysis implementations without being tied to the log format.

# Proposed Solution

In [1] I've gone ahead and implemented a tool, currently named 'llvm-xray' which supports sub-commands to do the following:

  - `llvm-xray extract <xray-instrumented binary>` : Converts the xray_instr_map in the binary into text that is more human- and machine-readable (currently JSON, though I understand YAML is already better supported in LLVM).
  - `llvm-xray account <xray trace> -m <xray-instrumented binary>` : Performs function call accounting with basic statistics.
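
As a rough idea of what "function call accounting" means here, the following is a minimal sketch in Python, not the implementation in [1]. It assumes the trace has already been decoded into `(func_id, kind, timestamp)` tuples for a single thread; that record shape, the `ENTRY`/`EXIT` constants, and the `account` function name are hypothetical.

```python
from collections import defaultdict

ENTRY, EXIT = 0, 1  # hypothetical record kinds


def account(records):
    """Count calls and sum inclusive time per function id.

    `records` is an iterable of (func_id, kind, timestamp) tuples from one
    thread, in the order they were logged.
    """
    stack = []  # (func_id, entry timestamp) for calls still in flight
    stats = defaultdict(lambda: {"count": 0, "total": 0})
    for func_id, kind, timestamp in records:
        if kind == ENTRY:
            stack.append((func_id, timestamp))
        elif kind == EXIT and stack and stack[-1][0] == func_id:
            _, start = stack.pop()
            stats[func_id]["count"] += 1
            stats[func_id]["total"] += timestamp - start
        # Exits that do not match the top of the stack are ignored here;
        # a real tool would need a policy for truncated or corrupt traces.
    return dict(stats)
```

Because the function only sees decoded records, it does not care which on-disk format they came from; mapping the ids to names would be a separate step using the output of `extract`.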

In the near future, we're looking to extend this tool to have the following (and similar) functionality:

  - `llvm-xray dump <xray trace> -format={yaml,json,...}` : Takes an xray trace and turns it into some human-readable text format.
  - `llvm-xray ingest <xray trace> -input-format={yaml,json,...}` : Takes an xray trace in some human-readable text format and turns it into the binary format.
  - `llvm-xray stack <xray trace> -input-format=... -format=...` : Recreates stack traces from an xray trace.
  - `llvm-xray graph <xray trace> -input-format=...` : Creates a graph (in dot format) of the function call interactions from the trace file.
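
For the `graph` sub-command, the core of the analysis could look roughly like the sketch below (Python, for illustration only). It assumes single-threaded `(func_id, kind, timestamp)` records, labels nodes as `f<id>` because no symbol names are available at this level, and uses a synthetic `root` node for top-level calls; all of these choices, and the function names, are hypothetical.

```python
from collections import Counter

ENTRY, EXIT = 0, 1  # hypothetical record kinds


def call_graph(records):
    """Count caller -> callee edges seen in a single-threaded trace."""
    stack = []
    edges = Counter()
    for func_id, kind, _timestamp in records:
        if kind == ENTRY:
            caller = stack[-1] if stack else "root"
            callee = f"f{func_id}"
            edges[(caller, callee)] += 1
            stack.append(callee)
        elif kind == EXIT and stack:
            stack.pop()
    return edges


def to_dot(edges):
    """Render the edge counts as a Graphviz dot digraph."""
    lines = ["digraph xray {"]
    for (caller, callee), count in sorted(edges.items()):
        lines.append(f'  "{caller}" -> "{callee}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)
```

The same walk over the records, keeping the full stack instead of only its top, is essentially what the `stack` sub-command would need.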

This allows us to do a few things:

  1. When testing xray in compiler-rt, we can use the "dump" tool to inspect the contents of the log generated from xray-instrumented binaries. Similarly, we can synthesise xray binary traces in llvm lit tests using "ingest".
  2. Extend the tool with more functionality without having to be gated on the definition of and/or implementation of the trace format. Since we can define the reader and writer implementation in one place, we can use the tool to enforce the format in regression tests (and as we evolve the format further, be able to support backward compatibility).
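
The property the regression tests would rely on is that "dump" and "ingest" are inverses of each other. A minimal sketch of that round trip, in Python and with an assumed record layout and JSON schema (the field names `func_id`, `kind`, and `tsc` are hypothetical, as is the fixed little-endian encoding):

```python
import json
import struct

# Hypothetical fixed little-endian record: function id (i32), kind (u8),
# timestamp (u64). Not the real XRay log format.
RECORD = "<iBQ"


def dump(binary):
    """Binary trace -> human-readable JSON text."""
    size = struct.calcsize(RECORD)
    records = [struct.unpack_from(RECORD, binary, offset)
               for offset in range(0, len(binary), size)]
    return json.dumps(
        [{"func_id": f, "kind": k, "tsc": t} for f, k, t in records],
        indent=2)


def ingest(text):
    """Human-readable JSON text -> binary trace."""
    return b"".join(
        struct.pack(RECORD, r["func_id"], r["kind"], r["tsc"])
        for r in json.loads(text))
```

A test can then hand-write the text form, run it through `ingest`, and check that `dump` reproduces it, without ever needing an instrumented binary to generate the trace.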

# Proposed Plan of Action

If the proposed solution is acceptable, the proposed plan of action is as follows (in chronological order):

  0. Break up [1] into smaller pieces, starting with the base llvm-xray tool that literally "does nothing".
  1. Implement the 'dump' and 'ingest' sub-commands as a single patch, with defined tests.
  2. Update the logging implementation in [2] to use the 'dump' sub-command to test that entries in the log are what we expect them to be.
  3. Implement the 'account' sub-command with tests seeded with data in lit tests.
  4. Implement the 'stack' sub-command with tests seeded with data similar to #3.
  5. Implement the 'graph' sub-command similar to #3.

Note that we do not actually solve the issue of sharing the log writer/reader code between LLVM and compiler-rt directly, but rather we sidestep this in the meantime using the tool.

# Open Questions

- Is it possible to define the writer code in LLVM and have the compiler-rt implementation depend on it? I hear that this is going to be useful for something like the profiling library in compiler-rt too, so that the reader and writer implementations are both in LLVM. What are the technical roadblocks there, and in your opinion is this something worth fixing/enabling?

- What is the preferred human-readable text file format to support in LLVM? I understand that there's already code to support parsing YAML, so this might be an obvious choice. OTOH JSON is really popular and there are a lot of parsers in other languages that can already deal with this file format. I'm happy to support both but was wondering whether there was a preference for YAML aside from the reason I already cited?

- This proposal only talks about the tool itself, but the implementation of the tool involves some moving parts that are worth implementing as libraries and testing in isolation (or in combinations, some mocked and faked, etc). I'm a fan of writing unit tests for these things but I don't see a unittests/tools directory for testing these tool-specific internals. Is this something worth having? Any pointers on how to proceed with unit-testing tool-specific internals?

Cheers

-- Dean

[0] http://lists.llvm.org/pipermail/llvm-dev/2016-April/098901.html
[1] https://reviews.llvm.org/D21987
[2] https://reviews.llvm.org/D21982
[3] http://lists.llvm.org/pipermail/llvm-dev/2016-July/102552.html