[llvm-dev] RFC: XRay in the LLVM Library
Dean Michael Berris via llvm-dev
llvm-dev at lists.llvm.org
Tue Nov 29 21:08:30 PST 2016
Recently, we've committed the beginnings of the llvm-xray tool, which allows for conveniently working with both XRay-instrumented libraries and XRay trace/log files. During the review of the conversion tool, which turns a binary/raw XRay log file into YAML for human consumption, a question arose as to how we intend to allow users to develop tools that deal with XRay traces (and the instrumentation maps in binaries).
As a bit of background, I've been working on the "flight data recorder" (FDR) mode for the XRay runtime library. This mode lets an XRay-instrumented binary continuously write trace entries into an in-memory log, which is kept as a circular buffer of buffers. FDR mode writes more concise records and uses a different log format than the current "naive" logging implementation in compiler-rt (which continuously writes to disk as soon as thread-local buffers are full).
# Problem Statement
XRay has two key pieces of information that need to be encoded in a consistent manner: the instrumentation map embedded in binaries and the XRay log files. However, we run into issues when the encoding of this information changes over time, whether by adding or removing information. This situation is very similar to how LLVM handles backwards compatibility and versioning for the bitcode format.
The problem is how to ensure that, as we change the data written by the runtime library, the tools handling this data can still read it. Several factors play into this, each of which may be addressed in many different ways (though they are not the crux of this RFC):
- The split between the LLVM "core" library/tools and compiler-rt. This means we implement the writer in compiler-rt but implement the tools reading the traces in LLVM. We also have to coordinate any changes in LLVM for encoding new information into the instrumentation map so that compiler-rt can take advantage of this new information.
- The potential for allowing user-defined additional information embedded in the XRay traces. We have ongoing projects that will add things like argument logging, and custom data logging, which will add information to the log without necessarily changing the "format" of the data.
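To make the versioning concern concrete, here is a minimal sketch of a reader that refuses input newer than it understands. The `LogHeader` layout, the `MaxSupportedVersion` constant, and the `parseHeader` function are all hypothetical names made up for illustration; this is not the header that compiler-rt actually writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical log header, for illustration only.
struct LogHeader {
  uint16_t Version; // bumped whenever the record encoding changes
  uint16_t Type;    // which logging mode produced the file
};

// Highest version this (hypothetical) reader understands.
constexpr uint16_t MaxSupportedVersion = 1;

// Reads a little-endian 16-bit value starting at Bytes[Offset].
inline uint16_t readLE16(const std::vector<uint8_t> &Bytes, size_t Offset) {
  assert(Offset + 1 < Bytes.size() && "need two bytes");
  return static_cast<uint16_t>(Bytes[Offset] | (Bytes[Offset + 1] << 8));
}

// Fills Out and returns true if the buffer holds a header with a known
// version; returns false for truncated input or a version this reader
// does not know (including anything written by a newer runtime).
inline bool parseHeader(const std::vector<uint8_t> &Bytes, LogHeader &Out) {
  if (Bytes.size() < 4)
    return false;
  uint16_t Version = readLE16(Bytes, 0);
  if (Version == 0 || Version > MaxSupportedVersion)
    return false;
  Out.Version = Version;
  Out.Type = readLE16(Bytes, 2);
  return true;
}
```

A tool built against this reader keeps working on old files, but a runtime that starts writing version 2 silently locks it out until the reader is updated, which is the coordination problem described above.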
# Potential Resolutions
Given the state we're at in XRay's development, we're looking at a few ways of going about the backwards/forwards compatibility of the instrumentation map and the xray log files, and the tools that will be written to read/manipulate them. We're seeking feedback on the following options and alternatives we may not have considered.
## Option A: Expose a Library that supports all known formats.
We can move some currently tool-specific code for `llvm-xray extract`, which ingests a binary with XRay instrumentation, into (strawman proposal) lib/XRay (i.e. headers in include/llvm/XRay/... and implementation in lib/XRay/...), so that the tools become thin wrappers around the functionality in this library. We can apply this to the core logic of `llvm-xray convert` as well, to allow for loading all known/supported formats for the log file.
This option gives us the capability to provide a set of canonical implementations that can handle a set of file formats. This might introduce some complexity in parsing lots of known/supported formats (like YAML, compiler-emitted instrumentation maps for x86_64/arm7/aarch64/<insert platforms where XRay is yet to be ported>) in a library that not all tool writers actually need.
This option follows closely what the LLVM project does with backwards compatibility for parsing LLVM IR, applied to XRay instrumentation maps and traces.
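As a rough sketch of what Option A implies for the library, the entry point below dispatches on a format tag and decodes every known encoding into one in-memory record type. The two encodings, the `Record` struct, and the `loadTrace` name are invented for illustration and do not correspond to the real naive or FDR formats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// In-memory record handed to tools, independent of the on-disk encoding.
struct Record {
  uint32_t FuncId;
  bool IsEntry; // true for function entry, false for exit
};

// Hypothetical format 1: two bytes per record (function id, then kind,
// where kind 0 means entry).
static bool parseV1(const std::vector<uint8_t> &B, std::vector<Record> &Out) {
  assert(!B.empty() && "caller has already read the format byte");
  if ((B.size() - 1) % 2 != 0)
    return false; // trailing partial record
  for (size_t I = 1; I + 1 < B.size(); I += 2)
    Out.push_back({B[I], B[I + 1] == 0});
  return true;
}

// Hypothetical format 2: one byte per record, function id in the upper
// seven bits and the kind in the lowest bit (0 means entry).
static bool parseV2(const std::vector<uint8_t> &B, std::vector<Record> &Out) {
  for (size_t I = 1; I < B.size(); ++I)
    Out.push_back({static_cast<uint32_t>(B[I] >> 1), (B[I] & 1) == 0});
  return true;
}

// Library entry point: the first byte names the format, and every known
// format is decoded into the same Record type.
bool loadTrace(const std::vector<uint8_t> &B, std::vector<Record> &Out) {
  if (B.empty())
    return false;
  switch (B[0]) {
  case 1:
    return parseV1(B, Out);
  case 2:
    return parseV2(B, Out);
  default:
    return false; // unknown or newer format
  }
}
```

Tools only ever see `Record`, but each new runtime format adds another case to the library itself, so the library grows with every encoding it has to keep supporting.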
## Option B: Expose a library that only supports one canonical format.
We can keep tool-specific code alongside the tools, but define one canonical format for the instrumentation map and traces, both as a specification document and as a library implementation. This canonical format could be what we already have today, which would keep the log-reading and instrumentation-map-handling library simple; the library would evolve only when we extend or change the canonical format.
For FDR mode traces, this means the conversion tool knows about the FDR mode trace format/encoding and provides a transformation from it to the canonical format. The transformation logic is then localised to the conversion tool, while any other tool that builds on the reader library does not need to change. It also gives users who define their own log formats, using the XRay library interfaces to install their own handlers, the option to implement the transformations from their format to the XRay-canonical format in the tool, without being tied to maintaining the released library version.
The evolution of the canonical format can happen more slowly and more conservatively than the release of new XRay runtime implementations through compiler-rt.
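One way to picture Option B is a conversion tool that keeps a table of per-format converters, each producing canonical records, while the reader library knows only the canonical shape. Everything here (`CanonicalRecord`, `Converter`, `registerFormat`, and the delta-encoded example format) is a hypothetical sketch, not an existing or proposed LLVM interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// The one record shape the (hypothetical) reader library understands.
struct CanonicalRecord {
  uint32_t FuncId;
  uint64_t TSC;
};

// A converter turns the raw bytes of some other format into canonical records.
using ConvertFn = std::function<bool(const std::vector<uint8_t> &,
                                     std::vector<CanonicalRecord> &)>;

// Lives in the conversion tool, not in the reader library.
class Converter {
  std::map<std::string, ConvertFn> Formats;

public:
  void registerFormat(const std::string &Name, ConvertFn Fn) {
    assert(Fn && "converter must be callable");
    Formats[Name] = std::move(Fn);
  }

  bool convert(const std::string &Name, const std::vector<uint8_t> &Raw,
               std::vector<CanonicalRecord> &Out) const {
    auto It = Formats.find(Name);
    if (It == Formats.end())
      return false; // no converter registered for this format
    return It->second(Raw, Out);
  }
};

// Example converter for a made-up compact format: each record is two bytes,
// a function id followed by the TSC delta since the previous record.
bool convertDeltaFormat(const std::vector<uint8_t> &Raw,
                        std::vector<CanonicalRecord> &Out) {
  if (Raw.size() % 2 != 0)
    return false; // trailing partial record
  uint64_t TSC = 0;
  for (size_t I = 0; I < Raw.size(); I += 2) {
    TSC += Raw[I + 1];
    Out.push_back({Raw[I], TSC});
  }
  return true;
}
```

With this split, supporting a new runtime format means calling `registerFormat("delta", convertDeltaFormat)` in the tool; `CanonicalRecord` and every tool built on it stay unchanged.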
# Open Questions
Some burning questions we'd like to get some thoughts on:
- Is there a preference between the two options provided above?
- Any other alternatives we should consider?
- Which parts of which options do you prefer, and is there a synthesis of the two options that appeals to you?
Thanks in advance!
 - `llvm-xray extract` defined in https://reviews.llvm.org/D21987
 - `llvm-xray convert` being reviewed in https://reviews.llvm.org/D24376
 - FDR mode ongoing implementation (work in progress) at https://reviews.llvm.org/D27038
 - Buffer Queue implementation (work in progress) at https://reviews.llvm.org/D26232