[llvm-dev] [XRay][RFC] Tooling for XRay Trace Analysis

Dean Michael Berris via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 8 19:35:20 PDT 2016



> On 7 Sep 2016, at 01:21, David Blaikie <dblaikie at gmail.com> wrote:
> 
> (sorry for the delay)
> 

All good, thanks Dave!

> On Tue, Aug 23, 2016 at 1:05 AM Dean Michael Berris <dean.berris at gmail.com> wrote:
> Hi llvm-dev,
> 
> I've been implementing a tool for analysing XRay traces. As a recap, XRay's original RFC [0] mentions a tool that does function call accounting as a starting point. This is currently implemented in D21987 [1], which is being reviewed by David Blaikie.
> 
> One key issue in that review is the dependency between the log format determined by the XRay runtime implementation in compiler-rt [2] and the tool reading these log entries.
> 
> While it seems obvious that we should document the file format of the traces clearly (even supporting different versions), there's a clear dependency between the writer (XRay in compiler-rt) and the reader (the tool under development in LLVM). In this RFC, I'd like to explore some options for coordinating these two moving pieces -- the runtime in compiler-rt and the tools in LLVM.
> 
> # Problem Statement & Background
> 
> XRay traces are only as useful as the analysis you can perform on them. While it's great to be able to look at stack traces, sometimes basic statistics and summaries are more digestible and give a more immediate picture of the operations performed by one run of a particular binary (or multiple runs of the same binary on different inputs). Recently I've shared some initial results of the analysis available in [1] on an instrumented build of Clang [3] -- and this is just one example of the kinds of analysis possible with the data. However, there's one wrinkle here:
> 
>   The analysis should be developed independently of the logging implementation.
> 
> There are many reasons for doing this. While it's certainly possible to implement a custom logging handler for XRay-instrumented binaries that generates statistics on the fly instead of logging the function calls, doing so increases the cost and friction of getting value out of using XRay.
> 
> Given this constraint, here are a few problems:
> 
>   - The runtime library and the tools reading the log should have a common understanding of the log format. For now we use a naive binary dump log file format. We understand that there are platform and encoding issues that come with this (endianness being one of them, the size of fields across platforms being another), but this could be mitigated with enough metadata at the beginning of the files to describe the encoding in a portable manner. Still, this is not easy, and more complex schemes impose a heavier cost on the runtime implementation.
>   - The analysis tools should be able to read different executable file formats -- currently we support only 64-bit ELF. Many analyses are much more useful if they can translate the function IDs generated by the XRay runtime; the instrumentation map from the instrumented executable goes a long way toward converting function IDs into function addresses, and eventually into demangled function names. This means the tool will have to support every object file format that XRay-instrumented binaries may eventually be ported to (COFF, MachO, and ELF).
>   - Having the analysis work on common in-memory (or on-disk) data structures ensures maximum applicability. This means that even if the log file format changes, the analysis should still work as long as the log keeps at least the information the analyses require. For example, a hypothetical tool for generating just a graph of the function calls encountered in a trace, with counts, ought to be feasible without being tied to the format of the XRay trace being fed to the tool.
> 
> This last requirement is the bit I'm slightly confused by/trying to better understand. I could picture tools taking a dependency on some LLVM API for reading the original, platform-specific, binary format. This would make the tool neutral to versioning and target.
> 
> But I take it you mean (as detailed later) to have a separate format (could be a portable binary format, but currently discussed as JSON/YAML/etc.) that things are converted into, making them portable?
> 
> One of the reasons, I think you mentioned, is that while the log is already a separate file, you really want the instrumentation map along with it, and that's embedded in the binary, most of which you probably don't need. Am I following correctly?

So there are two pieces here:

1) The instrumentation map that's in the binary.
2) The log file written out by an XRay-instrumented binary.

For 1, we certainly should (and already do) rely on the LLVM libraries for dealing with all the supported binary file formats.
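
To illustrate what I mean, here's a rough sketch against the llvm::object API as I remember it -- the exact signatures vary across LLVM revisions, and dumpInstrMap is a made-up helper name:

    #include "llvm/Object/ObjectFile.h"
    #include "llvm/Support/Error.h"

    using namespace llvm;
    using namespace llvm::object;

    // Hypothetical helper: find the instrumentation map section in an
    // XRay-instrumented binary. The format-agnostic Object library means
    // the same code can handle ELF/COFF/MachO.
    Error dumpInstrMap(StringRef Filename) {
      auto ObjOrErr = ObjectFile::createObjectFile(Filename);
      if (!ObjOrErr)
        return ObjOrErr.takeError();
      const ObjectFile &Obj = *ObjOrErr->getBinary();
      for (const SectionRef &Section : Obj.sections()) {
        StringRef Name;
        if (Section.getName(Name))
          continue;
        if (Name == "xray_instr_map") {
          StringRef Contents;
          if (Section.getContents(Contents))
            continue;
          // ... decode the fixed-size sled entries from Contents ...
        }
      }
      return Error::success();
    }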

For 2, we are currently using a very simple "naive" format, which is just a dump of the in-memory data onto disk. The main driver for this is the minimal cost in space and time in the XRay runtime.

We'd like to minimise processing the in-memory data from the XRay runtime perspective, so we do the simplest thing that could possibly work here and write out fixed-sized records into a binary file. In the future, we might have different log structures (non-fixed sized records as described in the whitepaper, records with different types, etc.) but those would still need to be minimal cost to store.
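
To make that concrete, here's roughly the shape of the thing (an illustrative sketch only -- the actual field layout in the runtime may differ):

    #include <cstdint>
    #include <cstdio>

    // Illustrative fixed-size (32-byte) record; real layout may differ.
    struct alignas(32) XRayRecord {
      uint16_t RecordType; // discriminator, for future record kinds
      uint8_t  CPU;        // CPU the event was recorded on
      uint8_t  Type;       // function entry or function exit
      int32_t  FuncId;     // function id from the instrumentation map
      uint64_t TSC;        // timestamp counter at the event
      uint32_t TId;        // thread id
      char     Padding[12];
    };

    // The runtime's hot path is then a single buffered write of the
    // in-memory representation -- no serialisation work at all.
    void writeRecord(std::FILE *Log, const XRayRecord &R) {
      std::fwrite(&R, sizeof R, 1, Log);
    }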

This complicates the tool slightly, because it has to support all the different log formats that are first-class supported by the XRay runtime. Whether the instrumentation map is available is a separate issue -- it should be possible to take an XRay log/trace and run the analysis against the log file alone, just without the translation from function IDs to function names and debugging info.

This binary log format will be platform-specific due to endianness and the sizes of the fields/records, and I'm saying the tool should be able to handle these log files too. Of course this means we have to indicate these details in the log file itself, which is a little challenging while it stays in binary form.
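
One way to do this is a small fixed header at the front of the file -- a sketch of the idea, where the specific fields are hypothetical:

    #include <cstdint>

    // Hypothetical file header carrying what a reader needs in order
    // to interpret the records portably.
    struct XRayFileHeader {
      uint16_t Version;        // log format version
      uint16_t Type;           // which logging implementation wrote this
      uint32_t ByteOrderMark;  // producer writes 0x12345678; a reader
                               // seeing 0x78563412 knows to byte-swap
      uint64_t CycleFrequency; // TSC frequency, to turn deltas into time
    };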

> 
> Should this extraction then be an extract and merge? (creating a file containing a log and instrumentation map together in this generic format?)

It could be, but it really doesn't need to be -- since the intent is that the tools should be able to work on the log file without the instrumentation map at all, albeit with reduced functionality (i.e. we can't get function names and debug info, but we can still get function IDs).

>  
> 
> More concisely:
> 
>   1. We ought to be able to share log writer/reader code between LLVM and compiler-rt.
>   2. Converting the trace format from platform-specific to platform-agnostic (and vice versa) ought to be possible.
>   3. The tooling ought to be extensible with more analysis implementations without being tied to the log format.
> 
> # Proposed Solution
> 
> In [1] I've gone ahead and implemented a tool, currently named 'llvm-xray', which supports sub-commands to do the following:
> 
>   - `llvm-xray extract <xray-instrumented binary>` : Converts the xray_instr_map in the binary into more human- and machine-readable text (currently JSON, but I understand YAML is already better supported in LLVM).
>   - `llvm-xray account <xray trace> -m <xray-instrumented binary>` : Performs function call accounting with basic statistics.
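> 
> For illustration, these map naturally onto LLVM's cl::SubCommand support, along these lines (a sketch only; the option spellings are illustrative):
> 
>     #include "llvm/Support/CommandLine.h"
> 
>     using namespace llvm;
> 
>     // Each llvm-xray verb is a cl::SubCommand with its own options.
>     cl::SubCommand Extract("extract", "Extract instrumentation maps");
>     cl::opt<std::string> ExtractInput(cl::Positional,
>                                       cl::desc("<input binary>"),
>                                       cl::Required, cl::sub(Extract));
> 
>     cl::SubCommand Account("account", "Function call accounting");
>     cl::opt<std::string> AccountInput(cl::Positional,
>                                       cl::desc("<xray trace>"),
>                                       cl::Required, cl::sub(Account));
>     cl::opt<std::string> InstrMap("m",
>                                   cl::desc("instrumentation map binary"),
>                                   cl::sub(Account));
> 
>     int main(int argc, char *argv[]) {
>       cl::ParseCommandLineOptions(argc, argv, "XRay trace analysis");
>       if (Extract) {
>         // ... run the extraction logic ...
>       } else if (Account) {
>         // ... run the accounting logic ...
>       }
>     }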
> 
> In the near future, we're looking to extend this tool to have the following (and similar) functionality:
> 
>   - `llvm-xray dump <xray trace> -format={yaml,json,...}` : Takes an xray trace and turns it into some human-readable text format.
>   - `llvm-xray ingest <xray trace> -input-format={yaml,json,...}` : Takes an xray trace in some human-readable text format and turns it into the binary format.
> 
> What's the need for this direction? Only for LLVM test purposes? Other reasons?
> 

Mostly for testing purposes, and for "portability". If, for example, we'd like to share these traces around in a neutral human-readable format (for whatever reason), then having them in these text formats is much more convenient to inspect and reason about (and snip into emails, serve on web pages, make searchable, etc.).

> (for DWARF, for example, we just generate DWARF from existing code and test that, rather than having a separate/independent format for generating DWARF more directly - but we don't have complicated DWARF tools (we have llvm-dwp, which is as close as we get, and in that case I just checked in binary object files along with the source used to create them))
> 
> This is certainly an area of discussion - with tools like lld taking a few different approaches (including a YAML format for specifying object files, or using assembly files and just assembling them on the fly in test cases, or checking in binary object files). So there's no clear pre-existing answer in LLVM for this situation, for sure.

I agree -- it's the pioneer's curse I guess, being there first is both good and bad. :)

That said, I think this ability to round-trip is an important one in principle. And in the case of the XRay tooling, it seems essential for ensuring that we're able to write tests and inspect the binary traces in a human-readable format.
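
As a hypothetical example of how that round-trip could look in a lit test once 'dump' and 'ingest' exist (the flag spellings and the YAML key here are made up):

    # RUN: llvm-xray ingest %s -input-format=yaml > %t
    # RUN: llvm-xray dump %t -format=yaml | FileCheck %s
    # CHECK: func-id: 21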

>  
>   - `llvm-xray stack <xray trace> -input-format=... -format=...` : Recreates stack traces from an xray trace.
>   - `llvm-xray graph <xray trace> -input-format=...` : Creates a graph (in dot format) of the function call interactions from the trace file.
> 
> This allows us to do a few things:
> 
>   1. When testing xray in compiler-rt, use the "dump" tool to inspect the contents of the log generated from xray-instrumented binaries. 
> 
> Might be worth considering whether dumping for testability should be YAML/JSON or something else. (the DWARF and object dumping used in LLVM isn't in any such format - it's just a format designed for humans which works well enough for our FileCheck testing, etc)
> 
> But if we need a format change to feed it in to other tools, then yes - testing on that format (rather than having a JSON/YAML then a separate dumping format) makes sense. I'm just trying to separate out the different requirements and what implications they have on the design, etc.
>  

That's an interesting thought. I was trying to avoid defining yet another text format, because there's already seemingly mature support for YAML I/O in LLVM. JSON is just one of those other formats that's useful for interchange in this web-connected world; it's really a "nice to have", and one that people have suggested would be good to support.

If I were going to trim this down, I'd settle for just YAML for the purposes of human consumption.

If push came to shove, we could define a custom text format tailored to XRay's requirements.

If at all possible, I'd like to just use the available libraries in LLVM. :)
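
To illustrate why I lean that way: LLVM's YAMLTraits make this fairly mechanical. A sketch, with a made-up record type and key names:

    #include "llvm/Support/YAMLTraits.h"
    #include <cstdint>

    // A hypothetical trace record we want to round-trip through YAML.
    struct YAMLXRayRecord {
      int32_t FuncId;
      uint64_t TSC;
      uint32_t TId;
    };

    namespace llvm {
    namespace yaml {
    // Teach YAML I/O how to map the record's fields.
    template <> struct MappingTraits<YAMLXRayRecord> {
      static void mapping(IO &io, YAMLXRayRecord &R) {
        io.mapRequired("func-id", R.FuncId);
        io.mapRequired("tsc", R.TSC);
        io.mapRequired("thread", R.TId);
      }
    };
    } // namespace yaml
    } // namespace llvm

    // Writing a record is then just:
    //   llvm::yaml::Output Out(llvm::outs());
    //   Out << Record;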

> Similarly, be able to synthesise xray binary traces in llvm lit tests using "ingest".
>   2. Extend the tool with more functionality without being gated on the definition and/or implementation of the trace format. Since we can define the reader and writer implementations in one place, we can use the tool to enforce the format in regression tests (and, as we evolve the format further, to support backward compatibility).
> 
> # Proposed Plan of Action
> 
> If the proposed solution is acceptable, the proposed plan of action is as follows (in chronological order):
> 
>   0. Break up [1] into smaller pieces, starting with the base llvm-xray tool that literally "does nothing".
>   1. Implement the 'dump' and 'ingest' sub-commands as a single patch, with defined tests.
>   2. Update the logging implementation in [2] to use the 'dump' sub-command to test that entries in the log are what we expect them to be.
>   3. Implement the 'account' sub-command with tests seeded with data in lit tests.
>   4. Implement the 'stack' sub-command with tests seeded with data similar to #3.
>   5. Implement the 'graph' sub-command similar to #3.
> 
> Note that we do not actually solve the issue of sharing the log writer/reader code between LLVM and compiler-rt directly, but rather we sidestep this in the meantime using the tool.
> 
> # Open Questions
> 
> - Is it possible to define the writer code in LLVM and have the compiler-rt implementation depend on it? I hear that this is going to be useful for something like the profiling library in compiler-rt too, so that the reader and writer implementations are both in LLVM. What are the technical roadblocks there, and in your opinion is this something worth fixing/enabling?
> 
> Sounds like other people have some ideas on that mentioned in the thread - again, not an area I'm especially familiar with.
>  
> - What is the preferred human-readable text file format to support in LLVM? I understand that there's already code to support parsing YAML, so this might be an obvious choice. OTOH, JSON is really popular and there are a lot of parsers in other languages that can already deal with it. I'm happy to support both, but was wondering whether there was a preference for YAML aside from the reason I already cited.
> 
> I really don't have much/any context here to make a judgement - I've vaguely seen the existing YAML usage & know there was/is some in LLD, maybe some being used over in the codeview debug info support (for generating codeview debug info).
>  
> - This proposal only talks about the tool itself, but the implementation involves some moving parts that are worth implementing as libraries and testing in isolation (or in combination, some mocked and faked, etc.). I'm a fan of writing unit tests for these things, but I don't see a unittests/tools directory for this kind of tool-specific internals testing. Is this something worth having? Any pointers on how to proceed with unit-testing tool-specific internals?
> 
> Generally we make the tools small and put any generically usable code in libraries in LLVM (see libDebugInfo which was used for quite a while (& parts of it still are) exclusively for llvm-dwarfdump (some parts are now used in llvm-dwp and llvm-dsymutil)).
> 
> So if there's some reasonable library code you could put it in LLVM's lib directory in an appropriate spot. Or you can add unit tests for a tool - don't think there's any philosophical reason that'd be avoided.
> 

That's a thought. We could make the log-reading code a library that anybody should be able to use. If it were going to be a new library under LLVM, do I just create a directory plus CMake directives under include/llvm/ (and lib/) and move as much of the logic there as possible? Or should it go in, say, include/llvm/Support/?
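
To make the question concrete, I'm imagining something like the following shape (purely a guess at this point):

    include/llvm/XRay/XRayRecord.h  # shared record/trace definitions
    include/llvm/XRay/Trace.h       # log reader interface
    lib/XRay/Trace.cpp              # reader implementation
    lib/XRay/CMakeLists.txt         # add_llvm_library(LLVMXRay ...)
    unittests/XRay/                 # unit tests against the library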

-- Dean

