[PATCH] D21987: [XRay] Implement `llvm-xray extract`, start of the llvm-xray tool

Mon Sep 12 17:42:54 PDT 2016

On Tue, Sep 13, 2016 at 2:43 AM David Blaikie <dblaikie at gmail.com> wrote:

> On Sun, Sep 11, 2016 at 11:24 PM Dean Michael Berris <dberris at google.com>
> wrote:
>
> dberris added a comment.
>
> In https://reviews.llvm.org/D21987#538323, @dblaikie wrote:
>
> > > > Still some outstanding design issues to discuss re: file formats,
> conversions, portability, etc. But once that's done.
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > I'm still concerned about reading/writing the binary format using
> memory mapping/splatting - I think the tools should be portable (in the
> same way that LLVM/Clang are - we don't build cross compilers, we just
> build tools that can read/write from/to any format on any platform), but
> even that's an implementation detail to a degree (unless making the tools
> that generic removes the need for the separate portable format/conversion
> tools).
> >
> > >
> >
> > >
> >
> > > That's a tricky one. The problem with _not_ doing this memory-dump
> format is the cost of doing it in the XRay runtime.
> >
> >
> > Ah, sorry - I should clarify/be more specific.
> >
> > I don't have so much of a problem with the writing side - I appreciate
> there are performance concerns there. (but wouldn't mind understanding them
> better - how important is the performance once xray features are turned on?
> (obviously having xray be as close to zero cost when not in use is
> important) What sort of cost do you see if it did write a consistent record
> format? (copying integers into known-width types, changing to a fixed
> endianness))
>
>
> That's a good question.
>
> When it's off, we'd like the cost to be as close to 0 as possible. When
> it's recording, we'd like the recording cost to be as little as possible so
> as not to interfere too much with the latency profiles of the functions
> being instrumented. So this means, if we can get away with introducing a
> fixed overhead when on (say 0.05 microseconds or 50 nanoseconds on every
> record) then that's how we'd structure it. So far we've been measuring far
> less than that on the fixed costs of the trampolines, but we've very wary
> of the cost of the actual recording as well. So while we can tolerate 0.05
> microseconds maximum when we're recording, we'd get up there in case we
> need to.
>
> Now with future work for various modes and trade-offs on the runtime side,
> there's a lot of opportunities for reducing other costs -- like memory
> (doing selective recording based on duration), durable storage (how many
> bytes are on disk for the log/trace), as well as the cost of the actual
> recording operation. But in the meantime, we're exploring the bare minimum
> we can get, and the implementation we currently have does exactly this. :)
>
> > But what I'm thinking about is whether the tool could read the input no
> matter which platform it was running on - each architecture would have a
> different, but known, format (I mean, it already does - it's just defined
> more implicitly in terms of "what I can read from memory with this struct
> overlay"). So the reader can do all the "read this many bytes of integer,
> switch the endianness from whatever was written to whatever I need in this
> process, etc". That way the output file becomes portable, without changing
> the bits/incurring extra cost during the writing step.
>
> >
>
> > Having the tools only work on the architecture where the file was
> generated seems especially tricky/not-great - and I don't think we have
> anything like that in LLVM today. llvm-objdump works on COFF files for ARM
> even when it's run on Linux X86, etc (or should do, if it supports COFF,
> etc - I haven't looked closely... ).
>
>
> I agree. There are two changes that could facilitate this support:
>
> 1. Indicate in the log what system the log was generated from. This could
> very well indicate endiannes, sizes of certain fields, etc. We have enough
> padding in the file headers to support encoding most of this information.
> 2. Handle the data accordingly from the tools based on the header
> information.
>
> In the meantime, since we only just run on x86_64 and we have very
> specific fields, it seems premature to engineer something that will support
> all the platforms that come along later. And when we end up porting XRay to
> these other platforms, then we cross that bridge of extending the tool to
> support data generated from other systems.
>
> Are these concerns a blocker to getting something that works in LLVM
> first, then evolving it as we support more platforms?
>
>
> I think the concern of whether the writing tool can be run correctly
> across platform is something to resolve first. I expect test cases for the
> tool to involve checked in binaries and executed on all test configurations
> in LLVM. It's just a tool reading a file and producing another file or the
> stats, etc.
>
>
With the changes lined up after this one, we don't need to check in
binaries -- we can write the log files in YAML and have them converted to
particular targets, when we support them. Not only then can we read them in
a platform-specific format we can write them in that format too. Might be a
little more work, but the choice here is doing that versus using
Flatbuffers (i.e. being more clever about it).

> Since we only write it on one platform at the moment, sure - we can assume
> the file is always laid out the same & when we figure out how to do it for
> another platform we can add bits, rev the version number (there is a
> version number in the file, right?), or whatever else is necessary.
>
>

Yep -- we have version numbers and enough padding to make it happen. :)

> So if the suggestion is, move the Subcommand and flag definitions in
> headers, and define a single entry point function, and leave the dispatch
> to main, that might be doable -- but not sure how much cleaner that would
> be...
>
>
> Fair question - especially with global initialization ordering issues,
> it's possible that declaring a flag in a header, defining it in one file
> and defining others in different files, etc, may not be doable. But I'd
> probably look into/try it.
>
> I think it would be cleaner (the main file would just include a header
> that declares the subcommand and the single entry point, the separate file
> would define the subcommand, the subcommand flags, and the entry point - so
> one could easily look at just the slice that's relevant to the feature
> that's being worked on - seeing the patches for the future features the
> main function seemed to get rather long and unwieldy rather quickly, even
> with only a handful (~3) of subcommands)
>

I can try -- in the meantime I've factored the command into a separate
function.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160913/ea417783/attachment.html>