[PATCH] D21987: [XRay] Implement `llvm-xray extract`, start of the llvm-xray tool

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 12 17:48:18 PDT 2016


On Mon, Sep 12, 2016 at 5:43 PM Dean Michael Berris <dberris at google.com>
wrote:

> On Tue, Sep 13, 2016 at 2:43 AM David Blaikie <dblaikie at gmail.com> wrote:
>
> On Sun, Sep 11, 2016 at 11:24 PM Dean Michael Berris <dberris at google.com>
> wrote:
>
> dberris added a comment.
>
> In https://reviews.llvm.org/D21987#538323, @dblaikie wrote:
>
> > > > Still some outstanding design issues to discuss re: file formats,
> > > > conversions, portability, etc. But once that's done.
> > >
> > > > I'm still concerned about reading/writing the binary format using
> > > > memory mapping/splatting - I think the tools should be portable (in the
> > > > same way that LLVM/Clang are - we don't build cross compilers, we just
> > > > build tools that can read/write from/to any format on any platform), but
> > > > even that's an implementation detail to a degree (unless making the tools
> > > > that generic removes the need for the separate portable format/conversion
> > > > tools).
> > >
> > > That's a tricky one. The problem with _not_ doing this memory-dump
> > > format is the cost of doing it in the XRay runtime.
> >
> > Ah, sorry - I should clarify/be more specific.
> >
> > I don't have so much of a problem with the writing side - I appreciate
> there are performance concerns there. (but wouldn't mind understanding them
> better - how important is the performance once xray features are turned on?
> (obviously having xray be as close to zero cost when not in use is
> important) What sort of cost do you see if it did write a consistent record
> format? (copying integers into known-width types, changing to a fixed
> endianness))
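As an illustration of the "consistent record format" idea above (copying integers into known-width types at a fixed endianness), here is a minimal sketch. The record layout and field names are hypothetical, not the actual XRay log format; the point is only that each field lands at a fixed offset in a fixed byte order, independent of the host:

```cpp
#include <array>
#include <cstdint>

// Hypothetical 16-byte record; illustrative fields, not the XRay format.
struct Record {
  uint32_t FuncId;
  uint8_t RecordType;
  uint8_t CPU;
  uint16_t Padding;
  uint64_t TSC;
};

// Serialize a value little-endian, byte by byte, regardless of host order.
inline void writeLE32(uint8_t *P, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    P[I] = uint8_t(V >> (8 * I));
}
inline void writeLE64(uint8_t *P, uint64_t V) {
  for (int I = 0; I < 8; ++I)
    P[I] = uint8_t(V >> (8 * I));
}

// Encode the record into a portable on-disk layout at fixed offsets.
std::array<uint8_t, 16> encodeRecord(const Record &R) {
  std::array<uint8_t, 16> Buf{};
  writeLE32(Buf.data() + 0, R.FuncId);
  Buf[4] = R.RecordType;
  Buf[5] = R.CPU;
  // Bytes 6-7 stay zero as explicit padding.
  writeLE64(Buf.data() + 8, R.TSC);
  return Buf;
}
```

The extra cost over a raw struct splat is a handful of shifts and stores per record, which is the overhead being weighed in the discussion.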
>
>
> That's a good question.
>
> When it's off, we'd like the cost to be as close to 0 as possible. When
> it's recording, we'd like the recording cost to be as little as possible so
> as not to interfere too much with the latency profiles of the functions
> being instrumented. So this means, if we can get away with introducing a
> fixed overhead when on (say 0.05 microseconds or 50 nanoseconds on every
> record) then that's how we'd structure it. So far we've been measuring far
> less than that for the fixed costs of the trampolines, but we're very wary
> of the cost of the actual recording as well. So while we can tolerate 0.05
> microseconds at most when we're recording, we'd only go up that high if
> we need to.
>
> Now with future work on various modes and trade-offs on the runtime side,
> there are a lot of opportunities for reducing other costs -- like memory
> (doing selective recording based on duration), durable storage (how many
> bytes are on disk for the log/trace), as well as the cost of the actual
> recording operation. But in the meantime, we're exploring the bare minimum
> cost we can get away with, and the implementation we currently have does
> exactly this. :)
>
> > But what I'm thinking about is whether the tool could read the input no
> matter which platform it was running on - each architecture would have a
> different, but known, format (I mean, it already does - it's just defined
> more implicitly in terms of "what I can read from memory with this struct
> overlay"). So the reader can do all the "read this many bytes of integer,
> switch the endianness from whatever was written to whatever I need in this
> process, etc". That way the output file becomes portable, without changing
> the bits/incurring extra cost during the writing step.
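The byte-by-byte reading described above can be sketched as follows. This is a generic illustration rather than the actual llvm-xray reader (LLVM's `llvm/Support/Endian.h` provides helpers for exactly this); the reader reassembles integers from bytes according to the writer's recorded byte order instead of overlaying a struct on the buffer:

```cpp
#include <cstddef>
#include <cstdint>

// Read a Size-byte unsigned integer from P, honoring the byte order the
// file was written in, so the reader works on any host.
uint64_t readInt(const uint8_t *P, size_t Size, bool SourceIsLittleEndian) {
  uint64_t V = 0;
  for (size_t I = 0; I < Size; ++I) {
    size_t Shift = SourceIsLittleEndian ? 8 * I : 8 * (Size - 1 - I);
    V |= uint64_t(P[I]) << Shift;
  }
  return V;
}
```

With this approach, only the reader pays the (small) reassembly cost, and the writer's fast memory-dump path is untouched.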
>
> >
>
> > Having the tools only work on the architecture where the file was
> generated seems especially tricky/not-great - and I don't think we have
> anything like that in LLVM today. llvm-objdump works on COFF files for ARM
> even when it's run on Linux X86, etc (or should do, if it supports COFF,
> etc - I haven't looked closely... ).
>
>
> I agree. There are two changes that could facilitate this support:
>
> 1. Indicate in the log what system the log was generated from. This could
> very well indicate endianness, sizes of certain fields, etc. We have enough
> padding in the file headers to support encoding most of this information.
> 2. Handle the data accordingly from the tools based on the header
> information.
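A header along the lines of point 1 might look like the sketch below. The fields are illustrative, not the actual XRay log header; they show how a version number, a byte-order flag, and reserved padding let a reader on any host interpret the rest of the file:

```cpp
#include <cstdint>

// Hypothetical on-disk header sketch (not the actual XRay header).
// The producer records enough about itself that any reader can decode
// the records that follow.
struct LogFileHeader {
  uint16_t Version;        // bumped whenever the record layout changes
  uint16_t Type;           // kind of log contained in the file
  uint8_t BigEndian;       // 1 if multi-byte fields are big-endian
  uint8_t RecordSize;      // bytes per record on this producer
  uint8_t Reserved[2];
  uint64_t CycleFrequency; // TSC frequency of the source machine
  uint8_t Padding[16];     // room for future fields without a layout break
};
```

The padding bytes are what allows new per-platform information to be added later without revving the version number for every addition.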
>
> In the meantime, since we currently run only on x86_64 and have very
> specific fields, it seems premature to engineer something that will support
> all the platforms that come along later. When we end up porting XRay to
> these other platforms, we can cross the bridge of extending the tool to
> support data generated from other systems.
>
> Are these concerns a blocker to getting something that works in LLVM
> first, then evolving it as we support more platforms?
>
>
> I think the concern of whether the tool can be run correctly across
> platforms is something to resolve first. I expect test cases for the
> tool to involve checked-in binaries, executed on all test configurations
> in LLVM. It's just a tool reading a file and producing another file or
> stats, etc.
>
>
> With the changes lined up after this one, we don't need to check in
> binaries -- we can write the log files in YAML and have them converted to
> particular targets, when we support them.
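A YAML form of a log, as mentioned above, might look something like this; the schema shown is purely illustrative, not the tool's actual format:

```yaml
# Illustrative only -- not the actual llvm-xray YAML schema.
header:
  version: 1
  type: 0
  cycle-frequency: 2601000000
records:
  - { func-id: 1, kind: function-enter, tsc: 10001 }
  - { func-id: 1, kind: function-exit,  tsc: 10100 }
```

A text form like this is portable by construction, which is why it can serve as the checked-in source from which per-target binary fixtures are generated.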
>

OK, but the functionality would still be missing - in the sense that one
couldn't take a file from one machine to another, then run the tool on it.
I think that's, imho (others may fairly disagree - happy to chat more with
Eric & Chandler, for example), a blocker. The tool should be able to read
the file no matter which host we're on.


> Not only can we then read them in a platform-specific format, we can write
> them in that format too. Might be a little more work, but the choice here
> is doing that versus using Flatbuffers (i.e. being more clever about it).
>

Not entirely following - but you're suggesting that portable reading of
these splatted files is not practical, or at least it would be enough work
that we might as well just do Flatbuffers from the get-go?


>
>
> Since we only write it on one platform at the moment, sure - we can assume
> the file is always laid out the same & when we figure out how to do it for
> another platform we can add bits, rev the version number (there is a
> version number in the file, right?), or whatever else is necessary.
>
>
>
> Yep -- we have version numbers and enough padding to make it happen. :)
>
>
> So if the suggestion is to move the Subcommand and flag definitions into
> headers, define a single entry point function, and leave the dispatch
> to main, that might be doable -- but I'm not sure how much cleaner that
> would be...
>
>
> Fair question - especially with global initialization ordering issues,
> it's possible that declaring a flag in a header, defining it in one file
> and defining others in different files, etc, may not be doable. But I'd
> probably look into/try it.
>
> I think it would be cleaner (the main file would just include a header
> that declares the subcommand and the single entry point; the separate file
> would define the subcommand, the subcommand flags, and the entry point --
> so one could easily look at just the slice that's relevant to the feature
> being worked on. Looking at the patches for the future features, the main
> function seemed to get rather long and unwieldy quite quickly, even
> with only a handful (~3) of subcommands.)
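The split being discussed can be sketched, without LLVM's `cl::SubCommand` machinery, as a plain registry of per-subcommand entry points. The names here are illustrative; the real tool would declare an `llvm::cl::SubCommand` in the header and attach its flags with `cl::sub()` in the defining file:

```cpp
#include <functional>
#include <map>
#include <string>

// Each subcommand lives in its own translation unit and exposes one
// entry point; main() only looks up the name and dispatches.
using CommandMain = std::function<int(int, char **)>;

std::map<std::string, CommandMain> &commandRegistry() {
  static std::map<std::string, CommandMain> Registry;
  return Registry;
}

// Would normally live in e.g. xray-extract.cpp alongside its flags.
int extractMain(int, char **) { return 0; }

// The whole of main()'s job: find the subcommand and hand over argv.
int dispatch(int argc, char **argv) {
  if (argc < 2)
    return 1;
  auto It = commandRegistry().find(argv[1]);
  return It == commandRegistry().end() ? 1 : It->second(argc - 1, argv + 1);
}
```

With this shape, adding a subcommand touches only its own header/source pair plus one registration line, which is the "slice that's relevant to the feature" benefit described above.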
>
>
> I can try -- in the meantime I've factored the command into a separate
> function.
>