[PATCH] D21987: [XRay] Implement `llvm-xray extract`, start of the llvm-xray tool

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 12 09:43:48 PDT 2016


On Sun, Sep 11, 2016 at 11:24 PM Dean Michael Berris <dberris at google.com>
wrote:

> dberris added a comment.
>
> In https://reviews.llvm.org/D21987#538323, @dblaikie wrote:
>
> > > > Still some outstanding design issues to discuss re: file formats,
> conversions, portability, etc. But once that's done.
>
> > > > I'm still concerned about reading/writing the binary format using
> memory mapping/splatting - I think the tools should be portable (in the
> same way that LLVM/Clang are - we don't build cross compilers, we just
> build tools that can read/write from/to any format on any platform), but
> even that's an implementation detail to a degree (unless making the tools
> that generic removes the need for the separate portable format/conversion
> tools).
>
> > > That's a tricky one. The problem with _not_ doing this memory-dump
> format is the cost of doing it in the XRay runtime.
> >
> >
> > Ah, sorry - I should clarify/be more specific.
> >
> > I don't have so much of a problem with the writing side - I appreciate
> there are performance concerns there. (but wouldn't mind understanding them
> better - how important is the performance once xray features are turned on?
> (obviously having xray be as close to zero cost when not in use is
> important) What sort of cost do you see if it did write a consistent record
> format? (copying integers into known-width types, changing to a fixed
> endianness))
>
>
> That's a good question.
>
> When it's off, we'd like the cost to be as close to 0 as possible. When
> it's recording, we'd like the recording cost to be as little as possible so
> as not to interfere too much with the latency profiles of the functions
> being instrumented. So this means, if we can get away with introducing a
> fixed overhead when on (say 0.05 microseconds or 50 nanoseconds on every
> record) then that's how we'd structure it. So far we've been measuring far
> less than that for the fixed costs of the trampolines, but we're very wary
> of the cost of the actual recording as well. So while we can tolerate 0.05
> microseconds maximum when we're recording, we'd only go up that high if we
> need to.
>
> Now with future work for various modes and trade-offs on the runtime side,
> there's a lot of opportunities for reducing other costs -- like memory
> (doing selective recording based on duration), durable storage (how many
> bytes are on disk for the log/trace), as well as the cost of the actual
> recording operation. But in the meantime, we're exploring the bare minimum
> we can get, and the implementation we currently have does exactly this. :)
>
> > But what I'm thinking about is whether the tool could read the input no
> matter which platform it was running on - each architecture would have a
> different, but known, format (I mean, it already does - it's just defined
> more implicitly in terms of "what I can read from memory with this struct
> overlay"). So the reader can do all the "read this many bytes of integer,
> switch the endianness from whatever was written to whatever I need in this
> process, etc". That way the output file becomes portable, without changing
> the bits/incurring extra cost during the writing step.
>
> > Having the tools only work on the architecture where the file was
> generated seems especially tricky/not-great - and I don't think we have
> anything like that in LLVM today. llvm-objdump works on COFF files for ARM
> even when it's run on Linux X86, etc (or should do, if it supports COFF,
> etc - I haven't looked closely... ).
>
>
> I agree. There are two changes that could facilitate this support:
>
> 1. Indicate in the log what system the log was generated from. This could
> very well indicate endianness, sizes of certain fields, etc. We have enough
> padding in the file headers to support encoding most of this information.
> 2. Handle the data accordingly from the tools based on the header
> information.
>
> In the meantime, since we currently only run on x86_64 and we have very
> specific fields, it seems premature to engineer something that will support
> all the platforms that come along later. And when we end up porting XRay to
> these other platforms, we can cross that bridge of extending the tool to
> support data generated from other systems.
>
> Are these concerns a blocker to getting something that works in LLVM
> first, then evolving it as we support more platforms?
>

I think the concern of whether the tool can run correctly across
platforms is something to resolve first. I expect test cases for the tool to
involve checked-in binaries, executed on all test configurations in
LLVM. It's just a tool reading a file and producing another file or the
stats, etc.

Since we only write it on one platform at the moment, sure - we can assume
the file is always laid out the same, and when we figure out how to do it for
another platform we can add bits, rev the version number (there is a
version number in the file, right?), or whatever else is necessary.


>
> > > We try to be as cheap as possible there (in compiler-rt) so that we
> don't spend too much time just formatting the log when the process is
> writing it down to disk. The tradeoff we've made here consciously is that
> any conversion should happen externally -- i.e. the tool should be able to
> read this format, and then turn it into something more manageable offline.
> That's a very conscious trade-off here because we can't afford to use too
> much time or too much memory on the compiler-rt implementation.
>
> > > > Higher level design/coding style stuff:
>
> > > > Might be worth pulling out each of the subcommands into their own
> utility function/file along with the command line arguments. Though I can
> see an argument (har har) to be made for keeping all the arguments in the
> same place - but maybe subcommand arguments make sense to be grouped
> together along with the functionality, separate from the main file.
>
> > > I thought about that, but then it started to feel over-engineered -- since
> I'd have to figure out how to dispatch from the main file anyway, and I still
> need to know which function to call based on which subcommand, so this seemed
> like the more prudent thing to do.
>
>
> > Not sure I quite follow - I was thinking something fairly simple, just
> taking the contents of the "if (Extract)" block in main, and pulling it
> into a function (with whatever parameters it needs - probably not many,
> since they're all subcommand arguments anyway, mostly - which could be
> moved over to that other file (the implementation of extract) or left here
> and all passed in (but that might be a lot of argument passing)).
>
>
> I thought about this, except I'm not quite as familiar with the
> commandline flags library -- is there a way to declare the subcommand in
> the main, then have that defined elsewhere? I ask because of the way adding
> flags to the subcommands involves getting a reference to a Subcommand. Then
> in `main(...)` we still end up checking the Subcommand, and then calling a
> function.
>
> So if the suggestion is, move the Subcommand and flag definitions in
> headers, and define a single entry point function, and leave the dispatch
> to main, that might be doable -- but not sure how much cleaner that would
> be...
>

Fair question - especially with global initialization ordering issues, it's
possible that declaring a flag in a header, defining it in one file and
defining others in different files, etc, may not be doable. But I'd
probably look into/try it.

I think it would be cleaner: the main file would just include a header that
declares the subcommand and the single entry point, and the separate file
would define the subcommand, the subcommand flags, and the entry point - so
one could easily look at just the slice that's relevant to the feature being
worked on. (Looking at the patches for the future features, the main function
seemed to get rather long and unwieldy rather quickly, even with only a
handful (~3) of subcommands.)


>
> In the meantime I'll do the function argument passing, and see how it goes
> for the other more complex examples later.
>
>
> https://reviews.llvm.org/D21987