[PATCH] D21987: [XRay] Implement `llvm-xray extract`, start of the llvm-xray tool

Dean Michael Berris via llvm-commits llvm-commits at lists.llvm.org
Sun Sep 11 23:24:23 PDT 2016


dberris added a comment.

In https://reviews.llvm.org/D21987#538323, @dblaikie wrote:

> > > Still some outstanding design issues to discuss re: file formats, conversions, portability, etc. But once that's done.
> > >
> > > I'm still concerned about reading/writing the binary format using memory mapping/splatting - I think the tools should be portable (in the same way that LLVM/Clang are - we don't build cross compilers, we just build tools that can read/write from/to any format on any platform), but even that's an implementation detail to a degree (unless making the tools that generic removes the need for the separate portable format/conversion tools).
> >
> > That's a tricky one. The problem with _not_ doing this memory-dump format is the cost of doing it in the XRay runtime.
>
> Ah, sorry - I should clarify/be more specific.
>
> I don't have so much of a problem with the writing side - I appreciate there are performance concerns there. (but wouldn't mind understanding them better - how important is the performance once xray features are turned on? (obviously having xray be as close to zero cost when not in use is important) What sort of cost do you see if it did write a consistent record format? (copying integers into known-width types, changing to a fixed endianness))


That's a good question.

When it's off, we'd like the cost to be as close to 0 as possible. When it's recording, we'd like the recording cost to be as small as possible, so as not to interfere too much with the latency profiles of the functions being instrumented. This means that if we can get away with introducing a fixed overhead when on (say 0.05 microseconds, or 50 nanoseconds, on every record) then that's how we'd structure it. So far we've been measuring far less than that on the fixed costs of the trampolines, but we're very wary of the cost of the actual recording as well. So while we can tolerate 0.05 microseconds maximum when we're recording, we'd only go up to that limit if we really need to.

Now, with future work for various modes and trade-offs on the runtime side, there are a lot of opportunities for reducing other costs -- like memory (doing selective recording based on duration), durable storage (how many bytes are on disk for the log/trace), as well as the cost of the actual recording operation. But in the meantime, we're exploring the bare minimum we can get away with, and the implementation we currently have does exactly this. :)
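To make the trade-off concrete, here's a rough sketch of the two writing strategies -- the record layout and field names are hypothetical, not the actual compiler-rt code:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical record layout for illustration only -- not the actual
// in-memory XRay record.
struct XRayRecordSketch {
  uint16_t RecordType;
  uint8_t CPU;
  uint8_t Type;
  int32_t FuncId;
  uint64_t TSC;
  uint32_t TId;
};

// What the runtime does today, conceptually: splat the in-memory
// representation straight into the log buffer. One memcpy, no per-field
// work -- but the bytes are only meaningful on a like-for-like machine.
char *writeSplat(char *Buf, const XRayRecordSketch &R) {
  std::memcpy(Buf, &R, sizeof(R)); // includes any struct padding
  return Buf + sizeof(R);
}

// The portable alternative being discussed: serialize each field at a
// known width and a fixed (here, little-endian) byte order. This is the
// per-record cost we'd be adding against the 0.05 microsecond budget.
char *writeLE(char *Buf, uint64_t V, unsigned Bytes) {
  for (unsigned I = 0; I < Bytes; ++I)
    *Buf++ = static_cast<char>((V >> (8 * I)) & 0xFF);
  return Buf;
}

char *writePortable(char *Buf, const XRayRecordSketch &R) {
  Buf = writeLE(Buf, R.RecordType, 2);
  Buf = writeLE(Buf, R.CPU, 1);
  Buf = writeLE(Buf, R.Type, 1);
  Buf = writeLE(Buf, static_cast<uint32_t>(R.FuncId), 4);
  Buf = writeLE(Buf, R.TSC, 8);
  Buf = writeLE(Buf, R.TId, 4);
  return Buf;
}
```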

> But what I'm thinking about is whether the tool could read the input no matter which platform it was running on - each architecture would have a different, but known, format (I mean, it already does - it's just defined more implicitly in terms of "what I can read from memory with this struct overlay"). So the reader can do all the "read this many bytes of integer, switch the endianness from whatever was written to whatever I need in this process, etc". That way the output file becomes portable, without changing the bits/incurring extra cost during the writing step.
>
> Having the tools only work on the architecture where the file was generated seems especially tricky/not-great - and I don't think we have anything like that in LLVM today. llvm-objdump works on COFF files for ARM even when it's run on Linux X86, etc (or should do, if it supports COFF, etc - I haven't looked closely... ).


I agree. There are two changes that could facilitate this support:

1. Indicate in the log what system the log was generated from. This could very well indicate endianness, sizes of certain fields, etc. We have enough padding in the file headers to encode most of this information.
2. Have the tools handle the data accordingly, based on the header information (a rough sketch follows below).
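To be clear about (2), the reading side would stop overlaying structs on the buffer and instead pull fixed-width fields out explicitly. A rough sketch -- hypothetical header fields and offsets, assuming a little-endian header, using LLVM's `llvm/Support/Endian.h` helpers:

```cpp
#include "llvm/Support/Endian.h"
#include <cstdint>

using namespace llvm::support;

// Hypothetical header fields -- names and offsets are made up for
// illustration; the real layout is whatever the runtime ends up writing
// into the padded header.
struct XRayFileHeaderSketch {
  uint16_t Version;
  uint16_t MachineType;    // e.g. an ELF e_machine value for the source arch
  bool BigEndian;          // byte order the records were written in
  uint64_t CycleFrequency; // TSC frequency recorded at log time
};

// No struct overlay: every field is read at an explicit offset with an
// explicit width and byte order, so the tool works on any host.
XRayFileHeaderSketch readHeader(const char *Data) {
  XRayFileHeaderSketch H;
  H.Version = endian::read<uint16_t, little, unaligned>(Data + 0);
  H.MachineType = endian::read<uint16_t, little, unaligned>(Data + 2);
  H.BigEndian = Data[4] != 0;
  // Bytes 5..7 remain padding, reserved for future fields.
  H.CycleFrequency = endian::read<uint64_t, little, unaligned>(Data + 8);
  return H;
}
```

Record payloads would then be decoded using the byte order named by `BigEndian`, rather than the host's.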

In the meantime, since we currently only run on x86_64 and have very specific fields, it seems premature to engineer something that supports all the platforms that may come along later. When we end up porting XRay to other platforms, we can cross that bridge and extend the tool to support data generated from those systems.

Are these concerns a blocker to getting something that works in LLVM first, then evolving it as we support more platforms?

> > We try to be as cheap as possible there (in compiler-rt) so that we don't spend too much time just formatting the log when the process is writing it down to disk. The tradeoff we've made here consciously is that any conversion should happen externally -- i.e. the tool should be able to read this format, and then turn it into something more manageable offline. That's a very conscious trade-off here because we can't afford to use too much time or too much memory on the compiler-rt implementation.

> > > Higher level design/coding style stuff:
> > >
> > > Might be worth pulling out each of the subcommands into their own utility function/file along with the command line arguments. Though I can see an argument (har har) to be made for keeping all the arguments in the same place - but maybe subcommand arguments make sense to be grouped together along with the functionality, separate from the main file.
> >
> > I thought about that, then it started to feel over-engineered -- since I'd have to figure out how to dispatch anyway from the main file, I still need to know which command to call based on which subcommand, this seemed like the more prudent thing to do.
>
> Not sure I quite follow - I was thinking something fairly simple, just taking the contents of the "if (Extract)" block in main, and pulling it into a function (with whatever parameters it needs - probably not many, since they're all subcommand arguments anyway, mostly - which could be moved over to that other file (the implementation of extract) or left here and all passed in (but that might be a lot of argument passing)).


I thought about this, except I'm not quite as familiar with the command-line flags library -- is there a way to declare the subcommand in main, then have it defined elsewhere? I ask because adding flags to a subcommand involves getting a reference to the `SubCommand`. Then in `main(...)` we still end up checking the `SubCommand`, and then calling a function.

So if the suggestion is: move the `SubCommand` and flag definitions into headers, define a single entry-point function, and leave the dispatch to main -- that might be doable, but I'm not sure how much cleaner that would be...

In the meantime I'll do the function argument passing, and see how it goes for the other, more complex subcommands later.
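If I've understood the suggestion, the shape would be something like the sketch below -- file and function names are hypothetical, and it just relies on `cl::SubCommand` converting to bool and on `cl::sub(...)` attaching a flag to a subcommand:

```cpp
// ---- xray-extract.h (hypothetical) ----
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Error.h"

namespace xray_tool {
// The SubCommand object lives with the extract implementation...
extern llvm::cl::SubCommand Extract;
// ...and main only needs this single entry point.
llvm::Error extractMain();
} // namespace xray_tool

// ---- xray-extract.cpp (hypothetical) ----
using namespace llvm;

cl::SubCommand xray_tool::Extract("extract", "Extract instrumentation maps");

// Flags register against the subcommand here, next to the code that uses
// them, so they never appear in the main file.
static cl::opt<std::string> Input(cl::Positional, cl::desc("<input file>"),
                                  cl::Required, cl::sub(xray_tool::Extract));

Error xray_tool::extractMain() {
  // ... open Input, extract the instrumentation map, write it out ...
  return Error::success();
}

// ---- llvm-xray.cpp (hypothetical) ----
#include "llvm/Support/raw_ostream.h"

int main(int argc, char *argv[]) {
  llvm::cl::ParseCommandLineOptions(argc, argv, "XRay tools\n");
  // Dispatch stays here: a cl::SubCommand evaluates to true when it was
  // the subcommand named on the command line.
  if (xray_tool::Extract) {
    if (auto Err = xray_tool::extractMain()) {
      llvm::logAllUnhandledErrors(std::move(Err), llvm::errs(), "llvm-xray: ");
      return 1;
    }
  }
  return 0;
}
```

The dispatch check does stay in `main`, but it shrinks to one line per subcommand rather than the whole body.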


https://reviews.llvm.org/D21987




