[llvm-dev] [XRay] Build instrumented Clang, some analysis results

Wed Jul 20 03:26:08 PDT 2016

> On 20 Jul 2016, at 20:02, C Bergström <cbergstrom at pathscale.com> wrote:
> 
> Some general questions about X-Ray
> -------------
> Is there a plan to make a separate mailing list or project around
> this? Do you have a list of planned features?

Interesting question -- so far we haven't decided whether XRay will live as a separate project, though I'm certainly open to the possibility. There are no concrete plans yet. It's an open question in the original RFC too (http://lists.llvm.org/pipermail/llvm-dev/2016-April/098901.html).

There's a white paper that details what we plan to implement in the open (http://research.google.com/pubs/pub45287.html). We're still working our way toward a full version as described in that white paper (basically blocked by my lack of familiarity with the LLVM codebase, and other n00b-y things :D).

There's not a concrete list of features, and we're certainly open to contributions from the community to add features that make sense. :)

> 
> Graphics tools for analysis? AMD open sourced their CodeAnalyst - What
> about some integration with that?
> 

Thanks for the pointer! Yes, I'd love to have more integration with existing visualisation tools that read a particular well-documented format. Others have mentioned Jumpshot, which might be a little dated but is still used by some people for similar things.

> Linux + Perf support (planned/exists)?
> 

There are no plans yet to support perf counter lookups, although I certainly think that's a nice source of data to log XRay-style.

FWIW, the API for XRay allows us to decouple what gets logged at function entry/exit from the instrumentation itself. Getting performance counters at those points is a nice idea, and it should be doable.
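A minimal C++ sketch of that decoupling, assuming a design where every instrumentation point calls through one atomically swappable handler pointer. This is not XRay's actual interface: `EntryType`, `set_handler`, `remove_handler`, `on_event`, and `read_fake_counter` are hypothetical names, and the hand-written `on_event` calls stand in for compiler-emitted instrumentation. `read_fake_counter()` is only a placeholder for where a real handler might read a performance counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event kinds passed to the handler.
enum class EntryType : uint8_t { Entry, Exit };

// Hypothetical handler signature: which function, and entry or exit.
using Handler = void (*)(int32_t func_id, EntryType type);

// Currently installed handler; nullptr means nothing is logged.
std::atomic<Handler> g_handler{nullptr};

void set_handler(Handler h) {
  assert(h != nullptr && "use remove_handler() to disable logging");
  g_handler.store(h, std::memory_order_release);
}

void remove_handler() { g_handler.store(nullptr, std::memory_order_release); }

// What each instrumentation point does: call the handler if one is installed.
inline void on_event(int32_t func_id, EntryType type) {
  if (Handler h = g_handler.load(std::memory_order_acquire)) h(func_id, type);
}

// Hand-instrumented stand-in for a compiler-instrumented function (id 42).
int instrumented_add(int a, int b) {
  on_event(42, EntryType::Entry);
  const int result = a + b;
  on_event(42, EntryType::Exit);
  return result;
}

struct Event {
  int32_t func_id;
  EntryType type;
  uint64_t value;  // whatever the handler chose to record
};
std::vector<Event> g_events;  // not thread-safe; fine for a single-threaded demo

// Placeholder for a performance-counter read (hypothetical); it just counts up.
uint64_t read_fake_counter() {
  static uint64_t n = 0;
  return ++n;
}

// Handler that records a "counter" value at entry and exit.
void counter_handler(int32_t id, EntryType t) {
  g_events.push_back({id, t, read_fake_counter()});
}

// Handler that records only which function was entered or exited.
void id_only_handler(int32_t id, EntryType t) { g_events.push_back({id, t, 0}); }
```

Swapping handlers changes what is recorded without recompiling the instrumented code, and with no handler installed each hook reduces to a load and a branch.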

> How much is this tied to something specific about Linux, or could it be
> easily ported to another platform?

Currently, the only Linux-specific part I can remember is getting the CPU frequency (by reading sysfs files). That can be implemented in a platform-agnostic (or at least pluggable and portable) manner.
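As a sketch of how the frequency lookup could be made pluggable, the C++ below keeps the Linux-specific sysfs read behind a provider function pointer that another platform could replace. The mail doesn't say which sysfs file is read: `cpuinfo_max_freq` (reported in kHz) is an assumption, and `FrequencyProvider`, `linux_sysfs_frequency`, `parse_khz`, and `cycles_to_ns` are hypothetical names. The frequency matters because, assuming the TSC ticks at that rate, raw cycle deltas only become durations once divided by it.

```cpp
#include <cassert>
#include <cctype>
#include <charconv>
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Parse a sysfs-style frequency in kHz (e.g. "2400000\n") into Hz.
std::optional<uint64_t> parse_khz(std::string_view text) {
  // sysfs values end with a newline; strip trailing whitespace.
  while (!text.empty() && std::isspace(static_cast<unsigned char>(text.back())))
    text.remove_suffix(1);
  uint64_t khz = 0;
  const char* end = text.data() + text.size();
  auto [ptr, ec] = std::from_chars(text.data(), end, khz);
  if (ec != std::errc() || ptr != end || khz == 0) return std::nullopt;
  return khz * 1000;  // kHz -> Hz
}

// Linux-specific piece: read the first line of a sysfs file.
std::optional<uint64_t> read_frequency_file(const std::string& path) {
  std::ifstream in(path);
  std::string line;
  if (!in || !std::getline(in, line)) return std::nullopt;
  return parse_khz(line);
}

// Pluggable provider: another platform would supply its own function.
using FrequencyProvider = std::optional<uint64_t> (*)();

std::optional<uint64_t> linux_sysfs_frequency() {
  // Assumed path; the actual file XRay reads may differ.
  return read_frequency_file("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq");
}

// Chosen at build or start-up time; a non-Linux port installs its own.
FrequencyProvider g_frequency_provider = &linux_sysfs_frequency;

// Convert a cycle delta to nanoseconds, assuming a constant rate of `hz`.
uint64_t cycles_to_ns(uint64_t cycles, uint64_t hz) {
  assert(hz != 0);
  return static_cast<uint64_t>(static_cast<long double>(cycles) * 1e9L / hz);
}
```

Only `linux_sysfs_frequency` touches the filesystem; parsing and the cycle conversion are platform-neutral.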

There are also x86-isms, and I'm working on understanding how to do this on AArch64 or ARM.

> 
> What's the benefit of this vs a stable and production ready tool like Dtrace?
> 

I think I've pointed out the differences in a separate mail (some mail filters may have squashed that response, so apologies if that was missed): http://lists.llvm.org/pipermail/llvm-dev/2016-July/101922.html -- the short version is:

- DTrace requires kernel-side support.
- XRay is completely in-process and controllable by the process through an API (I'm not sure whether DTrace is the same).
- XRay is selective and configurable by the application developer.
- XRay's cost is borne by the application only, and does not require stopping the application.

> How much overhead do you typically measure?
> 

In the "null logging" case, we've seen something around O(100) cycles on x86 for the trampoline side of XRay. Of course, richer logging requires more cycles, and the cost depends entirely on the logging implementation. The one currently under development only writes fixed-size records, uses __rdtscp(), does only aligned writes, and flushes when the buffer is full. The buffer is 32k per thread.
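A minimal C++ sketch of a buffering scheme along those lines. The 16-byte `Record` layout, the `ThreadBuffer` and `FlushFn` names, and reading "32k" as 32 KiB are hypothetical choices for illustration, not the layout of the implementation under development. It writes fixed-size, 16-byte-aligned records stamped with `__rdtscp()` (falling back to `std::chrono::steady_clock` off x86) and hands the buffer to a flush callback once it is full.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtscp (GCC/Clang)
#endif

// Hypothetical fixed-size record layout: 16 bytes, 16-byte aligned.
struct alignas(16) Record {
  uint64_t tsc;     // timestamp counter value
  int32_t func_id;  // id of the instrumented function
  uint16_t cpu;     // low bits of rdtscp's aux value (identifies the CPU)
  uint8_t type;     // 0 = entry, 1 = exit
  uint8_t pad;      // explicit padding keeps the size at exactly 16 bytes
};
static_assert(sizeof(Record) == 16, "records must stay fixed-size");

// Assumes "32k" means 32 KiB per thread.
constexpr std::size_t kBufferBytes = 32 * 1024;
constexpr std::size_t kRecordsPerBuffer = kBufferBytes / sizeof(Record);

// Receives a pointer to the records and how many of them are valid.
using FlushFn = std::function<void(const Record*, std::size_t)>;

class ThreadBuffer {
 public:
  explicit ThreadBuffer(FlushFn flush) : flush_(std::move(flush)) {}

  // Appends one record; flushes as soon as the buffer is full.
  void log(int32_t func_id, uint8_t type) {
    uint32_t cpu = 0;
    const uint64_t tsc = read_tsc(&cpu);
    Record& slot = records_[count_];
    // Every slot is 16-byte aligned, so each record write is aligned.
    assert(reinterpret_cast<std::uintptr_t>(&slot) % alignof(Record) == 0);
    slot = Record{tsc, func_id, static_cast<uint16_t>(cpu), type, 0};
    if (++count_ == kRecordsPerBuffer) flush();
  }

  // Hands the filled part of the buffer to the sink and starts over.
  void flush() {
    if (count_ == 0) return;
    flush_(records_.data(), count_);
    count_ = 0;
  }

 private:
  static uint64_t read_tsc(uint32_t* cpu) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int aux = 0;
    const uint64_t t = __rdtscp(&aux);  // aux receives IA32_TSC_AUX
    *cpu = aux;
    return t;
#else
    // Non-x86 fallback; a real port would read the platform's cycle counter.
    *cpu = 0;
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
  }

  std::array<Record, kRecordsPerBuffer> records_{};
  std::size_t count_ = 0;
  FlushFn flush_;
};
```

In a runtime, each thread would own one buffer (for example `thread_local ThreadBuffer buffer(sink);`), so the logging path needs no locks. Because each record is 16 bytes and 16-byte aligned, no record straddles a 64-byte cache line.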

I haven't formally benchmarked the current implementation yet, but I'd be happy to do that soon.

> If you're injecting calls before/after every function - does it end up
> blocking optimizations? Without looking at the implementation, if
> you're injecting the calls late enough in the compilation process it
> won't be a "problem", but if it's too early - you're going to end up
> blocking a lot of optimizations and interfering with things a lot..
> 

It's currently implemented as a MachineFunctionPass, and as far as I can tell it runs late enough in the pipeline that it doesn't interfere with optimisations.

Cheers