[cfe-dev] [RFC] Adding a different mode of "where clang spends time" reporting (timeline/flamegraph style)

Aras Pranckevicius via cfe-dev cfe-dev at lists.llvm.org
Sun Jan 20 01:52:41 PST 2019

*(I'm not familiar with many/most of the LLVM/clang tooling, so some of my
questions below might not make sense... bear with me)*

>  - the distinction between tracing/profiling automatically (stack frames,
> using xray or a profiler) and manually annotating ranges is an important
> one - it may be possible to use XRay to combine the two.

Is XRay general enough to be "a general solution for this type of problem"?
From a quick glance, it seems to only work on Linux, which would exclude
most clang users, who are on other OSes.

>  - the timeline view, flame graph, and graphviz representations are
> independently useful, so being able to output something pprof can read is
> nice

Agreed. I only did the Chrome Tracing one because it's trivial to write, and a
tool to view it exists on all OSes I care about. What other profiling
output formats are "common enough" to be worth supporting? I guess this might
be highly industry-specific (e.g. I hadn't even heard of pprof before --
maybe because my work rarely, if ever, involves Linux).
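To illustrate how little is needed to produce Chrome Tracing output, here is a minimal sketch. It is in Python only for brevity, and the function name and the (name, start, duration, thread) tuple layout are hypothetical. It emits "complete" events (`"ph": "X"`, with `ts` and `dur` in microseconds) wrapped in the `traceEvents` JSON object of the Trace Event Format.

```python
import json

def chrome_trace_json(events, pid=1):
    """Serialize (name, start_us, dur_us, tid) tuples as Chrome Trace Event JSON.

    Each tuple becomes one "complete" event (ph = "X"); ts and dur are in
    microseconds. The tuple layout is an assumption made for this sketch.
    """
    trace_events = [
        {"name": name, "ph": "X", "ts": start_us, "dur": dur_us, "pid": pid, "tid": tid}
        for name, start_us, dur_us, tid in events
    ]
    return json.dumps({"traceEvents": trace_events}, indent=1)

# Made-up timings: "Source" lies inside "Frontend" on the same thread,
# so a viewer should draw it nested one level below.
doc = chrome_trace_json([
    ("Frontend", 0, 1500, 0),
    ("Source", 10, 900, 0),
    ("Backend", 1500, 700, 0),
])
print(doc)
```

Saving `doc` to a `.json` file should be enough for chrome://tracing (or speedscope, which can import this format) to open it.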

>  - for multi-threaded programs, being able to represent events split across
> or transferred between threads is important, but complicates the model

Yeah. I thought about making it support multiple threads, but it seems that
all (or the majority) of clang itself is single-threaded. I found some
utilities in the LLVM codebase (thread locals, thread pool, etc.), but they
don't seem to be used much across the codebase.

In other implementations of a similar profiler that I've done in other
codebases, I approached threading by making all profiler data
thread-local; that way profiler regions don't need to take any mutex
locks, and the only additional cost is a TLS lookup. Then "at the end", when
profiling data is written into a file (which must happen after all threaded
work is already done), it just merges all the per-thread data and emits
that. That is easy to implement and does not have a high performance
overhead.
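The thread-local scheme described above can be sketched roughly as follows. This is an illustrative sketch, not code from clang or from the profilers mentioned; all names are hypothetical, Python is used only for brevity, and in C++ the per-thread buffer would naturally be a `thread_local` object. Each region appends to its own thread's buffer without locking. A lock is taken only once per thread, on its first event, to register the buffer so it outlives the thread and can be merged at the end.

```python
import threading
import time
from contextlib import contextmanager

# Sketch only: names are hypothetical, not an existing LLVM/clang API.
_tls = threading.local()           # per-thread storage; recording needs no lock
_all_buffers = []                  # (thread_index, events) for every thread that recorded data
_register_lock = threading.Lock()  # taken once per thread, on its first event

def _thread_buffer():
    buf = getattr(_tls, "events", None)
    if buf is None:
        buf = []
        _tls.events = buf
        with _register_lock:
            # Sequential index instead of threading.get_ident(),
            # because thread ids can be recycled after a thread exits.
            _all_buffers.append((len(_all_buffers), buf))
    return buf

@contextmanager
def profile_region(name):
    """Record (name, start_ns, end_ns) into the calling thread's buffer."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        _thread_buffer().append((name, start, time.perf_counter_ns()))

def merge_all():
    """Merge per-thread data; call only after all worker threads have joined."""
    merged = [(tid, name, start, end)
              for tid, events in _all_buffers
              for name, start, end in events]
    merged.sort(key=lambda e: e[2])  # order by start time
    return merged

def worker(n):
    with profile_region(f"job {n}"):
        time.sleep(0.01)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
with profile_region("main"):
    pass
events = merge_all()
```

Merging only after every thread has joined is what makes lock-free recording safe: no thread can still be appending while the buffers are read.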

>  - the chrome tracing format is pretty awkward to work with: bulky, odd
> semantics, doesn't survive truncation. It'd be nice to have something
> better.

Similar to the question above -- do you have suggestions for a format that is
better and has readily available viewing tools (e.g. one that speedscope --
https://github.com/jlfwong/speedscope -- or a similar tool supports)?

Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info