[cfe-dev] Some DTrace probes for measuring per-file time.

Sean Silva chisophugis at gmail.com
Mon Apr 6 17:39:01 PDT 2015


At the social last week, some of the measurements I did with DTrace for
clang's modules came up in conversation, and Alexei asked me to post the
scripts I used to the list. IIRC I did share some of these on IRC, but
nothing made it to the list.

Attached you'll fine a patch for clang which adds some USDT probes, along
with a DTrace script for measuring total time spent in each header file
using those probes (there are also "phony" files which track a handful of
other things). This script measures time on CPU (DTrace's vtimestamp), so
time blocked on IO (which is hidden by the parallelism of the build) does
not come into play.

This allows very fine-grained measurement of the aggregate time spent in
each header. I originally used this to identify which parts of codebases
will give the most build speed up by being modularized.

Small changes to the script can also produce other interesting information,
such as:
- recursive inclusions (e.g. on Mac I found this recursive inclusion in the
system headers: http://pastie.org/pastes/10077318/text ).
  - The underlying cause for this one, which is also present in clang's
builtin headers, is that when the recursive inclusion is hit, we haven't
finished with the file in the first place, so we don't know if the entire
file is wrapped in a proper include guard so we can't skip re-entering the
file; therefore we textually re-enter the entire file.
- more generally, discovering places where the multiple-include
optimization is not firing (suffice it to say that after doing this I now
exclusively use #pragma once everywhere I have the choice, the ifdef guard
is way too fragile).
- the number of calls to malloc while parsing each file
- the number of calls to any symbol inside clang while parsing each file
- etc.


One caveat is that on the Mac implementation of DTrace (where I developed
this script), having DTrace enabled causes causes every process startup to
grab a global lock in the kernel, which limits the parallelism of the
build. This might not sound too bad, but it has severe implications when
measuring CPU time. Essentially: if clang gets faster ==> clang finishes
compiling each file faster ==> faster rate of clang processes starting up
==> more contention on the lock ==> fewer processes actually running in
parallel ==> if you have hyperthreading, then suddenly some threads will
have a whole core to themselves instead of sharing ==> those threads
experience faster single-threaded performance ==> the reported CPU time
will decrease. This decrease in CPU time is large enough to cause
significant measurement discrepancies.

Don't believe me? Try building LLVM itself on a machine with hyperthreading
and doing `time ninja -j$(# of hyperthreads on your machine)` vs `time
ninja -j$(# of physical cores on your machine)`. You will see that the
reported "user time" decreases *drastically* when using fewer jobs, even
though the real time increases.

The only way I know to avoid this issue is to just use fewer jobs (ideally
just 1, but any number less than the number of physical cores should be
alright).


If someone is interested in removing that lock (or complaining to the right
kernel people at Apple), look for DTRACEHIOC_ADDDOF in
http://opensource.apple.com/source/xnu/xnu-2782.1.97/bsd/dev/dtrace/dtrace.c
. I believe the right solution is to cache the USDT probes by inode or vma,
so that the lock is only entered when the cache is cold for the
executable/dylib being loaded.

-- Sean Silva
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150406/82fbc84e/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dtraceprobes.d
Type: application/octet-stream
Size: 1802 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150406/82fbc84e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dtraceprobes.patch
Type: application/octet-stream
Size: 6733 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150406/82fbc84e/attachment-0001.obj>


More information about the cfe-dev mailing list