[llvm-dev] [XRay] Build instrumented Clang, some analysis results

C Bergström via llvm-dev llvm-dev at lists.llvm.org
Wed Jul 20 03:02:09 PDT 2016

Some general questions about X-Ray
Is there a plan to make a separate mailing list or project around
this? Do you have a list of planned features?

Graphics tools for analysis? AMD open sourced their CodeAnalyst - What
about some integration with that?

Linux + Perf support (planned/exists)?

How much is this tied to something specific about Linux or it could be
easily ported to another platform?

What's the benefit of this vs a stable and production ready tool like Dtrace?

How much overhead do you typically measure?

If you're injection calls before/after every function - does it end up
blocking optimizations? Without looking at the implementation, if
you're injecting the calls late enough in the compilation process it
won't be a "problem", but if it's too early - you're going to end up
blocking a lot of optimizations and interfering with things a lot..

On Wed, Jul 20, 2016 at 3:48 PM, Dean Michael Berris via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi everyone,
> TL;DR: With current pending patches applied in compiler-rt and llvm, and trunk clang, you can build your application with XRay tracing enabled on Linux with tracing enabled before main starts, and logging stops when the main thread exits.
> Just a quick update, I have some patches under review that when applied cleanly to LLVM and compiler-rt allows for building applications with XRay tracing enabled on Linux binaries from start to end. The list of patches to apply are:
>   * https://reviews.llvm.org/D21987 (accounting tool, lives in llvm)
>   * https://reviews.llvm.org/D21612 then https://reviews.llvm.org/D21982 (compiler-rt XRay runtime implementation)
> To test this out, I built clang with XRay and ran it on a hello world application. Here's how I did it:
> 1) Build clang+compiler-rt with the above patches patched in. Follow the normal instructions for getting a version of clang (from trunk) built. I use CMake+Ninja to do it, and I build everything. Example session:
> # assume ./llvm is set up appropriately
> $ cd llvm-build
> $ cmake -DCMAKE_BUILD_TYPE=Release -G Ninja ../llvm
> $ ninja
> $ cd ..
> 2) Build clang again, preferably in another build directory where you can use the clang and compiler-rt built in #1, this time with the following additional flags for CMAKE_CXX_FLAGS: `-fxray-instrument` `-fxray-instruction-threshold=1`. The second flag tells clang+llvm to instrument functions that have at least one instruction in it. An example session of this would look something like:
> $ mkdir xray-llvm-build
> $ cd xray-llvm-build
> $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=../llvm-build/bin/clang++ -DCMAKE_C_COMPILER=../llvm-build/bin/clang -DCMAKE_CXX_FLAGS='-fxray-instrument -fxray-instruction-threshold=1 -g' -DBUILD_SHARED_LIBS=Off ../llvm
> $ ninja clang
> $ cd ..
> 3) Run clang on a sample program, configure XRAY_OPTIONS while doing so for verbosity control. In my case I have the following "Hello, World!" program compiled by clang:
> ---->8 test.cc 8<----
> #include <iostream>
> int main(int argc, char* argv[]) {
>   std::cout << "I'm being compiled with XRay!" << std::endl;
> }
> ---->8 test.cc 8<----
> $ XRAY_OPTIONS='verbosity=1' xray-llvm-build/bin/clang -o test.bin -O3 -x c++ -stdlib=libstdc++ test.cc
> You should then see two lines similar to the following printed:
> XRay: Log file in 'xray-log.DqqGQk'
> 4) Once you have one of those XRay log files, we can then use the 'xray-fc-account' tool built in the llvm-build/bin directory (in step 1). Warning: there's a lot of output in this step, I suggest piping to files. Example session:
> $ llvm-build/bin/xray-fc-account xray-log.DqqGQk -m xray-llvm-build/bin/clang -k >sample.txt 2>errs.txt
> We can even output it to CSV, which allows us to load the output in a spreadsheet application where we can start sorting and analysing the data as we see fit, by adding the "-format" flag to the above command:
> $ llvm-build/bin/xray-fc-account xray-log.DqqGQk -m xray-llvm-build/bin/clang -k -format=csv >sample.csv 2>errs.txt
> Analysis Results
> ====
> Given the above steps, I've done some quick analysis with my favourite spreadsheet application where I'm showing the top results for the Clang run above. Here's some interesting findings:
> - The top 20 functions called in clang (in the above invocation) are:
> 292174  clang::DeclContext::getPrimaryContext()
> 283681  llvm::FoldingSetNodeID::AddPointer(void const*)
> 214706  clang::Preprocessor::Lex(clang::Token&)
> 193352  llvm::FoldingSetNodeID::AddInteger(unsigned int)
> 161706  clang::Lexer::LexTokenInternal(clang::Token&, bool)
> 161055  clang::Lexer::Lex(clang::Token&)
> 143420  clang::DeclarationName::getNameKind() const
> 142071  llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul>::Allocate(unsigned long, unsigned long)
> 139009  clang::NamespaceDecl::getOriginalNamespace()
> 125706  clang::Redeclarable<clang::TagDecl>::DeclLink::getNext(clang::TagDecl const*) const
> 107928  clang::Type::getAsTagDecl() const
> 107785  clang::Type::getAsCXXRecordDecl() const
> 101150  llvm::FoldingSetNodeID::operator==(llvm::FoldingSetNodeID const&) const
> 96244   llvm::hashing::detail::hash_short(char const*, unsigned long, unsigned long)
> 94872   _ZN4llvm7hashing6detail23hash_combine_range_implIKjEENSt9enable_ifIXsr16is_hashable_dataIT_EE5valueENS_9hash_codeEE4typeEPS5_S9_
> 91536   clang::DeclContext::lookup(clang::DeclarationName) const
> 84122   clang::TypeLoc::getNextTypeLocImpl(clang::TypeLoc)
> 81368   clang::TypeLoc::getLocalSourceRangeImpl(clang::TypeLoc)
> 80626   clang::FunctionDecl::getCanonicalDecl()
> 79489   clang::TagDecl::getDefinition() const
> 77347   clang::Lexer::LexIdentifier(clang::Token&, char const*)
> - The top few slowest (filtered by hand, because, see caveats below):
> 1.  main
> 2.  clang::Parser::ParseNamespace(unsigned int, clang::SourceLocation&, clang::SourceLocation)
> 3.  clang::Parser::ParseInnerNamespace(std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> >&, std::vector<clang::IdentifierInfo*, std::allocator<clang::IdentifierInfo*> >&, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> >&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&)
> 4.  clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&)
> 5.  clang::Parser::ParseDeclaration(unsigned int, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&)
> 6.  clang::Parser::ParseDeclarationStartingWithTemplate(unsigned int, clang::SourceLocation&, clang::AccessSpecifier, clang::AttributeList*)
> 7.  clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&)
> 8.  clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, llvm::raw_pwrite_stream*)
> 9.  clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*)
> 10. clang::Parser::ParseTemplateDeclarationOrSpecialization(unsigned int, clang::SourceLocation&, clang::AccessSpecifier, clang::AttributeList*)
> 11. clang::Parser::ParseSingleDeclarationAfterTemplate(unsigned int, clang::Parser::ParsedTemplateInfo const&, clang::ParsingDeclRAIIObject&, clang::SourceLocation&, clang::AccessSpecifier, clang::AttributeList*)
> - For a single-threaded application, it ran on three "cpus":
> CPUID   min tsc max tsc duration
> 5       14,393,810,869,713,800.00       14,393,812,737,589,100.00       0.7181373
> 12      14,393,816,387,291,400.00       14,393,816,388,054,500.00       0.0002933702
> 13      14,393,810,074,468,400.00       14,393,810,869,659,100.00       0.305725
> - There's a single thread, and we get the full range of timestamps to get the overall duration:
> TID     min tsc max tsc duration
> 83637   14,393,810,074,468,400.00       14,393,816,388,054,500.00       2.427369
> Caveats
> ====
> * Because clang was built with -O3, we're missing some function exits because of tail call and sibling optimisations -- we try to handle those cases from the function call accounting tool instead of adding instrumentation points (yet) in LLVM. Further work needs to be done to identify those exit points as well so that we have less ambiguity in the data.
> * In that simple invocation of clang that took roughly around 1 second to complete, we are able to generate roughly 606MB from the naive in-memory logging implementation. That's a lot of data for the fully-instrumented clang version. Removing the '-fxray-instruction-threshold=1' flag defaults this to 200 instructions, and there's considerably less data logged and only from non-trivial functions (~34MB when I tested it).
> * The durations are in seconds, computed by taking the maximum CPU frequency. We assume that the CPU TSC counters don't drift/pause (i.e. assume "constant tic" and "nonstop tsc"). This may not be safe assumptions to make, but should be fine for the interim as we iterate on how to get this data more reliably at least in Linux.
> * If you add '-v' to the invocation of xray-fc-account, it will give you a stack reconstruction as it goes through every entry/exit event. Last I tried this it took a long time to generate the text, and I gave up when it went over the 1.7GB (of text) mark. It's pretty and hypnotic to watch the stack grow and shrink just from the entry and exits of functions, but it's not practical to do. :)
> ----
> If you're still reading up to here, thank you for taking the time. I'm happy to iterate on the questions people might have.
> Have fun, and I hope this helps!
> Cheers
> PS. If you'd like to get the raw trace, I'd be happy to share it but it's 81MB bzip2'd. :) Also, if you'd like to see the csv output or the text output, I'd be happy to share it too but not via email on the mailing list. :D
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

More information about the llvm-dev mailing list