[LLVMdev] RFC: Binary format for instrumentation based profiling data

Mon Mar 17 19:00:28 PDT 2014

> On Mar 17, 2014, at 17:22, Justin Bogner <mail at justinbogner.com> wrote:
> 
> Chandler Carruth <chandlerc at google.com> writes:
>> The other assumption here is that you want the same file format written by
>> instrumentation and read back by the compiler. While I think that is an
>> unsurprising goal, I think it creates quite a few limitations that I'd like to
>> point out. I think it would be worthwhile to consider the alternative of
>> having the profile library write out data files in a format which is
>> essentially "always" transformed by a post-processing tool before being used
>> during compilation.
>> 
>> Limitations of using the same format in both places:
>> - High burden on writing the file constrains the format (must be fast, must
>>  not use libraries, etc...)
>> - Have to write an index even though the writer doesn't really need it.
>> - Have to have the function name passed through the instrumentation,
>>  potentially duplicating it with debug info.
>> - Can't use an extensible file format (like bitcode) to insulate readers of
>>  profile data from format changes.
>> 
>> I'm imagining it might be nicer to have something along the lines of the
>> following counter proposal. Define two formats: the format written by
>> instrumentation, and the format read by the compiler. Split the use cases up.
>> Specialize the formats based on the use cases. It does require the user to
>> post-process the results, but it isn't clear that this is really a burden.
>> Historically it has been needed to merge gcov profiles from different TUs, and
>> it is still required to merge them from multiple runs.
> 
> This is an interesting idea. The counter data itself without index is
> dead simple, so this approach for the instrumentation written format
> would certainly be nice for compiler-rt, at the small cost of needing
> two readers. We'd also need two writers, but that appears inevitable
> since one needs to live in compiler-rt.

I'm in favour of two formats.  Simplifying compiler-rt is a worthwhile goal.

Nevertheless, the current proposal with a naive index is straightforward to produce, especially after the changes I committed today.  I think moving to that is a good incremental change. 

Moving forward we can split the format in two and evolve the two halves independently.  In particular, compiler-rt's writer could be coded as a few memcpy calls plus a header, if there's some freedom around the format.
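
To make the "few memcpy calls plus a header" idea concrete, here is a minimal sketch in C.  Everything named in it is hypothetical: the header layout (`raw_profile_header`), the magic and version values, and the function `write_raw_profile` are made up for illustration and are not a proposed format.  It also assumes the counters sit in one contiguous array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values, chosen only for this sketch. */
#define RAW_PROFILE_MAGIC   UINT64_C(0x0123456789abcdef)
#define RAW_PROFILE_VERSION UINT64_C(1)

/* Hypothetical header: three 64-bit fields, written in host byte order. */
struct raw_profile_header {
  uint64_t magic;
  uint64_t version;
  uint64_t num_counters;
};

/* Copies a header followed by the raw counters into `out`.
   Returns the number of bytes written, or 0 if `out` is too small.
   No index and no function names: just header and counters. */
size_t write_raw_profile(void *out, size_t out_size,
                         const uint64_t *counters, uint64_t num_counters) {
  struct raw_profile_header hdr;
  size_t counters_size = (size_t)num_counters * sizeof(uint64_t);
  size_t total = sizeof(hdr) + counters_size;

  assert(out != NULL);
  if (out_size < total)
    return 0;

  hdr.magic = RAW_PROFILE_MAGIC;
  hdr.version = RAW_PROFILE_VERSION;
  hdr.num_counters = num_counters;

  memcpy(out, &hdr, sizeof(hdr));
  if (counters_size != 0)
    memcpy((char *)out + sizeof(hdr), counters, counters_size);
  return total;
}
```

A real writer would hand the buffer to write(2) or flush a mapped region instead of filling a caller-supplied array, and would guard against overflow in the size computation; the point is only that nothing here needs a library or an index.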

>> I think the results could be superior for both the writer and reader:
>> 
>> Instrumentation written format:
>> - No index, just header and counters
>> - (optional) Omit function names, and use PC at a known point of the function,
>>  and rely on debug info to map back to function names.
> 
> This depends a bit on whether or not the conversion tool should depend
> on the debug info being available. We'd need to weigh the usability cost
> against the size benefit.
> 
>> - Use a structure which can be mmap-ed directly by the instrumentation code
>>  (at least on LE systems) so that "writing the file on close" is just
>>  flushing the memory region to disk
> 
> If this is feasible, we could also make the format host endian and
> force the post-processing to byteswap as it reads. This avoids online
> work in favour of offline.
> 
>> - Explicitly version format, and provide no stability going forward
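
The host-endian suggestion above (write in whatever byte order the instrumented machine uses, and let the offline tool byteswap as it reads) could look roughly like the following sketch in C.  The layout (one 64-bit magic followed by 64-bit counters), the magic value, and the names `sketch_bswap64` and `read_host_endian_counters` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic; it must read differently when its bytes are reversed. */
#define SKETCH_MAGIC UINT64_C(0x0123456789abcdef)

/* Reverses the byte order of a 64-bit value. */
static uint64_t sketch_bswap64(uint64_t v) {
  v = ((v & UINT64_C(0x00ff00ff00ff00ff)) << 8) |
      ((v >> 8) & UINT64_C(0x00ff00ff00ff00ff));
  v = ((v & UINT64_C(0x0000ffff0000ffff)) << 16) |
      ((v >> 16) & UINT64_C(0x0000ffff0000ffff));
  return (v << 32) | (v >> 32);
}

/* Reads `n` counters from a buffer laid out as: magic, then n 64-bit
   counters, all in the byte order of the machine that wrote them.
   The magic tells the reader whether it has to swap.
   Returns 0 on success, -1 on a short buffer or unknown magic. */
int read_host_endian_counters(const void *data, size_t size,
                              uint64_t *out, size_t n) {
  uint64_t magic;
  int swap;
  size_t i;

  assert(data != NULL);
  if (n > (SIZE_MAX - sizeof(magic)) / sizeof(uint64_t))
    return -1;
  if (size < sizeof(magic) + n * sizeof(uint64_t))
    return -1;

  memcpy(&magic, data, sizeof(magic));
  if (magic == SKETCH_MAGIC)
    swap = 0; /* written on a machine with our byte order */
  else if (sketch_bswap64(magic) == SKETCH_MAGIC)
    swap = 1; /* written on a machine with the opposite byte order */
  else
    return -1;

  for (i = 0; i < n; ++i) {
    uint64_t v;
    memcpy(&v, (const char *)data + sizeof(magic) + i * sizeof(v), sizeof(v));
    out[i] = swap ? sketch_bswap64(v) : v;
  }
  return 0;
}
```

All of the byte-order work lands in the post-processing tool, so the instrumented program never swaps anything.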
>> 
>> Profile reading format:
>> - Use a bitcoded format much like Clang's ASTs do (or some other tagged format
>>  which allows extensions)
> 
> I'm not entirely convinced a bitcoded format is going to gain us much
> over a simpler on disk hash table. The variable bit rate integers might
> be worthwhile, but will it be efficient to look up the counters for a
> particular function name?
> 
> That said, the ASTs also make use of the on disk hash that Dmitri
> mentioned for various indexes, which is definitely worth looking at.
> 
>> - Leverage the existing partial reading which has been heavily optimized for
>>  modules, LLVM IR, etc.
>> - Use implicit-zero semantics for missing counters within a function where we
>>  have *some* instrumentation results, and remove all zero counters
>> - Maybe other compression techniques
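
As an illustration of the implicit-zero idea in the list above, here is a minimal sketch in C for the counters of a single function.  The `sparse_counter` pair and the two helper names are hypothetical and say nothing about how a bitcode-based format would actually encode the data; the sketch only shows that dropping zero counters loses no information as long as the reader knows how many counters the function has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record: one entry per nonzero counter. */
struct sparse_counter {
  uint32_t index; /* position of the counter within its function */
  uint64_t count;
};

/* Writes only the nonzero counters to `out`, which must have room for
   `n` entries in the worst case. Returns the number of entries used. */
size_t compress_counters(const uint64_t *counters, size_t n,
                         struct sparse_counter *out) {
  size_t i, used = 0;
  for (i = 0; i < n; ++i) {
    if (counters[i] != 0) {
      out[used].index = (uint32_t)i;
      out[used].count = counters[i];
      ++used;
    }
  }
  return used;
}

/* Rebuilds the dense array: any counter missing from `in` is zero. */
void expand_counters(const struct sparse_counter *in, size_t used,
                     uint64_t *counters, size_t n) {
  size_t i;
  for (i = 0; i < n; ++i)
    counters[i] = 0;
  for (i = 0; i < used; ++i) {
    assert(in[i].index < n);
    counters[in[i].index] = in[i].count;
  }
}
```

How much this saves depends on how many counters in a profiled function are actually zero, which the sketch makes no claim about.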
>> 
>> Thoughts? Specific reasons to avoid this? I'm very much interested in
>> minimizing the space and runtime overhead of instrumentation, as well as
>> getting more advanced features in the format read by Clang itself.