[LLVMdev] RFC: Binary format for instrumentation based profiling data

Justin Bogner mail at justinbogner.com
Fri Apr 11 10:48:50 PDT 2014


Ping.

Justin Bogner <mail at justinbogner.com> writes:
> Chandler, are you okay with this way forward?
>
> Justin Bogner <mail at justinbogner.com> writes:
>> Chandler Carruth <chandlerc at google.com> writes:
>>>     Format 2
>>>     --------
>>>    
>>>     This format should be efficient to read and preferably reasonably
>>>     compact. We'll convert from format 1 to format 2 using llvm-profdata,
>>>     and clang will use format 2 for PGO.
>>>    
>>>     Since the only particularly important operation in this use case is fast
>>>     lookup, I propose using the on disk hash table that's currently used in
>>>     clang for AST serialization/PTH/etc with a small amount of metadata in a
>>>     header.
>>>    
>>>     The hash table implementation currently lives in include/clang/Basic and
>>>     consists of a single header. Moving it to llvm and updating the clients
>>>     in clang should be easy. I'll send a brief RFC separately to see if
>>>     anyone's opposed to moving it.
>>>
>>> I can mention this and we can discuss this on the other thread if you would
>>> rather, but I'm not a huge fan of this code. My vague memory was that this
>>> was a quick hack by Doug that he never really expected to live long-term.
>>
>> It may not be the prettiest piece of code, but given that it's used in
>> several places in clang and hasn't needed any significant changes since
>> 2010, I'd say it's fairly solid. It also has the very obvious advantage
>> of already existing, which makes it a pretty good candidate for a
>> version 1 format, IMHO.
>>
>>> I have a general preference for from-disk lookups to use tries (for strings,
>>> prefix tries) or other fast, sorted lookup structures. They have the nice
>>> property of being inherently stable and unambiguous, and not baking any
>>> hashing algorithm into it.
>>
>> I would like to experiment with a few trie-based approaches for this as
>> we try to optimize the PGO process further (both for space and for
>> lookup time). Even so, it's not a sure thing that this will work better,
>> and I don't think it's worth delaying getting something that people can
>> use out the door.
>>
>> If you're opposed to moving the existing OnDiskHashTable into Support,
>> perhaps because you don't think it should proliferate to other uses,
>> the obvious alternative is to include a private copy of a stripped-down
>> version of it for the profile reader and writer to use themselves. I'm
>> not sure if this is worth the copy-pasted code, but it is an
>> option. What do you think?
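
To make the trade-off in the quoted discussion concrete, here is a minimal
sketch of the "fast, sorted lookup structure" alternative: records kept in
name order and found by binary search, so the layout depends only on the keys
and no hash function is baked into the format. This is not the proposed
format, nor the OnDiskHashTable code in clang; ProfileRecord, sortRecords and
lookup are hypothetical names made up for illustration, and an in-memory
vector stands in for the on-disk table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: a function name and its counter values.
struct ProfileRecord {
  std::string Name;
  std::vector<uint64_t> Counts;
};

// Orders two records by function name.
inline bool nameLess(const ProfileRecord &A, const ProfileRecord &B) {
  return A.Name < B.Name;
}

// Sort records by name. The resulting order is determined by the keys alone,
// so two writers given the same input produce the same table.
inline void sortRecords(std::vector<ProfileRecord> &Records) {
  std::sort(Records.begin(), Records.end(), nameLess);
}

// Binary search over a table already sorted by name.
// Returns a pointer to the matching record, or nullptr if there is none.
inline const ProfileRecord *lookup(const std::vector<ProfileRecord> &Sorted,
                                   const std::string &Name) {
  // Debug-only check of the precondition that the table is sorted.
  assert(std::is_sorted(Sorted.begin(), Sorted.end(), nameLess));
  auto It = std::lower_bound(
      Sorted.begin(), Sorted.end(), Name,
      [](const ProfileRecord &R, const std::string &Key) {
        return R.Name < Key;
      });
  if (It == Sorted.end() || It->Name != Name)
    return nullptr;
  return &*It;
}
```

Lookup here costs O(log n) string comparisons, whereas a hash table expects
O(1) probes, so whether this (or a prefix trie) beats the existing hash table
for PGO lookups is exactly the open question the thread leaves to
experimentation.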


