[LLVMdev] RFC: Binary format for instrumentation based profiling data
Justin Bogner
mail at justinbogner.com
Tue Apr 1 16:29:48 PDT 2014
Chandler, are you okay with this way forward?
Justin Bogner <mail at justinbogner.com> writes:
> Chandler Carruth <chandlerc at google.com> writes:
>> Format 2
>> --------
>>
>> This format should be efficient to read and preferably reasonably
>> compact. We'll convert from format 1 to format 2 using llvm-profdata,
>> and clang will use format 2 for PGO.
>>
>> Since the only particularly important operation in this use case is fast
>> lookup, I propose using the on-disk hash table that's currently used in
>> clang for AST serialization/PTH/etc., with a small amount of metadata in a
>> header.
>>
>> The hash table implementation currently lives in include/clang/Basic and
>> consists of a single header. Moving it to llvm and updating the clients
>> in clang should be easy. I'll send a brief RFC separately to see if
>> anyone's opposed to moving it.
>>
>> I can mention this and we can discuss this on the other thread if you would
>> rather, but I'm not a huge fan of this code. My vague memory was that this was
>> a quick hack by Doug that he never really expected to live long-term.
>
> It may not be the prettiest piece of code, but given that it's used in
> several places in clang and hasn't needed any significant changes since
> 2010, I'd say it's fairly solid. It also has the very obvious advantage
> of already existing, which makes it a pretty good candidate for a
> version 1 format, IMHO.
>
>> I have a general preference for from-disk lookups to use tries (for strings,
>> prefix tries) or other fast, sorted lookup structures. They have the nice
>> property of being inherently stable and unambiguous, and not baking any
>> hashing algorithm into it.
>
> I would like to experiment with a few trie-based approaches for this as
> we try to optimize the PGO process further (both for space and for
> lookup time). Even so, it's not a sure thing that this will work better,
> and I don't think it's worth delaying getting something that people can
> use out the door.
>
> If you're opposed to moving the existing OnDiskHashTable into Support,
> perhaps because you don't think it should proliferate to other uses,
> the obvious alternative is to include a private copy of a stripped-down
> version of it for the profile reader and writer to use themselves. I'm
> not sure if this is worth the copy-pasted code, but it is an
> option. What do you think?
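
For anyone following along, here is a minimal sketch of the kind of lookup being discussed: a small header followed by a bucket table that maps a hashed function name to its counters. This is not clang's OnDiskHashTable layout. The magic value, field widths, FNV-1a hash, and the names writeTable and lookup are all assumptions made up for illustration, and the reader trusts that the buffer is well formed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout for this sketch (NOT clang's OnDiskHashTable format):
//   header : magic (u32), version (u32), bucket count (u32)
//   table  : one u32 offset per bucket, measured from the start of the buffer
//   bucket : entry count (u32), then that many entries
//   entry  : key length (u32), counter count (u32), key bytes, u64 counters
// All integers are written little-endian.
constexpr uint32_t kMagic = 0x50474F31; // arbitrary value chosen for the sketch
constexpr uint32_t kVersion = 1;

using Profile = std::map<std::string, std::vector<uint64_t>>;

// FNV-1a. The hash must be fixed, because it is baked into the file format.
uint64_t hashName(const std::string &s) {
  uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Append the low `bytes` bytes of v in little-endian order.
void putLE(std::vector<uint8_t> &out, uint64_t v, unsigned bytes) {
  for (unsigned i = 0; i < bytes; ++i)
    out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read a little-endian integer of `bytes` bytes starting at offset off.
uint64_t getLE(const std::vector<uint8_t> &buf, size_t off, unsigned bytes) {
  uint64_t v = 0;
  for (unsigned i = 0; i < bytes; ++i)
    v |= static_cast<uint64_t>(buf[off + i]) << (8 * i);
  return v;
}

std::vector<uint8_t> writeTable(const Profile &profile, uint32_t numBuckets) {
  assert(numBuckets > 0 && "need at least one bucket");
  // Distribute entries over buckets by hash.
  std::vector<std::vector<const Profile::value_type *>> buckets(numBuckets);
  for (const auto &kv : profile)
    buckets[hashName(kv.first) % numBuckets].push_back(&kv);

  // Serialize bucket payloads, recording where each bucket will start.
  const size_t headerSize = 12 + 4 * static_cast<size_t>(numBuckets);
  std::vector<uint8_t> offsets, payload;
  for (const auto &bucket : buckets) {
    putLE(offsets, headerSize + payload.size(), 4);
    putLE(payload, bucket.size(), 4);
    for (const auto *kv : bucket) {
      putLE(payload, kv->first.size(), 4);
      putLE(payload, kv->second.size(), 4);
      payload.insert(payload.end(), kv->first.begin(), kv->first.end());
      for (uint64_t c : kv->second)
        putLE(payload, c, 8);
    }
  }

  std::vector<uint8_t> out;
  putLE(out, kMagic, 4);
  putLE(out, kVersion, 4);
  putLE(out, numBuckets, 4);
  out.insert(out.end(), offsets.begin(), offsets.end());
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}

// Returns true and fills `counts` if `name` is present. Only the header is
// validated; entry offsets are trusted, which a real reader should not do.
bool lookup(const std::vector<uint8_t> &buf, const std::string &name,
            std::vector<uint64_t> &counts) {
  if (buf.size() < 12 || getLE(buf, 0, 4) != kMagic ||
      getLE(buf, 4, 4) != kVersion)
    return false;
  const uint64_t numBuckets = getLE(buf, 8, 4);
  if (numBuckets == 0)
    return false;

  // Jump straight to the one bucket that can contain the key.
  size_t pos = static_cast<size_t>(
      getLE(buf, 12 + 4 * static_cast<size_t>(hashName(name) % numBuckets), 4));
  const uint64_t numEntries = getLE(buf, pos, 4);
  pos += 4;
  for (uint64_t i = 0; i < numEntries; ++i) {
    const size_t keyLen = static_cast<size_t>(getLE(buf, pos, 4));
    const size_t numCounts = static_cast<size_t>(getLE(buf, pos + 4, 4));
    pos += 8;
    const bool match = keyLen == name.size() &&
                       std::memcmp(buf.data() + pos, name.data(), keyLen) == 0;
    pos += keyLen;
    if (match) {
      counts.clear();
      for (size_t j = 0; j < numCounts; ++j)
        counts.push_back(getLE(buf, pos + 8 * j, 8));
      return true;
    }
    pos += 8 * numCounts; // skip this entry's counters
  }
  return false;
}
```

A lookup reads the header, one table slot, and one bucket, so the cost does not depend on how many functions the profile contains. It also shows the trade-off raised above: changing hashName changes the meaning of every existing file, which is exactly the "baked-in hashing algorithm" concern that a sorted or trie-based layout would avoid.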