[LLVMdev] RFC: Binary format for instrumentation based profiling data

Justin Bogner mail at justinbogner.com
Fri Mar 28 13:33:23 PDT 2014


Chandler Carruth <chandlerc at google.com> writes:
>     Format 2
>     --------
>    
>     This format should be efficient to read and preferably reasonably
>     compact. We'll convert from format 1 to format 2 using llvm-profdata,
>     and clang will use format 2 for PGO.
>    
>     Since the only particularly important operation in this use case is fast
>     lookup, I propose using the on disk hash table that's currently used in
>     clang for AST serialization/PTH/etc with a small amount of metadata in a
>     header.
>    
>     The hash table implementation currently lives in include/clang/Basic and
>     consists of a single header. Moving it to llvm and updating the clients
>     in clang should be easy. I'll send a brief RFC separately to see if
>     anyone's opposed to moving it.
>
> I can mention this and we can discuss this on the other thread if you would
> rather, but I'm not a huge fan of this code. My vague memory was that this was
> a quick hack by Doug that he never really expected to live long-term.

It may not be the prettiest piece of code, but given that it's used in
several places in clang and hasn't needed any significant changes since
2010, I'd say it's fairly solid. It also has the very obvious advantage
of already existing, which makes it a pretty good candidate for a
version 1 format, IMHO.

> I have a general preference for from-disk lookups to use tries (for strings,
> prefix tries) or other fast, sorted lookup structures. They have the nice
> property of being inherently stable and unambiguous, and not baking any
> hashing algorithm into it.

I would like to experiment with a few trie-based approaches for this as
we try to optimize the PGO process further (both for space and for
lookup time). Even so, it's not a sure thing that this will work better,
and I don't think it's worth delaying getting something that people can
use out the door.
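
For reference, here is a rough sketch of the kind of prefix trie being
discussed. It is in-memory only and says nothing about how a trie would
be serialized to disk. All names (TrieNode, insertIntoTrie, findInTrie)
are hypothetical and do not exist in LLVM or clang. The point it tries to
show is that lookup depends only on the bytes of the key, so no hash
function ends up baked into the format.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical node type: one child per next character of the name.
// std::map keeps children ordered, so traversal order is deterministic.
struct TrieNode {
  std::map<char, std::unique_ptr<TrieNode>> Children;
  bool HasValue = false; // true if a function name ends at this node
  uint64_t Count = 0;    // illustrative payload: a single counter
};

// Walk (and create as needed) one node per character, then store the count.
void insertIntoTrie(TrieNode &Root, const std::string &Name, uint64_t Count) {
  TrieNode *N = &Root;
  for (char C : Name) {
    std::unique_ptr<TrieNode> &Child = N->Children[C];
    if (!Child)
      Child.reset(new TrieNode());
    N = Child.get();
  }
  N->HasValue = true;
  N->Count = Count;
}

// Follow the characters of Name; fail if the path or the value is missing.
bool findInTrie(const TrieNode &Root, const std::string &Name,
                uint64_t &Count) {
  const TrieNode *N = &Root;
  for (char C : Name) {
    auto It = N->Children.find(C);
    if (It == N->Children.end())
      return false;
    N = It->second.get();
  }
  if (!N->HasValue)
    return false;
  Count = N->Count;
  return true;
}
```

Whether a structure like this beats a hash table for space or lookup
time is exactly the open question above; the sketch makes no claim
either way.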

If you're opposed to moving the existing OnDiskHashTable into Support,
perhaps because you don't think it should proliferate to other uses,
the obvious alternative is to include a private copy of a stripped-down
version of it for the profile reader and writer to use themselves. I'm
not sure this is worth the copy-pasted code, but it is an
option. What do you think?
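
To give a feel for how small a private, stripped-down table could be,
here is a minimal sketch. It is not clang's implementation and does not
follow its layout. The function names, the byte layout, and the choice
of FNV-1a as the hash are all assumptions made for illustration. The
reader also does no bounds checking, which a real one would need.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Assumed layout (illustrative only, little-endian):
//   u32 NumBuckets
//   u32 BucketOffset[NumBuckets]   (0 means the bucket is empty)
//   per bucket: u32 NumEntries, then {u32 KeyLen, u64 Count, key bytes}...

// FNV-1a; the hash is part of the format, so it must never change.
uint32_t fnv1a(const std::string &S) {
  uint32_t H = 2166136261u;
  for (unsigned char C : S) {
    H ^= C;
    H *= 16777619u;
  }
  return H;
}

template <typename T> void putLE(std::vector<uint8_t> &Buf, T V) {
  for (size_t I = 0; I < sizeof(T); ++I)
    Buf.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

template <typename T> T getLE(const std::vector<uint8_t> &Buf, size_t Off) {
  T V = 0;
  for (size_t I = 0; I < sizeof(T); ++I)
    V |= static_cast<T>(Buf[Off + I]) << (8 * I);
  return V;
}

// Serialize name -> count pairs into a single byte buffer.
std::vector<uint8_t>
writeCountTable(const std::map<std::string, uint64_t> &Counts,
                uint32_t NumBuckets) {
  assert(NumBuckets > 0 && "need at least one bucket");
  std::vector<std::vector<std::string>> Buckets(NumBuckets);
  for (const auto &KV : Counts)
    Buckets[fnv1a(KV.first) % NumBuckets].push_back(KV.first);

  std::vector<uint8_t> Buf;
  putLE<uint32_t>(Buf, NumBuckets);
  Buf.resize(4 + 4 * static_cast<size_t>(NumBuckets), 0); // offsets, patched below
  for (uint32_t B = 0; B < NumBuckets; ++B) {
    if (Buckets[B].empty())
      continue;
    uint32_t Here = static_cast<uint32_t>(Buf.size());
    for (int I = 0; I < 4; ++I) // patch this bucket's offset in place
      Buf[4 + 4 * B + I] = static_cast<uint8_t>(Here >> (8 * I));
    putLE<uint32_t>(Buf, static_cast<uint32_t>(Buckets[B].size()));
    for (const std::string &Name : Buckets[B]) {
      putLE<uint32_t>(Buf, static_cast<uint32_t>(Name.size()));
      putLE<uint64_t>(Buf, Counts.at(Name));
      Buf.insert(Buf.end(), Name.begin(), Name.end());
    }
  }
  return Buf;
}

// Look up one name: hash to a bucket, then scan that bucket's entries.
// No bounds checking: the buffer is trusted in this sketch.
bool lookupCount(const std::vector<uint8_t> &Buf, const std::string &Name,
                 uint64_t &Count) {
  uint32_t NumBuckets = getLE<uint32_t>(Buf, 0);
  size_t Off = getLE<uint32_t>(Buf, 4 + 4 * (fnv1a(Name) % NumBuckets));
  if (Off == 0)
    return false;
  uint32_t N = getLE<uint32_t>(Buf, Off);
  Off += 4;
  for (uint32_t I = 0; I < N; ++I) {
    uint32_t Len = getLE<uint32_t>(Buf, Off);
    if (Len == Name.size() &&
        std::memcmp(Buf.data() + Off + 12, Name.data(), Len) == 0) {
      Count = getLE<uint64_t>(Buf, Off + 4);
      return true;
    }
    Off += 12 + Len; // skip KeyLen (4) + Count (8) + key bytes
  }
  return false;
}
```

A lookup reads the header, one bucket offset, and then only the entries
in that bucket, so the rest of the file is never touched.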



