[LLVMdev] RFC: Binary format for instrumentation based profiling data
Duncan P. N. Exon Smith
dexonsmith at apple.com
Mon Mar 24 15:26:27 PDT 2014
On Mar 24, 2014, at 12:29 PM, Chandler Carruth <chandlerc at google.com> wrote:
> > Format 2
> > --------
> >
> > This format should be efficient to read and preferably reasonably
> > compact. We'll convert from format 1 to format 2 using llvm-profdata,
> > and clang will use format 2 for PGO.
> >
> > Since the only particularly important operation in this use case is fast
> > lookup, I propose using the on disk hash table that's currently used in
> > clang for AST serialization/PTH/etc with a small amount of metadata in a
> > header.
> >
> > The hash table implementation currently lives in include/clang/Basic and
> > consists of a single header. Moving it to llvm and updating the clients
> > in clang should be easy. I'll send a brief RFC separately to see if
> > anyone's opposed to moving it.
>
> I can mention this and we can discuss this on the other thread if you would rather, but I'm not a huge fan of this code. My vague memory was that this was a quick hack by Doug that he never really expected to live long-term.
>
> I have a general preference for from-disk lookups to use tries (for strings, prefix tries) or other fast, sorted lookup structures.
These profiles will contain every function in a program. Relatively few of these will be needed per translation unit (per invocation of clang). I suspect that an on-disk hash table will perform better than a trie for this use case, since a lookup requires fewer loads from disk.
But the main benefit of the clang on-disk hash is that it’s in use and it already works. Unless tries are significantly better, I prefer cleaning up the (working) hash table implementation to implementing (and debugging) something new.
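To make the "fewer loads" point concrete, here is a minimal sketch of an on-disk hash lookup. It is not clang's actual on-disk hash table layout; the byte layout, the Bernstein-style hash, and the names `stable_hash`, `build_table`, and `lookup` are all assumptions made up for illustration. A lookup touches the header, one slot in the offset table, and one bucket, no matter how many functions the profile contains.

```python
import struct

def stable_hash(key: bytes) -> int:
    # Bernstein-style string hash, picked only for illustration.
    h = 5381
    for b in key:
        h = (h * 33 + b) & 0xFFFFFFFF
    return h

def build_table(counts: dict, num_buckets: int = 8) -> bytes:
    # Hypothetical layout:
    #   u32 num_buckets
    #   u32 offset per bucket (0 means the bucket is empty)
    #   per bucket: u16 entry count, then (u16 key length, key, u64 count)
    buckets = [[] for _ in range(num_buckets)]
    for name, count in sorted(counts.items()):
        key = name.encode("utf-8")
        buckets[stable_hash(key) % num_buckets].append((key, count))

    header_size = 4 + 4 * num_buckets
    offsets = []
    payload = b""
    for bucket in buckets:
        if not bucket:
            offsets.append(0)
            continue
        offsets.append(header_size + len(payload))
        payload += struct.pack("<H", len(bucket))
        for key, count in bucket:
            payload += struct.pack("<H", len(key)) + key
            payload += struct.pack("<Q", count)

    header = struct.pack("<I", num_buckets)
    header += struct.pack(f"<{num_buckets}I", *offsets)
    return header + payload

def lookup(blob: bytes, name: str):
    key = name.encode("utf-8")
    # Read 1: the header gives the bucket count.
    (num_buckets,) = struct.unpack_from("<I", blob, 0)
    # Read 2: one slot of the offset table.
    slot = stable_hash(key) % num_buckets
    (offset,) = struct.unpack_from("<I", blob, 4 + 4 * slot)
    if offset == 0:
        return None
    # Read 3: scan the single bucket the key hashes to.
    (entries,) = struct.unpack_from("<H", blob, offset)
    pos = offset + 2
    for _ in range(entries):
        (key_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        candidate = blob[pos:pos + key_len]
        pos += key_len
        (count,) = struct.unpack_from("<Q", blob, pos)
        pos += 8
        if candidate == key:
            return count
    return None
```

For example, `lookup(build_table({"main": 100, "foo": 7}), "main")` finds the counter for `main` without visiting any other bucket. A trie would instead follow one node per key prefix, and each node may sit in a different part of the file.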
> They have the nice property of being inherently stable and unambiguous, and not baking any hashing algorithm into it.
It *is* harder to keep the hash table stable. I think it’s worth the cost here.
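A small sketch of what that cost looks like, assuming (purely for illustration) that the format picks 64-bit FNV-1a as its hash; this is not necessarily what the clang table uses. The hash function becomes part of the file format: the writer and every future reader must compute bit-identical values, so the function has to be spelled out and frozen rather than borrowed from whatever the host library provides.

```python
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3

def fnv1a_64(data: bytes) -> int:
    # 64-bit FNV-1a: XOR in each byte, then multiply by the prime,
    # truncating to 64 bits. Same input gives the same value on any
    # machine and in any process.
    h = FNV_OFFSET_BASIS_64
    for b in data:
        h ^= b
        h = (h * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF
    return h

def bucket_for(name: str, num_buckets: int) -> int:
    # The bucket index is derived only from the frozen hash and the
    # bucket count recorded in the file.
    return fnv1a_64(name.encode("utf-8")) % num_buckets
```

By contrast, Python's built-in `hash()` on strings is salted per process by default, so a table written with it could not be read back by a later run. Changing the frozen function later would mean changing the format, which is the stability burden a sorted structure avoids.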