[LLVMdev] RFC: Binary format for instrumentation based profiling data

Tue Apr 15 23:32:41 PDT 2014

Sorry, I'm just back from holiday (your email managed to arrive at the start
of a two-week break) and I'm still working through the email and review
backlog.

Before I answer definitively, I want to catch up on all of the changes that
have happened since this very high-level discussion. I don't know what the
current state of the on-disk hash table is, or what the implications of
this being a long-term file format are. I'll go investigate that, since it
seems all the code is committed at this point anyway.

On Tue, Apr 1, 2014 at 4:29 PM, Justin Bogner <mail at justinbogner.com> wrote:

> Chandler, are you okay with this way forward?
>
> Justin Bogner <mail at justinbogner.com> writes:
> > Chandler Carruth <chandlerc at google.com> writes:
> >>     Format 2
> >>     --------
> >>
> >>     This format should be efficient to read and preferably reasonably
> >>     compact. We'll convert from format 1 to format 2 using
> >>     llvm-profdata, and clang will use format 2 for PGO.
> >>
> >>     Since the only particularly important operation in this use case is
> >>     fast lookup, I propose using the on disk hash table that's currently
> >>     used in clang for AST serialization/PTH/etc with a small amount of
> >>     metadata in a header.
> >>
> >>     The hash table implementation currently lives in
> >>     include/clang/Basic and consists of a single header. Moving it to
> >>     llvm and updating the clients in clang should be easy. I'll send a
> >>     brief RFC separately to see if anyone's opposed to moving it.
> >>
> >> I can mention this and we can discuss this on the other thread if you
> >> would rather, but I'm not a huge fan of this code. My vague memory was
> >> that this was a quick hack by Doug that he never really expected to
> >> live long-term.
> >
> > It may not be the prettiest piece of code, but given that it's used in
> > several places in clang and hasn't needed any significant changes since
> > 2010, I'd say it's fairly solid. It also has the very obvious advantage
> > of already existing, which makes it a pretty good candidate for a
> > version 1 format, IMHO.
> >
> >> I have a general preference for from-disk lookups to use tries (for
> >> strings, prefix tries) or other fast, sorted lookup structures. They
> >> have the nice property of being inherently stable and unambiguous, and
> >> not baking any hashing algorithm into it.
> >
> > I would like to experiment with a few trie-based approaches for this as
> > we try to optimize the PGO process further (both for space and for
> > lookup time). Even so, it's not a sure thing that this will work better,
> > and I don't think it's worth delaying getting something that people can
> > use out the door.
> >
> > If you're opposed to moving the existing OnDiskHashTable into Support,
> > perhaps because you don't think it should proliferate to other uses,
> > the obvious alternative is to include a private copy of a stripped down
> > version of it for the profile reader and writer to use themselves. I'm
> > not sure if this is worth the copy pasted code, but it is an
> > option. What do you think?
>
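
For readers following the quoted thread, here is a minimal sketch of the
"sorted lookup structure" alternative mentioned above, as opposed to an
on-disk hash table. It is illustrative only and is not the format that was
proposed or committed: the ProfileRecord type and the sortRecords and lookup
helpers are hypothetical names invented for this example, and the records
live in memory rather than in a file. The point it tries to show is that the
layout is determined by the keys alone, so no hash function becomes part of
the format.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: a function name plus its counter values.
struct ProfileRecord {
  std::string Name;
  std::vector<uint64_t> Counts;
};

// Order records by name. The resulting layout depends only on the keys,
// so two writers given the same data produce the same ordering.
inline void sortRecords(std::vector<ProfileRecord> &Records) {
  std::sort(Records.begin(), Records.end(),
            [](const ProfileRecord &A, const ProfileRecord &B) {
              return A.Name < B.Name;
            });
}

// Binary search over records already sorted by name.
// Returns nullptr when the name is not present.
inline const ProfileRecord *lookup(const std::vector<ProfileRecord> &Sorted,
                                   const std::string &Name) {
  auto It = std::lower_bound(
      Sorted.begin(), Sorted.end(), Name,
      [](const ProfileRecord &R, const std::string &Key) {
        return R.Name < Key;
      });
  if (It == Sorted.end() || It->Name != Name)
    return nullptr;
  return &*It;
}
```

A caller would build the vector, call sortRecords once, and then call lookup
per function name. Lookup here is O(log n) string comparisons, whereas a hash
table is typically O(1) on average, which is the trade-off the thread is
weighing against format stability.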