<div dir="ltr">Sorry, just back from holiday (your email managed to catch the start of 2 weeks, sorry about that) and still getting back through the email and review backlog.<div><br></div><div>Before I answer definitively, I want to catch up on all of the changes that have happened since this very high level discussion. I don't know what the current state of the on disk hash table is, or what the implications of this being a long-term file format are. I'll go investigate that, since it seems all the code is committed at this point anyways....</div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Apr 1, 2014 at 4:29 PM, Justin Bogner <span dir="ltr"><<a href="mailto:mail@justinbogner.com" target="_blank">mail@justinbogner.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Chandler, are you okay with this way forward?<br>
<div class="HOEnZb"><div class="h5"><br>
Justin Bogner <<a href="mailto:mail@justinbogner.com">mail@justinbogner.com</a>> writes:<br>
> Chandler Carruth <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> writes:<br>
>> Format 2<br>
>> --------<br>
>><br>
>> This format should be efficient to read and preferably reasonably<br>
>> compact. We'll convert from format 1 to format 2 using llvm-profdata,<br>
>> and clang will use format 2 for PGO.<br>
>><br>
>> Since the only particularly important operation in this use case is fast<br>
>> lookup, I propose using the on disk hash table that's currently used in<br>
>> clang for AST serialization/PTH/etc with a small amount of metadata in a<br>
>> header.<br>
>><br>
>> The hash table implementation currently lives in include/clang/Basic and<br>
>> consists of a single header. Moving it to llvm and updating the clients<br>
>> in clang should be easy. I'll send a brief RFC separately to see if<br>
>> anyone's opposed to moving it.<br>
>><br>
>> I can mention this and we can discuss this on the other thread if you would<br>
>> rather, but I'm not a huge fan of this code. My vague memory was that this was<br>
>> a quick hack by Doug that he never really expected to live long-term.<br>
><br>
> It may not be the prettiest piece of code, but given that it's used in<br>
> several places in clang and hasn't needed any significant changes since<br>
> 2010, I'd say it's fairly solid. It also has the very obvious advantage<br>
> of already existing, which makes it a pretty good candidate for a<br>
> version 1 format, IMHO.<br>
><br>
>> I have a general preference for from-disk lookups to use tries (for strings,<br>
>> prefix tries) or other fast, sorted lookup structures. They have the nice<br>
>> property of being inherently stable and unambiguous, and not baking any<br>
>> hashing algorithm into it.<br>
><br>
> I would like to experiment with a few trie-based approaches for this as<br>
> we try to optimize the PGO process further (both for space and for<br>
> lookup time). Even so, it's not a sure thing that this will work better,<br>
> and I don't think it's worth delaying getting something that people can<br>
> use out the door.<br>
><br>
> If you're opposed to moving the existing OnDiskHashTable into Support,<br>
> perhaps because you don't think it should proliferate to other uses,<br>
> the obvious alternative is to include a private copy of a stripped down<br>
> version of it for the profile reader and writer to use themselves. I'm<br>
> not sure if this is worth the copy pasted code, but it is an<br>
> option. What do you think?<br>
</div></div></blockquote></div><br></div>