[PATCH] InstrProf: Calculate a better function hash

Tue Mar 25 11:30:44 PDT 2014

On Tue, Mar 25, 2014 at 11:02 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

> On Mar 25, 2014, at 10:20 AM, Raul Silvera <rsilvera at google.com> wrote:
>
> > How about an FNV hash? That is very simple to implement, fast, and will
> be stronger at detecting changes.
>
> FNV looks great; thanks!  I’ll resubmit with FNV-1a [1].
>
> [1] http://isthe.com/chongo/tech/comp/fnv/#FNV-1a

FNV is actually based on the same principles as Bernstein's: it relies on
multiplication to spread the bits throughout an integer's state, and on xor
to mix in each new byte (your patch as originally written used addition,
though many variations on Bernstein's use xor).
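
To make the structural similarity concrete, here is a minimal sketch of both
hashes side by side. It assumes the 64-bit FNV-1a parameters from the page
linked above and the common "djb2" additive form of Bernstein's hash; the
function names are hypothetical and this is not the code from the patch.

```cpp
#include <cstdint>
#include <string>

// FNV-1a, 64-bit: xor the next byte in, then multiply by the FNV prime.
inline uint64_t fnv1a64(const std::string &Data) {
  uint64_t Hash = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
  for (unsigned char Byte : Data) {
    Hash ^= Byte;
    Hash *= 0x100000001b3ULL; // FNV 64-bit prime
  }
  return Hash;
}

// Bernstein's hash (djb2, additive form): multiply by 33, then add the byte.
// Illustrative only; the patch under review used a different variant.
inline uint32_t bernstein32(const std::string &Data) {
  uint32_t Hash = 5381;
  for (unsigned char Byte : Data)
    Hash = Hash * 33 + Byte;
  return Hash;
}
```

Both loops perform one multiply and one combine step per input byte, and each
step depends on the previous one; the only real differences are the constants
and the order of the two operations.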

These will all have reasonably frequent collisions, in addition to being
poorly distributed over the space. You've indicated you don't care about the
distribution, but do care about collisions.

Also, you've asserted speed claims without data. Both Bernstein's hash (in
its original formulation; your code was actually a strange variant of it
that didn't operate on bytes or octets) and FNV are necessarily
byte-at-a-time and thus *quite* slow for inputs of even several hundred
bytes.

We actually have a variation of CityHash that I implemented which is a bit
faster than CityHash (and, for inputs longer than 128 bytes, several times
faster than Bernstein's) but has similarly strong collision resistance.

But how much data are we talking about? And how frequently are you
computing this? MD5 is actually reasonably fast on modern hardware. The
reference benchmarks have shown roughly 500 cycles to compute the MD5 of an
8-byte message, and 800 or 900 cycles to compute the MD5 of a 64-byte
message. I would expect traversing the AST to build the inputs for this to
be significantly slower due to cache misses, but I think benchmarks would
help here.

> Should the hashing computation be split from PGO into its own utility?
> Having a general hashing for functions may have other uses; in particular
> MergeFunc comes to mind.
>

We have many, many implementations of hash functions in LLVM already. I am
strongly opposed to adding more without specific concrete use cases.