Lowering switch statements with hashing
Jasper Neumann
jn at sirrida.de
Fri Jan 17 09:09:43 PST 2014
Hello Joerg!
Here is a slightly better answer.
>> To test the hashing library hashlib there is hashtest.cpp.
> Having worked on the topic quite a bit for NetBSD, I am not
> sure your choice of algorithm is optimal. As I see it, there
> are two sane choices:
> (1) Just apply a randomised hash function to reduce the
> expected chain lengths to O(n/m), where m is the desired
> number of buckets. This is not perfect hashing, but avoids
> most of the search. It has the huge advantage of not needing
> side storage beside the jump table.
This is imperfect hashing, which is trivial to implement and gives
acceptable performance.
Obviously, however, at least a value check must follow. Since a bucket
can contain more than one element, we need to do this check in a
(possibly unrolled) loop (mind the loop end condition!) or must first
dispatch with a jump table and then test the values. We can do better.
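To make the cost of variant (1) concrete, here is a minimal sketch of
imperfect hashing with the value check done in a loop over the bucket.
The class name ImperfectSwitch, the multiplier and the label values used
with it are assumptions chosen for illustration; none of this is taken
from hashlib or from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative only: models a switch lowered with imperfect hashing.
// Each bucket may hold several labels, so a lookup must compare values.
class ImperfectSwitch {
public:
    ImperfectSwitch(const std::vector<uint32_t>& labels, unsigned bucket_bits) {
        assert(bucket_bits >= 1 && bucket_bits <= 31);
        shift_ = 32 - bucket_bits;
        buckets_.resize(std::size_t{1} << bucket_bits);
        for (std::size_t i = 0; i < labels.size(); ++i)
            buckets_[bucket_of(labels[i])].emplace_back(labels[i],
                                                        static_cast<int>(i));
    }

    // Returns the index of the matching case label, or -1 for "default".
    int lookup(uint32_t x) const {
        for (const auto& entry : buckets_[bucket_of(x)])  // the chain loop
            if (entry.first == x)                         // the value check
                return entry.second;
        return -1;
    }

private:
    // Multiplicative hash: top bits of the 32-bit product (assumed constant).
    std::size_t bucket_of(uint32_t x) const {
        return static_cast<uint32_t>(x * UINT32_C(2654435769)) >> shift_;
    }

    unsigned shift_ = 0;
    std::vector<std::vector<std::pair<uint32_t, int>>> buckets_;
};
```

With five labels and four buckets at least one bucket holds two labels,
so the loop in lookup() cannot be dropped. A perfect hash removes that
loop and leaves a single comparison.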
> (2) Use the CHM algorithm to produce an order preserving
> hash function. If the backend supports two relocations on
> the same address (one positive, one negative) OR the jump
> table can be expressed as relative expressions, this has the
> nice advantage of requiring only one additional memory
> access.
I had also considered CHM. This is a method which can easily deal with
really huge label sets since its runtime is O(n), and produces a quite
small auxiliary table.
However, CHM produces a hash function that needs two table accesses,
whereas Jenkins' needs only one.
CHM: h(x) = (table[h1(x)]+table[h2(x)]) mod size
Jenkins: h(x) = h1(x) ^ table[h2(x)]
This means that code generated with Jenkins' method ought to be a bit
faster.
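The difference in the two lookup shapes can be sketched on a toy label
set. Everything concrete below is an assumption made up for this
example: the labels 10, 21, 33, 25, the simple functions standing in for
h1 and h2, and the hand-built tables. Real CHM and Jenkins generators
choose their hash functions and tables differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical case labels in source order: 10, 21, 33, 25 -> slots 0..3.
static const uint32_t kLabels[4] = {10, 21, 33, 25};

// CHM shape: h(x) = (table[h1(x)] + table[h2(x)]) mod size.
// Here h1(x) = x % 5 and h2(x) = 5 + x % 4 (assumed); two table loads.
static const uint8_t kChmTable[9] = {3, 1, 0, 2, 0, 0, 0, 1, 0};
inline uint32_t chm_slot(uint32_t x) {
    return (kChmTable[x % 5] + kChmTable[5 + x % 4]) % 4u;
}

// Jenkins shape: h(x) = h1(x) ^ table[h2(x)].
// Here h1(x) = x & 3 and h2(x) = (x >> 2) & 3 (assumed); one table load.
static const uint8_t kJenkinsTable[4] = {3, 0, 2, 0};
inline uint32_t jenkins_slot(uint32_t x) {
    return (x & 3u) ^ kJenkinsTable[(x >> 2) & 3u];
}

// Either hash maps arbitrary input to some slot, so one value check
// must still follow before the jump is taken.
inline int case_index(uint32_t x) {
    uint32_t slot = jenkins_slot(x);
    assert(slot < 4);
    return kLabels[slot] == x ? static_cast<int>(slot) : -1;
}
```

Both functions send the four labels to slots 0 to 3 in the same order;
the CHM form pays for a second load plus an addition and a modulo, while
the Jenkins form needs one load and an XOR.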
I admit that Jenkins' method will probably take much longer to compute
the auxiliary table, but since we have to deal with only moderately
large label sets (usually well below 1000 labels) this should be
acceptable.
Our goal is not to speed up the compiler at all costs but to let it
produce better code.
> As CHM is probabilistic linear time, the only practical
> problem is the choice of hash functions. For integers, the
> upper half of a 32bit/64bit multiplication is a good
> universal hash function.
Yes, multiplication has good hashing performance in the upper bits which
is why Donald Knuth proposes it.
Also hashlib uses it for simple hashing which needs no auxiliary table.
I also wrote about the possibility of replacing gen_ab_3 with such a
method in hashlib.txt.
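For reference, the two usual readings of "multiplicative hashing" can be
sketched as follows. The function names and the golden-ratio multiplier
are assumptions for illustration; the post does not say which variant or
which constant hashlib actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Upper half of the full 32x32 -> 64 bit product.
inline uint32_t mul_hi(uint32_t x, uint32_t multiplier) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(x) * multiplier) >> 32);
}

// Knuth-style: keep the top `bits` bits of the low 32 bits of the
// product. The multiplier is about 2^32 divided by the golden ratio.
inline uint32_t knuth_hash(uint32_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    return static_cast<uint32_t>(x * UINT32_C(2654435769)) >> (32 - bits);
}
```

In both variants the result is taken from the high end of a product,
where every input bit at or below a given position has had a chance to
influence it; the low bits of a product depend only on the low input
bits and hash poorly.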
> I would strongly advocate *against* using Jenkins'
> construction here.
Please give hashtest.cpp a chance to show its performance, and also look
at the code my patch generates.
If really needed I can easily enhance hashlib to embrace CHM as well.
Best regards
Jasper