[LLVMdev] RFC: Using hashing for switch statements
Jasper Neumann
jn at sirrida.de
Fri Jan 24 11:51:56 PST 2014
Hi folks,
here is a short RFC regarding hashing for the implementation of switch
statements and my preliminary patch.
I posted this patch on 2014-01-16 21:58 at llvm-commits at cs.uiuc.edu. You
can find a copy e.g. on
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html.
Best regards
Jasper
===
Preliminary: Special identifiers
n number of given switch labels
N size of the (intermediate) hash range, usually n rounded up to the
next power of 2
x the value to be switch upon
Why (perfect) hashing?
Hashing enables a jump table like approach even for sparse label sets.
Let's concentrate on perfect hashing in order to keep the code smaller
and faster.
Hashing is faster than a decision tree provided there are enough labels,
i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8
labels assuming random values of x.
The memory consumption of the hashing tables is comparable to the memory
footage of the code of a decision tree.
A decision tree consumes a lot of branch prediction resources.
For minimal perfect hashing the jump table does not need to contain
dummy labels and therefore has n instead of (almost) N elements.
"Switch-to-lookup tables": If all jump targets are sufficiently similar
hashing opens the possibility to use table which contain the
differences. Thereby the indirect jump is eliminated and we get a huge
performance gain.
Why not hashing?
A decision tree or even an else-if chain is faster than hashing for very
few labels.
The branch prediction logic usually does not work as satisfactorily as
for decision trees if fed with cyclical access patterns; i.e. the
break-even point in this scenario is higher.
Hashing can not deal with large label ranges since very single label is
treated. A possible compromise/solution might be to first catch all
single labels and sufficiently short ranges by hashing and then treat
the remaining ranges in a decision tree.
Why reversible hashing?
Reversible hashing does not need a value table and a simple range check
of the calculated hash value is sufficient.
Reversible hashing is very simple to implement.
The usual jump table approach is a special case of reversible hashing.
Why simple hashing?
A comparison against a value table is sufficient.
Simple hashing does not need an extra table.
Why not simple hashing?
Simple hashing needs a lot of hash function candidates to be tested
until a suitable is found; this costs time in the compiler.
For large label sets the chance to find such a function is very small;
the applicability is therefore reduced to quite small sets.
Minimal perfect hashing is usually not achievable.
Why Jenkins' hashing?
Jenkins' hashing produces very fast and compact code and needs one extra
table (BTable) with usually less elements than n; the elements itself
are unsigned integers which can become as high N-1.
The code first generates two hash values h1=f1(x) and h2=f2(x) where f1
and f2 limit their range by shifting or masking, then the final hash as
h = h1 ^ BTable[h1].
Minimal perfect hashing is achievable, albeit often at the cost of
larger tables.
For sufficiently high N the table memory can optionally be reduced by
introducing an extra scrambling table (STable); BTable then consists of
bytes whereas STable contains 256 words or DWords. Obviously this
reduction of memory costs another indirection in the resulting code.
As far as I tested Jenkins' hashing can be used for label sets of
several thousand elements; often even one million labels can be treated.
The application in a compiler is to lower switch statements. Switch
statements with much more than 1000 labels are almost never present in
practice.
Jenkins' hashing usually is sufficiently fast.
By the way: You will find the following remark in lib/Target/README.txt:
==>
Investigate lowering of sparse switch statements into perfect hash tables:
http://burtleburtle.net/bob/hash/perfect.html
<==
Why not Jenkins' hashing?
The search time of Jenkins' hashing might become unacceptably long for
extraordinary large label sets.
In practice this should never happen, however in this case the label set
will be split in two parts are the process of lowering switches will be
repeated.
An alternative (for very large label sets) could be a different perfect
hash algorithm such as CHM.
Why CHM?
CHM is extraordinarily fast at generating the extra table with a runtime
of O(1).
CHM can be used to hash almost arbitrarily many labels.
CHM produces an ordered minimal perfect hashing function.
Why not CHM?
CHM needs extra table space of 2.09 * n elements; this is larger
compared to Jenkins' method.
CHM (in the implementation of cmph-2.0) is much more complicated than
the code of Jenkins' hashing:
h1 = f1(x) % N;
h2 = f2(x) % N;
if (h1 == h2 && ++h2 >= N) h2 = 0;
h = (Table[h1] + Table[h2]) % n;
By enlarging N to a power of 2 and other tweaking we might replace the 3
modulo operations by shifting or masking at the cost of more table
space. We might also eliminate the conditional statement if in this case
we choose different hash functions and retry the calculation.
There is at least one more table access compared to Jenkins' method.
===
More information about the llvm-dev
mailing list