[LLVMdev] RFC: Using hashing for switch statements

Fri Jan 24 11:51:56 PST 2014

Hi folks,

here is a short RFC regarding hashing for the implementation of switch 
statements and my preliminary patch.
I posted this patch on 2014-01-16 21:58 at llvm-commits at cs.uiuc.edu. You 
can find a copy e.g. on 
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html.

Best regards
Jasper

===

Preliminary: Special identifiers

n    number of given switch labels
N    size of the (intermediate) hash range, usually n rounded up to the 
next power of 2
x    the value to be switch upon

Why (perfect) hashing?

Hashing enables a jump table like approach even for sparse label sets.
Let's concentrate on perfect hashing in order to keep the code smaller 
and faster.
Hashing is faster than a decision tree provided there are enough labels, 
i.e. O(1) vs. O(log n). For x86 the break-even point is about 6..8 
labels assuming random values of x.
The memory consumption of the hashing tables is comparable to the memory 
footage of the code of a decision tree.
A decision tree consumes a lot of branch prediction resources.
For minimal perfect hashing the jump table does not need to contain 
dummy labels and therefore has n instead of (almost) N elements.
"Switch-to-lookup tables": If all jump targets are sufficiently similar 
hashing opens the possibility to use table which contain the 
differences. Thereby the indirect jump is eliminated and we get a huge 
performance gain.

Why not hashing?

A decision tree or even an else-if chain is faster than hashing for very 
few labels.
The branch prediction logic usually does not work as satisfactorily as 
for decision trees if fed with cyclical access patterns; i.e. the 
break-even point in this scenario is higher.
Hashing can not deal with large label ranges since very single label is 
treated. A possible compromise/solution might be to first catch all 
single labels and sufficiently short ranges by hashing and then treat 
the remaining ranges in a decision tree.

Why reversible hashing?

Reversible hashing does not need a value table and a simple range check 
of the calculated hash value is sufficient.
Reversible hashing is very simple to implement.
The usual jump table approach is a special case of reversible hashing.

Why simple hashing?

A comparison against a value table is sufficient.
Simple hashing does not need an extra table.

Why not simple hashing?

Simple hashing needs a lot of hash function candidates to be tested 
until a suitable is found; this costs time in the compiler.
For large label sets the chance to find such a function is very small; 
the applicability is therefore reduced to quite small sets.
Minimal perfect hashing is usually not achievable.

Why Jenkins' hashing?

Jenkins' hashing produces very fast and compact code and needs one extra 
table (BTable) with usually less elements than n; the elements itself 
are unsigned integers which can become as high N-1.
The code first generates two hash values h1=f1(x) and h2=f2(x) where f1 
and f2 limit their range by shifting or masking, then the final hash as 
h = h1 ^ BTable[h1].
Minimal perfect hashing is achievable, albeit often at the cost of 
larger tables.
For sufficiently high N the table memory can optionally be reduced by 
introducing an extra scrambling table (STable); BTable then consists of 
bytes whereas STable contains 256 words or DWords. Obviously this 
reduction of memory costs another indirection in the resulting code.
As far as I tested Jenkins' hashing can be used for label sets of 
several thousand elements; often even one million labels can be treated.
The application in a compiler is to lower switch statements. Switch 
statements with much more than 1000 labels are almost never present in 
practice.
Jenkins' hashing usually is sufficiently fast.
By the way: You will find the following remark in lib/Target/README.txt:
==>
Investigate lowering of sparse switch statements into perfect hash tables:
http://burtleburtle.net/bob/hash/perfect.html
<==

Why not Jenkins' hashing?

The search time of Jenkins' hashing might become unacceptably long for 
extraordinary large label sets.
In practice this should never happen, however in this case the label set 
will be split in two parts are the process of lowering switches will be 
repeated.
An alternative (for very large label sets) could be a different perfect 
hash algorithm such as CHM.

Why CHM?

CHM is extraordinarily fast at generating the extra table with a runtime 
of O(1).
CHM can be used to hash almost arbitrarily many labels.
CHM produces an ordered minimal perfect hashing function.

Why not CHM?

CHM needs extra table space of 2.09 * n elements; this is larger 
compared to Jenkins' method.
CHM (in the implementation of cmph-2.0) is much more complicated than 
the code of Jenkins' hashing:
h1 = f1(x) % N;
h2 = f2(x) % N;
if (h1 == h2 && ++h2 >= N)  h2 = 0;
h = (Table[h1] + Table[h2]) % n;
By enlarging N to a power of 2 and other tweaking we might replace the 3 
modulo operations by shifting or masking at the cost of more table 
space. We might also eliminate the conditional statement if in this case 
we choose different hash functions and retry the calculation.
There is at least one more table access compared to Jenkins' method.

===