Lowering switch statements with hashing

Thu Jan 16 14:07:23 PST 2014

Hello Anton, hello all!

 > Will you please provide RFC outlining the algorithm itself and
 > possible some benchmarks as the .txt definitely does not contain
 > enough details...

Well, a friend of mine and I wrote the paper mentioned in 
hash_llvm.txt; it can be downloaded from 
http://programming.sirrida.de/hashsuper.pdf and describes the simple* 
variants. These should produce code like this:

	imull	magic, %edi, %ecx
	shrl	$27, %ecx
	cmpl	ValTable(,%rcx,4), %edi
	jne	default
	jmpq	JumpTable(,%rcx,8)

The first two lines in this example are the hash function; a value 
comparison and an indirect jump follow.
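
For readers who prefer C, the same sequence can be sketched roughly as 
follows. This is only an illustration under assumptions and not the code 
from the patch: the names hash_switch, find_magic, hash_switch_build and 
hash_switch_lookup are made up here, the magic is found by a plain 
brute-force search, and an array of case indices stands in for the jump 
table (empty slots map to -1, i.e. the default label).

```c
#include <stddef.h>
#include <stdint.h>

enum { HASH_BITS = 5, TABLE_SIZE = 1 << HASH_BITS };  /* shrl $27 -> 32 slots */

typedef struct {
    uint32_t magic;                  /* multiplier of the hash function  */
    uint32_t val_table[TABLE_SIZE];  /* label value stored in each slot  */
    int      case_table[TABLE_SIZE]; /* stands in for JumpTable; -1 = default */
} hash_switch;

/* The hash function itself: multiply, then keep the top HASH_BITS bits. */
static uint32_t hash_mul(uint32_t key, uint32_t magic) {
    return (uint32_t)(key * magic) >> (32 - HASH_BITS);
}

/* Brute-force search for a multiplier that maps all (distinct) keys to
 * distinct slots. Multiplying the counter by an odd constant only changes
 * the order in which the 2^32 - 1 nonzero candidates are visited. */
static int find_magic(const uint32_t *keys, size_t n, uint32_t *magic_out) {
    uint32_t i = 1;
    do {
        uint32_t magic = i * 0x9E3779B1u;
        uint32_t used = 0;  /* bitmask of occupied slots (TABLE_SIZE == 32) */
        size_t k;
        for (k = 0; k < n; k++) {
            uint32_t bit = UINT32_C(1) << hash_mul(keys[k], magic);
            if (used & bit)
                break;      /* collision, try the next candidate */
            used |= bit;
        }
        if (k == n) {
            *magic_out = magic;
            return 1;
        }
    } while (++i != 0);
    return 0;               /* no collision-free multiplier exists */
}

/* Fill both tables; the case index of keys[k] is simply k. */
static int hash_switch_build(hash_switch *hs, const uint32_t *keys, size_t n) {
    if (n > TABLE_SIZE || !find_magic(keys, n, &hs->magic))
        return 0;
    for (size_t s = 0; s < TABLE_SIZE; s++) {
        hs->val_table[s] = 0;
        hs->case_table[s] = -1;  /* empty slot leads to the default label */
    }
    for (size_t k = 0; k < n; k++) {
        uint32_t h = hash_mul(keys[k], hs->magic);
        hs->val_table[h] = keys[k];
        hs->case_table[h] = (int)k;
    }
    return 1;
}

/* Mirrors the five instructions shown above. */
static int hash_switch_lookup(const hash_switch *hs, uint32_t key) {
    uint32_t h = hash_mul(key, hs->magic);  /* imull magic; shrl $27 */
    if (hs->val_table[h] != key)            /* cmpl ValTable(,%rcx,4) */
        return -1;                          /* jne default */
    return hs->case_table[h];               /* jmpq JumpTable(,%rcx,8) */
}
```

After hash_switch_build() has succeeded for a label set such as 
{1, 10, 100, 1000, 10000, 100000}, hash_switch_lookup() returns the 
position of a label in that set and -1 for any other value. The value 
comparison is what makes the scheme safe for values outside the label 
set, since those hash to some slot as well.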

The Jenkins methods, which will usually be used for bigger label sets 
(about 24 or more labels), produce two hash values a and b, and the final 
hash function is evaluated as h(a,b) = a ^ BTable[b]. For very large 
BTables an additional scramble table is applied to save some space; the 
threshold can be adjusted.
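
To make the formula h(a,b) = a ^ BTable[b] concrete, here is a toy 
sketch in C. Everything except the formula is an assumption made for 
illustration: a and b are simply cut out of the key bits instead of 
coming from a mixing function, the table has only 4 entries, the names 
hash_a, hash_b, hash_ab and build_btable are made up, and the table is 
filled by a naive first-fit search rather than by the search used in 
Jenkins' generator or in the patch. The scramble table is not shown.

```c
#include <stddef.h>
#include <stdint.h>

enum { A_BITS = 3, B_BITS = 2, SLOTS = 1 << A_BITS, B_SIZE = 1 << B_BITS };

/* Toy split of the key into the two hash values (assumption). */
static uint32_t hash_a(uint32_t key) { return key & (SLOTS - 1); }
static uint32_t hash_b(uint32_t key) { return (key >> 4) & (B_SIZE - 1); }

/* The final hash: h(a,b) = a ^ BTable[b]. */
static uint32_t hash_ab(const uint8_t *btable, uint32_t key) {
    return hash_a(key) ^ btable[hash_b(key)];
}

/* Naive first-fit construction: for each b, pick the smallest table value
 * that sends all keys of that group to slots which are still free.
 * Returns 1 on success, 0 if some group cannot be placed (for example
 * when two keys share both a and b). */
static int build_btable(const uint32_t *keys, size_t n, uint8_t *btable) {
    uint32_t used = 0;  /* bitmask of slots already taken */
    for (uint32_t b = 0; b < B_SIZE; b++) {
        uint32_t t;
        for (t = 0; t < SLOTS; t++) {
            uint32_t claimed = 0;  /* slots this group would take with t */
            size_t k;
            for (k = 0; k < n; k++) {
                if (hash_b(keys[k]) != b)
                    continue;      /* key belongs to another group */
                uint32_t bit = UINT32_C(1) << (hash_a(keys[k]) ^ t);
                if ((used | claimed) & bit)
                    break;         /* slot taken, try the next t */
                claimed |= bit;
            }
            if (k == n) {          /* the whole group fits */
                used |= claimed;
                break;
            }
        }
        if (t == SLOTS)
            return 0;
        btable[b] = (uint8_t)t;
    }
    return 1;
}
```

For the eight keys 0x01, 0x02, 0x03, 0x11, 0x12, 0x21, 0x25 and 0x31, 
several keys share the same a (for instance 0x01, 0x11, 0x21 and 0x31), 
and the per-group XOR with BTable[b] is what moves them to different 
slots. The result of hash_ab() is then used as the index into the value 
and jump tables, just as the multiplicative hash is in the simple 
variants.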

 > I'm a bit concerned about perfect hashing since it usually (classic
 > implementations by Jenkins or MPH) involves two loads with the second
 > load from the location computed by a first load and thus this in many
 > cases yields two cache misses in a row.

You will find some artificial benchmarks in the mentioned paper.
For periodic patterns the decision tree might give better times than a 
jump table approach; however, I don't know how to simulate this.
A decision tree also produces a lot of cache misses because of the large 
amount of code involved.
At least a jump table uses far fewer branch prediction table entries, 
which are then available for other branches.
Real-world tests can be done now, since a real implementation is 
available with my patch. Also, I have provided some parameters which 
influence the code generation, such as the selection of the hash 
algorithms used; if needed, I can easily provide others as well.

Best regards
Jasper